Oct-02-2020, 11:11 AM
I am working on a machine learning program that clusters a list of elements. I then calculate the Euclidean distance between each element and each cluster centroid, and I want to change the cluster of each element based on the max value of that distance (ai). Here is my code:
import pandas as pd
import numpy as np
from sklearn_extra.cluster import KMedoids
from sklearn.metrics.pairwise import euclidean_distances
from scipy.spatial import distance
X = np.array([0.85,0.92,0.71])
X2=X.reshape(-1, 1)
model1 = KMedoids(n_clusters=3, random_state=0).fit(X2)
cluster_labels = model1.predict(X2)
clusters, counts = np.unique(cluster_labels[cluster_labels>=0], return_counts=True)
df1 = pd.DataFrame(zip(cluster_labels,X2))
df1.index = df1.index
df1 = df1.rename({0: 'cluster', 1: 'score'}, axis=1)
df1['element'] = df1.index
df1 = df1[['cluster', 'element', 'score']]
df2 = pd.DataFrame({'cluster label': model1.labels_[model1.medoid_indices_],
                    'cluster centroid': model1.medoid_indices_,
                    'cluster size': counts})
df3 = pd.DataFrame(zip(model1.labels_[model1.medoid_indices_],
                       np.squeeze(X2[model1.medoid_indices_]),
                       model1.medoid_indices_))
df3.index = df3.index
df3 = df3.rename({0: 'cluster label', 1: 'cluster centroid', 2: 'element'}, axis=1)
df3 = df3[['cluster label', 'cluster centroid', 'element']]
print ('Tabel 1:')
print(df1)
print()
print ('Tabel 2:')
print(df2)
print()
print ('Tabel 3:')
print(df3)
print()
for i in range(len(df2)):
    c = list()  # distances from this element to every cluster centroid
    print("element", df2.index[i])
    for j in range(len(df2)):
        element = (X2[df2.index[i]])
        a2 = (df3['cluster centroid'][j])
        ai = distance.euclidean(element, a2)
        print('a1', element)
        print('a2', a2)
        c.append(ai)
        print('cluster', j, ':', ai)
        print("----")
    max_value = max(c)
    print("max value (cluster):", max_value)
    print("*****")

For example, when I calculate (ai) for all elements, the results are:
Output:
Tabel 1:
   cluster  element   score
0        0        0  [0.85]
1        1        1  [0.92]
2        2        2  [0.71]

Tabel 2:
   cluster label  cluster centroid  cluster size
0              0                 0             1
1              1                 1             1
2              2                 2             1

Tabel 3:
   cluster label  cluster centroid  element
0              0              0.85        0
1              1              0.92        1
2              2              0.71        2
element 0
a1 [0.85]
a2 0.85
cluster 0 : 0.0
----
a1 [0.85]
a2 0.92
cluster 1 : 0.07000000000000006
----
a1 [0.85]
a2 0.71
cluster 2 : 0.14
----
max value (cluster): 0.14
*****
element 1
a1 [0.92]
a2 0.85
cluster 0 : 0.07000000000000006
----
a1 [0.92]
a2 0.92
cluster 1 : 0.0
----
a1 [0.92]
a2 0.71
cluster 2 : 0.21000000000000008
----
max value (cluster): 0.21000000000000008
*****
element 2
a1 [0.71]
a2 0.85
cluster 0 : 0.14
----
a1 [0.71]
a2 0.92
cluster 1 : 0.21000000000000008
----
a1 [0.71]
a2 0.71
cluster 2 : 0.0
----
max value (cluster): 0.21000000000000008
*****
As we can see, the max value (cluster) for element 0 (0.85) is cluster 2: 0.14, for element 1 it is cluster 2: 0.21, and for element 2 it is cluster 1: 0.21. How could I update table 1 and 2 based on the cluster with the max value for each element?
So the expected output is:
Output:
Tabel 1:
   cluster  element   score
0        2        0  [0.85]
1        2        1  [0.92]
2        1        2  [0.71]

Tabel 2:
   cluster label  cluster centroid  cluster size
0              0                 0             0
1              1                 1             1
2              2                 2             2
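
One possible way to get there, as a rough sketch and not a definitive answer: build the full element-to-centroid distance matrix, take the column with the largest distance in each row as the new cluster, then rebuild both tables from those labels. The names `scores`, `medoid_values`, `new_labels`, `table1` and `table2` are made up for this illustration, and the values are hard-coded from the output above so the snippet runs without `sklearn_extra`. In the real script they would presumably come from `X2` and `model1.medoid_indices_`. It also assumes that "max value (cluster)" means the cluster whose centroid is farthest from the element.

```python
import numpy as np
import pandas as pd

# Scores and medoid positions copied from Tabel 1 and Tabel 3 above
# (hard-coded here; an assumption made to keep the sketch self-contained).
scores = np.array([0.85, 0.92, 0.71])
medoid_indices = np.array([0, 1, 2])       # element index of each cluster's medoid
medoid_values = scores[medoid_indices]     # medoid score for cluster labels 0, 1, 2

# Distance from every element (rows) to every medoid (columns).
# For 1-D data the Euclidean distance is the absolute difference.
dist = np.abs(scores[:, None] - medoid_values[None, :])

# Assumption: the new cluster is the one with the largest distance.
new_labels = dist.argmax(axis=1)

# Tabel 1 with the reassigned cluster labels.
table1 = pd.DataFrame({
    'cluster': new_labels,
    'element': np.arange(len(scores)),
    'score': scores,
})

# Tabel 2 with the sizes recounted; minlength keeps clusters that are now empty.
n_clusters = len(medoid_indices)
table2 = pd.DataFrame({
    'cluster label': np.arange(n_clusters),
    'cluster centroid': medoid_indices,
    'cluster size': np.bincount(new_labels, minlength=n_clusters),
})

print('Tabel 1:')
print(table1)
print()
print('Tabel 2:')
print(table2)
```

Note that the score column holds plain floats here instead of one-element arrays like `[0.85]`, because the sketch works on the 1-D array directly. Using `np.bincount` with `minlength` instead of `np.unique` keeps a row for cluster 0 even though no element is assigned to it any more.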
