updating cluster of elements based on the max value of distance

alex80 · Oct-02-2020, 11:11 AM

I am working on a machine learning program that clusters a list of elements. I then calculate the Euclidean distance between each element and each cluster medoid, and I want to change the cluster of each element based on the max value of that distance (ai). Here is my code:

import pandas as pd
import numpy as np
from sklearn_extra.cluster import KMedoids
from sklearn.metrics.pairwise import euclidean_distances
from scipy.spatial import distance

X = np.array([0.85,0.92,0.71])
X2=X.reshape(-1, 1)

model1 = KMedoids(n_clusters=3, random_state=0).fit(X2)
cluster_labels = model1.predict(X2)
clusters, counts = np.unique(cluster_labels[cluster_labels>=0], return_counts=True)

# Table 1: one row per element with its cluster label and score
df1 = pd.DataFrame(zip(cluster_labels, X2))
df1 = df1.rename({0: 'cluster', 1: 'score'}, axis=1)
df1['element'] = df1.index
df1 = df1[['cluster', 'element', 'score']]

# Table 2: one row per cluster with its medoid index and size
df2 = pd.DataFrame({'cluster label': model1.labels_[model1.medoid_indices_],
                    'cluster centroid': model1.medoid_indices_,
                    'cluster size': counts})

# Table 3: one row per cluster with the score of its medoid
df3 = pd.DataFrame(zip(model1.labels_[model1.medoid_indices_],
                       np.squeeze(X2[model1.medoid_indices_]),
                       model1.medoid_indices_))
df3 = df3.rename({0: 'cluster label', 1: 'cluster centroid', 2: 'element'}, axis=1)
df3 = df3[['cluster label', 'cluster centroid', 'element']]

print('Tabel 1:')
print(df1)
print()
print('Tabel 2:')
print(df2)
print()
print('Tabel 3:')
print(df3)
print()

# Distance from every element to every cluster medoid.
# The outer loop runs over the elements (rows of X2), not over the clusters.
for i in range(len(X2)):
    c = []
    print("element", i)
    for j in range(len(df3)):
        element = X2[i]
        a2 = df3['cluster centroid'][j]
        ai = distance.euclidean(element, a2)
        print('a1', element)
        print('a2', a2)
        c.append(ai)
        print('cluster', j, ':', ai)
        print("----")
    max_value = max(c)
    print("max value (cluster):", max_value)
    print("*****")

For example, when I calculate (ai) for all elements, the results are:

Output:
Tabel 1:
   cluster  element   score
0        0        0  [0.85]
1        1        1  [0.92]
2        2        2  [0.71]

Tabel 2:
   cluster label  cluster centroid  cluster size
0              0                 0             1
1              1                 1             1
2              2                 2             1

Tabel 3:
   cluster label  cluster centroid  element
0              0              0.85        0
1              1              0.92        1
2              2              0.71        2

element 0
a1 [0.85]
a2 0.85
cluster 0 : 0.0
----
a1 [0.85]
a2 0.92
cluster 1 : 0.07000000000000006
----
a1 [0.85]
a2 0.71
cluster 2 : 0.14
----
max value (cluster): 0.14
*****
element 1
a1 [0.92]
a2 0.85
cluster 0 : 0.07000000000000006
----
a1 [0.92]
a2 0.92
cluster 1 : 0.0
----
a1 [0.92]
a2 0.71
cluster 2 : 0.21000000000000008
----
max value (cluster): 0.21000000000000008
*****
element 2
a1 [0.71]
a2 0.85
cluster 0 : 0.14
----
a1 [0.71]
a2 0.92
cluster 1 : 0.21000000000000008
----
a1 [0.71]
a2 0.71
cluster 2 : 0.0
----
max value (cluster): 0.21000000000000008
*****

As we can see, the max value (cluster) for element 0 (0.85) is cluster 2 : 0.14, for element 1 it is cluster 2 : 0.21, and for element 2 it is cluster 1 : 0.21.
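
The per-element maxima listed above can be cross-checked without the nested loop. This is only a minimal sketch: it assumes the medoid scores are the three values shown in Tabel 3 and hard-codes them so that it runs without sklearn_extra. The names `dist`, `farthest` and `max_dist` are illustrative and not part of the code above.

```python
import numpy as np

# Scores and medoid values copied from Tabel 1 and Tabel 3 above (assumption:
# hard-coded here instead of being read from the fitted KMedoids model).
scores = np.array([0.85, 0.92, 0.71]).reshape(-1, 1)
medoids = np.array([0.85, 0.92, 0.71]).reshape(-1, 1)

# dist[i, j] = Euclidean distance between element i and the medoid of cluster j.
# With one feature this reduces to the absolute difference.
dist = np.abs(scores - medoids.T)

# For each element: the cluster whose medoid is farthest away, and that distance.
farthest = dist.argmax(axis=1)
max_dist = dist.max(axis=1)

for i, (c, d) in enumerate(zip(farthest, max_dist)):
    print(f"element {i}: cluster {c}, distance {d:.2f}")
```

For data with more than one feature, `euclidean_distances` from sklearn (already imported above) should give the same kind of element-by-medoid matrix.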

How can I update Tables 1 and 2 based on the cluster that has the max value for each element?

Expected output:

Output:
Tabel 1:
   cluster  element   score
0        2        0  [0.85]
1        2        1  [0.92]
2        1        2  [0.71]

Tabel 2:
   cluster label  cluster centroid  cluster size
0              0                 0             0
1              1                 1             1
2              2                 2             2
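
One possible way to get from the farthest-cluster labels to the two tables above, sketched with pandas. The labels `[2, 2, 1]` are copied from the expected Tabel 1 rather than recomputed, and `new_labels`, `table1` and `table2` are illustrative names. `np.bincount` with `minlength` is used here instead of `np.unique(..., return_counts=True)`, because the latter would leave out cluster 0, which ends up empty.

```python
import numpy as np
import pandas as pd

scores = np.array([0.85, 0.92, 0.71])
medoid_indices = np.array([0, 1, 2])   # from Tabel 2 above
new_labels = np.array([2, 2, 1])       # assumption: farthest cluster per element

# Tabel 1: same elements and scores, with the cluster column replaced.
table1 = pd.DataFrame({
    'cluster': new_labels,
    'element': np.arange(len(scores)),
    'score': scores,
})

# Tabel 2: recount the cluster sizes; minlength keeps empty clusters with size 0.
n_clusters = len(medoid_indices)
table2 = pd.DataFrame({
    'cluster label': np.arange(n_clusters),
    'cluster centroid': medoid_indices,
    'cluster size': np.bincount(new_labels, minlength=n_clusters),
})

print(table1)
print()
print(table2)
```

In this sketch the score column holds plain floats rather than one-element arrays, so it prints as 0.85 instead of [0.85].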
