How to group related products in relationship groups?

RegionHUser · May-30-2024, 04:26 PM

Hi,

I have a dataset of relationships between old products and new products, and I would like to group them into relationship groups. I have written a script with some sample data. The expected group for each row is given as a number at the end of the row. That column is not part of the original data, which has only two columns: Old product and New product.

As you can see, I don't get the expected result for this row: ("9000825_ENDOS025", "9000825_NEXO025", 3),

The relationship between these two products is put in group 5 (see the output below), but I want it in group 3, because the product key 9000825_NEXO025 appears on both the left and the right side (as an old product in one row and as a new product in another).

I tried sorting the dataset, which gave the right result, but I don't think I can rely on sorting a dataset of 134,000 rows. How can I change the code to get the desired result?

Best regards

Morten

def group_related_products(data):
    groups = []
    for row in data:
        old_product, new_product, group = row
        found_group = False
        for existing_group in groups:
            if old_product in existing_group:
                existing_group.add(new_product)
                found_group = True
                break
        if not found_group:
            groups.append({old_product, new_product})
    return groups

# Sample data
data = [
    ("9000002_88008621", "9000002_88008621", 1),
    ("9000002_88008621", "9000463_2526534", 1),
    ("9000002_88008625", "9000002_88008625", 2),
    ("9000002_88008625", "9000463_160159", 2),
    ("9000825_NEXO025", "9000756_13002", 3),
    ("9000756_13002", "9000756_13004", 3),
    ("9000756_42420", "9000756_42431", 4),
    ("9000002_88008621", "9006274_88008621", 1),
    ("9000825_ENDOS025", "9000825_NEXO025", 3),
    ("9032273_006899", "9000048_1000010123", 6),
    ("9032273_006899", "9000035_KZDC120003", 6),
    ("9032273_006899", "9000028_IV-9001B", 6),
    ("9032272_BH-EGF", "9000048_1000010123", 7),
    ("9032272_BH-EGF", "9000035_KZDC120003", 7),
    ("9032272_BH-EGF", "9000028_IV-9001B", 7),
]

# Group related products
related_groups = group_related_products(data)

# Print the groups
for i, group in enumerate(related_groups, 1):
    print(f"Group {i}: {group}")

Output:
Group 1: {'9000463_2526534', '9006274_88008621', '9000002_88008621'}
Group 2: {'9000463_160159', '9000002_88008625'}
Group 3: {'9000756_13002', '9000756_13004', '9000825_NEXO025'}
Group 4: {'9000756_42431', '9000756_42420'}
Group 5: {'9000825_ENDOS025', '9000825_NEXO025'}
Group 6: {'9000048_1000010123', '9000035_KZDC120003', '9000028_IV-9001B', '9032273_006899'}
Group 7: {'9000048_1000010123', '9032272_BH-EGF', '9000035_KZDC120003', '9000028_IV-9001B'}

Gribouillis · (This post was last modified: May-30-2024, 07:11 PM by Gribouillis.)

It seems that you are looking for the connected components of an undirected graph. You could use a specialized module such as networkx for this:

data = [
    ("9000002_88008621", "9000002_88008621", 1),
    ("9000002_88008621", "9000463_2526534", 1),
    ("9000002_88008625", "9000002_88008625", 2),
    ("9000002_88008625", "9000463_160159", 2),
    ("9000825_NEXO025", "9000756_13002", 3),
    ("9000756_13002", "9000756_13004", 3),
    ("9000756_42420", "9000756_42431", 4),
    ("9000002_88008621", "9006274_88008621", 1),
    ("9000825_ENDOS025", "9000825_NEXO025", 3),
    ("9032273_006899", "9000048_1000010123", 6),
    ("9032273_006899", "9000035_KZDC120003", 6),
    ("9032273_006899", "9000028_IV-9001B", 6),
    ("9032272_BH-EGF", "9000048_1000010123", 7),
    ("9032272_BH-EGF", "9000035_KZDC120003", 7),
    ("9032272_BH-EGF", "9000028_IV-9001B", 7),
]

import networkx as nx
G = nx.Graph()
for a, b, _ in data:
    G.add_edge(a, b)
for c in nx.connected_components(G):
    print(c)

Output:
{'9006274_88008621', '9000463_2526534', '9000002_88008621'}
{'9000463_160159', '9000002_88008625'}
{'9000756_13002', '9000756_13004', '9000825_NEXO025', '9000825_ENDOS025'}
{'9000756_42431', '9000756_42420'}
{'9000028_IV-9001B', '9032272_BH-EGF', '9000035_KZDC120003', '9000048_1000010123', '9032273_006899'}

You can also use other implementations; for example, there seem to be two implementations on Rosetta Code, including Tarjan's algorithm. Note that Tarjan's algorithm works on directed graphs, so you may need to add the reversed edges to your graph.
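If installing networkx is not an option, the same connected components can be found with a small union-find (disjoint-set) structure from the standard library alone. The loop in the original question depends on row order because it only looks for old_product in existing groups and never merges two groups that turn out to be linked. The sketch below is one possible rewrite, not the only one; it assumes the rows have been reduced to (old, new) pairs, and the helper name find is just illustrative. Note that, as in the networkx output above, rows labelled 6 and 7 in the full sample data would land in a single group, because they share new products.

```python
def group_related_products(pairs):
    """Group products that are linked directly or indirectly (union-find)."""
    parent = {}

    def find(x):
        # Register unseen products as their own root.
        parent.setdefault(x, x)
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for old, new in pairs:
        root_old, root_new = find(old), find(new)
        if root_old != root_new:
            parent[root_new] = root_old  # merge the two groups

    # Collect the members of each group under their common root.
    groups = {}
    for product in parent:
        groups.setdefault(find(product), set()).add(product)
    return list(groups.values())


# Abbreviated sample: the ENDOS025 row arrives after its group already exists.
pairs = [
    ("9000825_NEXO025", "9000756_13002"),
    ("9000756_13002", "9000756_13004"),
    ("9000756_42420", "9000756_42431"),
    ("9000825_ENDOS025", "9000825_NEXO025"),
]
for i, group in enumerate(group_related_products(pairs), 1):
    print(f"Group {i}: {sorted(group)}")
```

Each row is processed once and no sorting is needed, so this should cope with 134,000 rows without trouble.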

Pedroski55 · (This post was last modified: Jun-02-2024, 03:51 PM by Pedroski55.)

Not too sure what this is all about.

134,000 rows, no problem! Send them over!

# Sample data
data = [
    ("9000002_88008621", "9000002_88008621", 1),
    ("9000002_88008621", "9000463_2526534", 1),
    ("9000002_88008625", "9000002_88008625", 2),
    ("9000002_88008625", "9000463_160159", 2),
    ("9000825_NEXO025", "9000756_13002", 3),
    ("9000756_13002", "9000756_13004", 3),
    ("9000756_42420", "9000756_42431", 4),
    ("9000002_88008621", "9006274_88008621", 1),
    ("9000825_ENDOS025", "9000825_NEXO025", 3),
    ("9032273_006899", "9000048_1000010123", 6),
    ("9032273_006899", "9000035_KZDC120003", 6),
    ("9032273_006899", "9000028_IV-9001B", 6),
    ("9032272_BH-EGF", "9000048_1000010123", 7),
    ("9032272_BH-EGF", "9000035_KZDC120003", 7),
    ("9032272_BH-EGF", "9000028_IV-9001B", 7),
]

data_dict = {row[0]:[] for row in data}
len(data) # returns 15
len(data_dict) # returns 8: some old products are related to more than 1 new product

for row in data:
    tup = (row[1], row[2])
    data_dict[row[0]].append(tup)

for item in data_dict.items():
    print(item)

# Number the groups from 1, one group per distinct old product
for count, (key, related) in enumerate(data_dict.items(), 1):
    print(f"Group {count}: old product = {key}, related new products = {related}")

Gives:

Output:
Group 1: old product = 9000002_88008621, related new products = [('9000002_88008621', 1), ('9000463_2526534', 1), ('9006274_88008621', 1)]
Group 2: old product = 9000002_88008625, related new products = [('9000002_88008625', 2), ('9000463_160159', 2)]
Group 3: old product = 9000825_NEXO025, related new products = [('9000756_13002', 3)]
Group 4: old product = 9000756_13002, related new products = [('9000756_13004', 3)]
Group 5: old product = 9000756_42420, related new products = [('9000756_42431', 4)]
Group 6: old product = 9000825_ENDOS025, related new products = [('9000825_NEXO025', 3)]
Group 7: old product = 9032273_006899, related new products = [('9000048_1000010123', 6), ('9000035_KZDC120003', 6), ('9000028_IV-9001B', 6)]
Group 8: old product = 9032272_BH-EGF, related new products = [('9000048_1000010123', 7), ('9000035_KZDC120003', 7), ('9000028_IV-9001B', 7)]

I am not clear on what exactly you wish to do with the values of data_dict.
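A dictionary keyed on the old product alone gives one group per old product, so chains such as ENDOS025 -> NEXO025 -> 13002 stay split. One way to build on the dictionary idea, sketched here under the assumption that the goal is one group number per row, is to store the links in both directions and then walk them with a breadth-first search. The function name label_rows is hypothetical, and the numbers it returns follow order of first appearance, so they will not necessarily match the labels in the sample data.

```python
from collections import defaultdict, deque


def label_rows(pairs):
    """Return one group number per (old, new) row, numbered from 1."""
    # Adjacency sets in BOTH directions, unlike a plain old -> new dict.
    neighbours = defaultdict(set)
    for old, new in pairs:
        neighbours[old].add(new)
        neighbours[new].add(old)

    group_of = {}
    next_group = 1
    for start in neighbours:
        if start in group_of:
            continue
        # Breadth-first search marks everything reachable from `start`.
        group_of[start] = next_group
        queue = deque([start])
        while queue:
            product = queue.popleft()
            for other in neighbours[product]:
                if other not in group_of:
                    group_of[other] = next_group
                    queue.append(other)
        next_group += 1

    # Both ends of a row are always in the same group, so `old` suffices.
    return [group_of[old] for old, _ in pairs]


pairs = [
    ("9000825_NEXO025", "9000756_13002"),
    ("9000756_42420", "9000756_42431"),
    ("9000825_ENDOS025", "9000825_NEXO025"),
]
print(label_rows(pairs))  # [1, 2, 1]
```

The first and third rows get the same number even though the ENDOS025 row comes last, which is the behaviour asked for in the original question.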
