May-30-2024, 04:26 PM
Hi,
I have a dataset of relationships between old products and new products, and I would like to group them into relationship groups. I have written a script with some sample data. The expected result is the number at the end of each row. This column is not part of the original data, which has only two columns: Old product and New product.
As you can see, I don't get the expected result for this row: ("9000825_ENDOS025", "9000825_NEXO025", 3),
The relationship between these two products is put in group 5 (see the output below), but I want it in group 3, because the product key 9000825_NEXO025 appears on the left side of another row and on the right side of this one.
I tried sorting the dataset, which gave me the right result, but I don't think I can rely on sorting a dataset of 134,000 rows. How can I change the code to get the desired result?
Best regards
Morten
def group_related_products(data):
    groups = []
    for row in data:
        old_product, new_product, group = row
        found_group = False
        for existing_group in groups:
            if old_product in existing_group:
                existing_group.add(new_product)
                found_group = True
                break
        if not found_group:
            groups.append({old_product, new_product})
    return groups

# Sample data
data = [
    ("9000002_88008621", "9000002_88008621", 1),
    ("9000002_88008621", "9000463_2526534", 1),
    ("9000002_88008625", "9000002_88008625", 2),
    ("9000002_88008625", "9000463_160159", 2),
    ("9000825_NEXO025", "9000756_13002", 3),
    ("9000756_13002", "9000756_13004", 3),
    ("9000756_42420", "9000756_42431", 4),
    ("9000002_88008621", "9006274_88008621", 1),
    ("9000825_ENDOS025", "9000825_NEXO025", 3),
    ("9032273_006899", "9000048_1000010123", 6),
    ("9032273_006899", "9000035_KZDC120003", 6),
    ("9032273_006899", "9000028_IV-9001B", 6),
    ("9032272_BH-EGF", "9000048_1000010123", 7),
    ("9032272_BH-EGF", "9000035_KZDC120003", 7),
    ("9032272_BH-EGF", "9000028_IV-9001B", 7),
]
# Group related products
related_groups = group_related_products(data)
# Print the groups
for i, group in enumerate(related_groups, 1):
    print(f"Group {i}: {group}")

Output:
Group 1: {'9000463_2526534', '9006274_88008621', '9000002_88008621'}
Group 2: {'9000463_160159', '9000002_88008625'}
Group 3: {'9000756_13002', '9000756_13004', '9000825_NEXO025'}
Group 4: {'9000756_42431', '9000756_42420'}
Group 5: {'9000825_ENDOS025', '9000825_NEXO025'}
Group 6: {'9000048_1000010123', '9000035_KZDC120003', '9000028_IV-9001B', '9032273_006899'}
Group 7: {'9000048_1000010123', '9032272_BH-EGF', '9000035_KZDC120003', '9000028_IV-9001B'}
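
A note on why the row ends up in group 5, with one possible order-independent alternative. The loop in group_related_products only tests whether old_product is already in an existing group, so ("9000825_ENDOS025", "9000825_NEXO025") starts a new group even though 9000825_NEXO025 was seen earlier. The sketch below is not a drop-in fix but a different approach (a union-find over rows) that does not depend on row order or on sorting. It rests on a linking rule inferred from the expected column, not stated in the post: two rows belong together when they share the same old product, or when the new product of one row is the old product of another. Sharing only a new product does not link rows, which is what keeps the expected groups 6 and 7 apart. The names group_rows, first_row_by_old and pairs are made up for this illustration, and the sample uses a subset of the rows from the post without the expected column.

```python
def group_rows(pairs):
    """Return one group number per (old, new) row, independent of row order."""
    parent = list(range(len(pairs)))  # union-find forest over row indices

    def find(i):
        # Walk up to the root, shortening the path on the way (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the smaller row index as the root of the merged group.
            parent[max(ri, rj)] = min(ri, rj)

    # First row in which each product appears on the left (old) side.
    first_row_by_old = {}
    for i, (old, _new) in enumerate(pairs):
        first_row_by_old.setdefault(old, i)

    for i, (old, new) in enumerate(pairs):
        # Assumed rule 1: rows with the same old product belong together.
        union(i, first_row_by_old[old])
        # Assumed rule 2: a new product that is an old product elsewhere links the rows.
        if new in first_row_by_old:
            union(i, first_row_by_old[new])

    # Number the groups 1, 2, 3, ... in order of first appearance.
    numbers = {}
    return [numbers.setdefault(find(i), len(numbers) + 1) for i in range(len(pairs))]


# A few rows taken from the sample data in the post.
pairs = [
    ("9000825_NEXO025", "9000756_13002"),
    ("9000756_13002", "9000756_13004"),
    ("9000756_42420", "9000756_42431"),
    ("9000825_ENDOS025", "9000825_NEXO025"),
    ("9032273_006899", "9000048_1000010123"),
    ("9032272_BH-EGF", "9000048_1000010123"),
]
print(group_rows(pairs))  # [1, 1, 2, 1, 3, 4]
```

The group numbers are assigned in order of first appearance, so they will not necessarily match hand-assigned numbers, only the partition of rows into groups. Each row triggers at most two union operations, so the approach should scale to a few hundred thousand rows without sorting. If rows that share only a new product should also be merged, the rule would need to change accordingly.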
