Combine Duplicate CVS Column Values

Jay · (This post was last modified: Nov-02-2016, 11:07 PM by Ofnuts.)

Hi All,
This is my first cry for help Big Grin

I'm a Python Noob and I have been progressing fairly well (I started learning Python about a month ago). I have come to the point in my code that I simply cannot figure out. I am trying to take a simple .csv file and tally duplicates and combine values. I have tried dictionaries, tuples, sets, etc etc but I am simply missing something.

My production .csv is about 150k lines long and is a log for commodities market state change messages... so I created a basic example of what I am dealing with. I have pretty good luck with regex and could probably solve this problem via regex.. but I know that is slow.

Here is my example .csv... lets say I'm a car dealer that owns several car lots.. so I have duplicates in the list.

ford, 3
ford, 3
honda, 1
dodge, 8
dodge, 2
dodge, 5
ford, 6
dodge, 2
chevy, 1
dodge, 5

My goal is to have a combined list that would look like this... the order in the result does not matter.. I just need totals

chevy, 1
dodge, 22
ford, 9
honda, 1

So far I have coded the following that parses the .csv and creates a list of car makes, and another list that has the car make and quantity.

import csv

log = ('C:\\Users\\' + 'jcastleb' + '\\Desktop\\' + 'cars.csv')


f = open(log)
csv_f = csv.reader(f)

#Create empty lists
carlist = []
car_count = []
result = []

for row in csv_f:
    #Create list of Car Makes and Numbers
    car_count.append(row)

    #Create the list of cars
    if row[0] not in carlist:
        carlist.append(row[0])

    if row[0] not in result:
        result.append(row[0])

print(carlist)
print(car_count)

I get the following when I run the script:

Output:['ford', 'honda', 'dodge', 'chevy']
[['ford', ' 3'], ['ford', ' 3'], ['honda', ' 1'], ['dodge', ' 8'], ['dodge', ' 2'], ['dodge', ' 5'], ['ford', ' 6'], ['dodge', ' 2'], ['chevy', ' 1'], ['dodge', ' 5']]

If anyone can advise on how I continue the code to get the total I would be greatly appreciative! If there is a more efficient method then mine please do let me know. Big Grin

I have a hunch that a dictionary may be the way to go but I could not figure out how to append the key values Sad

Thanks all!

***Ofnuts*** · Nov-02-2016, 11:21 PM

>>> cars={'ford':3,'chevy':2}
>>> 'ford' in cars
True
>>> 'dodge' in cars
False
>>> cars['ford']+=3
>>> cars['dodge']=5
>>> cars
{'chevy': 2, 'dodge': 5, 'ford': 6}

You can make it even simpler using a defaultdict. A defaultdict creates the item for you with a default value
when you reference it:

>>>import collections
# The dict will set a default value by calling a function
# so we give it an anonymous function that always returns 0 
>>> cars=collections.defaultdict(lambda:0)
# checking what happens when we reference something we haven't got yet
>>> cars['ford']
0
>>> cars['ford']+=3
# We can blindly add stuff, "dodge' isn't in the dict yet
>>> cars['dodge']+=8
>>> cars['dodge']+=3
>>> cars
defaultdict(<function <lambda> at 0x7f0131d4d050>, {'dodge': 11, 'ford': 3})

Jay · Nov-03-2016, 06:30 PM

Thank you Ofnuts! Your solution worked great. Big Grin

This is the final code using the method you suggested.

import collections
import csv
from timeit import default_timer as timer
start = timer()
#Set up log path and csv reader
log = ('C:\\Users\\' + 'jcastleb' + '\\Desktop\\' + 'endexlog.csv')
f = open(log)
csv_f = csv.reader(f)
#Create default dictionary 
counts = collections.defaultdict(lambda:0)
#Create empty lists
market_types = []
market_msg = []
strip_counts = []
#Function to update Market mnessage list
def market_message_parser():
        for row in csv_f:
            market_msg.append(row)
#Function to update dictionary with market names
def dictionary_names():
    for row in market_msg:
        if row[2] not in market_types:
            counts[(row[2])]
 
#Function to pull number of strips per market and update strip counts
def strip_parser():
    for row in market_msg:
        pull = (row[2], row[5])
        strip_counts.append(pull)
        
#Run List functions
market_message_parser()
dictionary_names()
strip_parser()
#update dictionary with counts
for row in strip_counts:
    name = row[0]
    number = int(row[1])
    counts[name] += number
print('Final Dictionary calculated:')
print()
print(counts)
print()
print()
print('Time to calculate')
end = timer()
print(end - start)

***snippsat*** · Nov-03-2016, 07:25 PM

Think it look better to set list in defaultdict.
The sum that list up.
Eg.

from collections import defaultdict

data = '''\
ford, 3
ford, 3
honda, 1
dodge, 8
dodge, 2
dodge, 5
ford, 6
dodge, 2
chevy, 1
dodge, 5'''

data = [i.split(', ') for i in data.split('\n')]
d = defaultdict(list)
[d[k].append(int(v)) for k, v in data]
print({k: sum(v) for (k, v) in d.items()})

Output:
{'dodge': 22, 'honda': 1, 'ford': 12, 'chevy': 1}

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Get an average of the unique values of a column with group by condition and assign it	klllmmm	0	2,941	Feb-17-2024, 05:53 PM Last Post: klllmmm
	Converting column of values into muliple columns of counts	highland44	0	1,300	Feb-01-2024, 12:48 AM Last Post: highland44
	PowerBI: Using Python Regex to look for values in column	MarcusR44	1	2,811	Oct-14-2022, 01:03 PM Last Post: ibreeden
	How to combine multiple column values into 1?	cubangt	15	9,696	Aug-11-2022, 08:25 PM Last Post: cubangt
	Average values on duplicate records	chandramouliarun	1	2,045	Jul-27-2022, 03:13 AM Last Post: deanhystad
	Creating a numpy array from specific values of a spreadsheet column	JulianZ	0	2,332	Apr-19-2022, 07:36 AM Last Post: JulianZ
	How to split file by same values from column from imported CSV file?	Paqqno	5	7,671	Mar-24-2022, 05:25 PM Last Post: Paqqno
	Float Slider - Affecting Values in Column 'Pandas'	planckepoch86	0	2,399	Jan-22-2022, 02:18 PM Last Post: planckepoch86
	Pandas DataFrame combine rows by column value, where Date Rows are NULL	rhat398	0	3,304	May-04-2021, 10:51 PM Last Post: rhat398
	How to get value in Dataframe given row & column values?	moonlight	1	3,771	Apr-26-2021, 09:30 PM Last Post: Larz60+

Combine Duplicate CVS Column Values

User Panel Messages

Announcements