Sorting a large CSV file

DavidTheGrockle · Oct-31-2019, 12:15 PM

I have a large CSV file of about 707,000 rows. Column 9 of 10, called 'individual-local-identifier', holds an id which looks like EG47213, EB53955 and so on. There are probably about 700 different such ids. I need to separate out all the rows with a given id. By looking at the first few rows "manually", so to speak, I found that EG47213 started at row 3844 and ran for another 4127 rows.

I then tried

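import pandas as pd
print("Hello, World!")
# Read the data into a variable pacto
url = "F:/Carrier Bag F/NAV Padget/Padget.csv"
pacto = pd.read_csv(url)
# Keep only the rows whose id column equals EG47213, wherever they sit in the file
frutj = pacto[pacto['individual-local-identifier'] == "EG47213"]
# shape is (number of rows, number of columns)
kb = frutj.shape
print(kb)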

To my surprise this reported 8627 such rows, not 4127. When I started searching by hand I found the missing rows in two separate locations. (This took me a long time and made me mad.)

There must be a better way of going about this. I had hoped to find in the pandas documentation something like a FOR instruction and then some way of writing IF ... THEN escape or something similar.

Maybe I missed something.

***ichabod801*** · Oct-31-2019, 12:32 PM

(Oct-31-2019, 12:15 PM)DavidTheGrockle Wrote: I need to separate out all the rows with a given id.

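What do you need to do? Do you need to separate out the rows with a given ID, or do you need to sort by the ID? You appear to already have the rows for a particular ID: the boolean filter picks up every matching row no matter where it sits in the file, which is why it found 8627 rows rather than the 4127 in the first contiguous run. To sort by that ID you would use sort_values:

pacto.sort_values(by='individual-local-identifier')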
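
Sorting by a column in current pandas versions is done with DataFrame.sort_values. Here is a minimal sketch on a small made-up frame; the sample values and the 'reading' column are invented for illustration and are not taken from Padget.csv. Choosing kind="mergesort" is an assumption about what is wanted: it is a stable sort, so rows keep their original file order within each ID.

```python
import pandas as pd

# Made-up sample: rows for the same ID are scattered, as in the original file
pacto = pd.DataFrame({
    "individual-local-identifier": ["EG47213", "EB53955", "EG47213", "EB53955", "EG47213"],
    "reading": [1, 2, 3, 4, 5],  # hypothetical column, for illustration only
})

# mergesort is stable, so rows keep their original order within each ID
sorted_pacto = pacto.sort_values(by="individual-local-identifier", kind="mergesort")
print(sorted_pacto)
```

After sorting, all rows for one ID are contiguous, so there is no need to hunt for separate runs by hand.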

You may also want to look at the groupby method.
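
As a rough sketch of what groupby could do here, again on a made-up frame (the 'reading' column and the sample values are hypothetical): it counts the rows for every ID in one pass and splits the data into one DataFrame per ID, without any explicit FOR/IF logic and without assuming the rows for an ID are contiguous.

```python
import pandas as pd

# Made-up sample standing in for the real file
pacto = pd.DataFrame({
    "individual-local-identifier": ["EG47213", "EB53955", "EG47213", "EB53955", "EG47213"],
    "reading": [1, 2, 3, 4, 5],  # hypothetical column, for illustration only
})

# Number of rows per ID, wherever those rows sit in the file
counts = pacto.groupby("individual-local-identifier").size()
print(counts)

# One DataFrame per ID, collected in a dict keyed by the ID
per_id = {ident: rows for ident, rows in pacto.groupby("individual-local-identifier")}
print(per_id["EG47213"])
```

If separate files are needed, each frame in per_id could then be written out with its to_csv method.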
