Confused by the different ways of extracting data in DataFrame

leea2024 · Aug-17-2024, 08:20 AM

Suppose we have the following DataFrame recording the results of 5 students.

import pandas as pd

dic0 = {'Java': [87, 65, 26, 89, 67],
        'C++': [63, 98, 66, 89, 80],
        'Python': [78, 25, 76, 43, 69]}
d = pd.DataFrame(dic0)
d.index = ['Tom', 'Bob', 'Tim', 'Wien', 'Lily']

d[1:] gives the rows from position 1 to the end, so the number inside the square brackets refers to rows.

Output:
      Java  C++  Python
Bob     65   98      25
Tim     26   66      76
Wien    89   89      43
Lily    67   80      69

But if I write d['Java'], the system treats the thing inside the square brackets as a column label. The output is:

Output:
Tom     87
Bob     65
Tim     26
Wien    89
Lily    67
Name: Java, dtype: int64

I am confused here. In both examples, I put just one 'item' in the square brackets, but the system treats the first as a row index and the second as a column index. What is the rule behind this? How can I know how the system will interpret my input?

I then tried to get Tom's Java result. I thought I should give the row index first and write d['Tom']['Java'], but it turns out that I should write d['Java']['Tom']. In other words, I should extract the 'Java' column, then look for 'Tom'.

Output:
87

My next job is to get Tim's results for all three subjects, and I tried to use d.loc[]. This time, I need to give the row index first and then the column index. Hence, I should write d.loc['Tim',:].

Output:
Java      26
C++       66
Python    76
Name: Tim, dtype: int64

But now I am confused. When we extract elements from a DataFrame, sometimes we give the row index first and sometimes it's the opposite. Is there a general rule behind this? Or do I just have to memorize the different requirements of each way of extracting elements?

**deanhystad** · (This post was last modified: Aug-17-2024, 01:34 PM by deanhystad.)

[1] is a single index. [1:] is a slice, a range of positions starting at 1 and going to the end. Slices are part of Python, not something specific to pandas. When you put a slice inside [] on a dataframe, pandas applies it to the rows.
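Here is a rough sketch of the two cases, rebuilding the dataframe from the question (the names rows and col are just for illustration). A slice inside [] picks rows by position, while a single label inside [] picks a column.

```python
import pandas as pd

# Same data as in the question.
d = pd.DataFrame(
    {'Java': [87, 65, 26, 89, 67],
     'C++': [63, 98, 66, 89, 80],
     'Python': [78, 25, 76, 43, 69]},
    index=['Tom', 'Bob', 'Tim', 'Wien', 'Lily'],
)

rows = d[1:]      # a slice inside [] selects rows by position
col = d['Java']   # a single label inside [] selects a column

print(rows.equals(d.iloc[1:]))  # True: same rows as the explicit positional form
print(col.name)                 # Java
```

If you want the intent to be obvious to a reader, d.iloc[1:] says "rows by position" explicitly.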

Indexing is easier to understand if you do things in stages.

When I call d['Tom'] I get an error:

Error:
    indexer = self.columns.get_loc(key)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'Tom'

This tells me that "Tom" is not a column index in d. I can verify.

print(d.columns)

Output:
Index(['Java', 'C++', 'Python'], dtype='object')

However, if I call d["Java"] I should get a column, since "Java" appears in the list of columns.

Output:
Tom     87
Bob     65
Tim     26
Wien    89
Lily    67
Name: Java, dtype: int64

d["Java"] returns a series, a single column of d. I can index the series just like I indexed the dataframe. I can print its index keys like this.

print(d["Java"].index)

Output:
Index(['Tom', 'Bob', 'Tim', 'Wien', 'Lily'], dtype='object')
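
A quick way to check where a key lives before indexing is a membership test with "in". This is only a sketch on the question's data, and the variable names are mine.

```python
import pandas as pd

# Same data as in the question.
d = pd.DataFrame(
    {'Java': [87, 65, 26, 89, 67],
     'C++': [63, 98, 66, 89, 80],
     'Python': [78, 25, 76, 43, 69]},
    index=['Tom', 'Bob', 'Tim', 'Wien', 'Lily'],
)

tom_is_column = "Tom" in d.columns         # False, so d["Tom"] raises KeyError
java_is_column = "Java" in d.columns       # True, so d["Java"] works
tom_in_series = "Tom" in d["Java"].index   # True, the series keeps the row labels

print(tom_is_column, java_is_column, tom_in_series)  # False True True
```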

"Tom" appears in the list of index keys, so I can call d["Java"]["Tom"]

If you really want to give the row first, as in d["Tom"]["Java"], you can, with a slight change: use loc or iloc to select a row from the dataframe.

print(d.loc["Tom"])
print(d.loc["Tom"]["Java"])

Output:
Java      87
C++       63
Python    78
Name: Tom, dtype: int64
87
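
loc also accepts the row label and the column label together, row first, so the lookup can be done in one step instead of two. A minimal sketch on the same data (variable names are only for illustration):

```python
import pandas as pd

# Same data as in the question.
d = pd.DataFrame(
    {'Java': [87, 65, 26, 89, 67],
     'C++': [63, 98, 66, 89, 80],
     'Python': [78, 25, 76, 43, 69]},
    index=['Tom', 'Bob', 'Tim', 'Wien', 'Lily'],
)

one_step = d.loc["Tom", "Java"]   # row label, then column label
two_step = d["Java"]["Tom"]       # column first, then row label in the series
tim_row = d.loc["Tim", :]         # all columns for one row

print(one_step, two_step)   # 87 87
print(tim_row.tolist())     # [26, 66, 76]
```

So the order is always "row, column" inside loc/iloc, and plain [] with a label is the column shortcut.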

So with a plain label, dataframes are indexed by columns, unless you use loc/iloc, which let you index by rows. This makes sense, as most operations in pandas are performed on columns of data rather than rows of data. I almost never use loc/iloc to get the data for a single row: first, because when I perform an operation in pandas I usually want to apply it to all rows, and second, because pandas is really fast at doing things with columns and relatively glacial when operating on individual rows.
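
To illustrate the column-wise style, here is a small sketch that computes each student's total in one call, with an explicit row loop shown only for comparison. The names totals and loop_totals are mine, and I have not timed either version here.

```python
import pandas as pd

# Same data as in the question.
d = pd.DataFrame(
    {'Java': [87, 65, 26, 89, 67],
     'C++': [63, 98, 66, 89, 80],
     'Python': [78, 25, 76, 43, 69]},
    index=['Tom', 'Bob', 'Tim', 'Wien', 'Lily'],
)

# Column-wise: one call adds the three columns for every student at once.
totals = d.sum(axis=1)

# Row-wise: visit each row in a Python loop (same result, more work per row).
loop_totals = {name: int(row.sum()) for name, row in d.iterrows()}

print(totals.tolist())   # [228, 188, 168, 221, 216]
print(loop_totals["Tom"])  # 228
```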
