New to Pandas. I need help fixing a TypeError

kramon19 · Dec-30-2020, 03:53 AM

I loaded a CSV file called 'IMDb movies.csv' from Kaggle. Here is the link to the file: https://www.kaggle.com/stefanoleone992/i...ve-dataset. I am just trying to reinforce what I have learned about DataFrames and some basic commands. My code is pasted below. The problem is that I am getting "TypeError: '>=' not supported between instances of 'str' and 'int'", and I do not know why. It is coming from this line:

year_before_1970 = movies_df[movies_df['year'] >= 1970]

However, I do something similar in the line below:

print(movies_df[movies_df['avg_vote'] >= 8.6].head(3))

and it doesn't give me an error. Can someone please help me figure out what went wrong?

import pandas as pd

movies_df = pd.read_csv('IMDb movies.csv')

#prints the first 5 rows of Dataset
print('***********************************\nFirst 5 rows\n',movies_df.head(), '\n***********************************')

#selects the columns headers of the Dataset
col = movies_df.columns
print('Headers of Dataset\n', col, '\n***********************************')

#which year produced the most movies
most_movies_yearly = movies_df.groupby('year').imdb_title_id.count().reset_index()
print(most_movies_yearly)

# movies were produced before the year 1970
year_before_1970 = movies_df[movies_df['year'] >= 1970]
print(year_before_1970)

snippsat · Dec-30-2020, 02:20 PM

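movies_df['year'] has been read in as strings (object dtype), most likely because at least one value in that column is not a plain number.
Use movies_df.dtypes to check the type of each column in the DataFrame.
Then clean up the column so that year becomes int64.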
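
To see which values are causing the trouble, one option is to try the conversion with pd.to_numeric(..., errors='coerce') and look at the rows that come back as NaN. This is only a sketch: the sample data below is made up to stand in for the real CSV, and it assumes the real file has at least one non-numeric entry in 'year'.

```python
import pandas as pd
from io import StringIO

# Made-up sample standing in for 'IMDb movies.csv' (assumption: the real
# file contains at least one non-numeric entry in the 'year' column).
data = StringIO('''\
title,year,avg_vote
Seven,1995,8.6
The Godfather,1972 end,9.2
Jaws,1975,8.0''')

movies_df = pd.read_csv(data)

# 'year' is read as text (object, or a string dtype on newer pandas),
# while 'avg_vote' is float64, which is why comparing it with 8.6 works.
print(movies_df.dtypes)

# With errors='coerce', values that cannot be parsed as numbers become NaN
as_number = pd.to_numeric(movies_df['year'], errors='coerce')

# Rows where the conversion failed are the ones that need cleaning
bad_rows = movies_df[as_number.isna()]
print(bad_rows)
```

The rows printed last are the ones to inspect before deciding how to clean the column.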

Example:

import pandas as pd
from io import StringIO

data = StringIO('''\
Movie,Year
Seven,1995
The Godfather,1972 end
Jaws,1975
Lawrence of Arabia,1962''')

df = pd.read_csv(data, sep=',')
print(df)
print(df.dtypes)

Output:
                Movie      Year
0               Seven      1995
1       The Godfather  1972 end
2                Jaws      1975
3  Lawrence of Arabia      1962
Movie    object
Year     object
dtype: object

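Clean up the column and fix its type.

# Keep only the digits; the raw string avoids an invalid-escape warning for \d
df['Year'] = df['Year'].str.extract(r'(\d+)', expand=False)
# Convert the cleaned strings to integers
df['Year'] = pd.to_numeric(df['Year'])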
print(df)
print(df.dtypes)

Output:
                Movie  Year
0               Seven  1995
1       The Godfather  1972
2                Jaws  1975
3  Lawrence of Arabia  1962
Movie    object
Year      int64
dtype: object

Now you can find the movies released before 1970.

print(df[df['Year'] < 1970])

Output:
                Movie  Year
3  Lawrence of Arabia  1962
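
If some values contain no digits at all, the conversion would leave gaps rather than integers. A possible variation, offered only as a sketch, wraps the clean-up in a small helper that marks such rows as missing. The name clean_year, the four-digit pattern, the 'Unknown' row and the choice of the nullable 'Int64' dtype are all assumptions made for illustration, not part of the original example.

```python
import pandas as pd


def clean_year(series):
    """Hypothetical helper: take the first 4-digit run from each value."""
    digits = series.astype(str).str.extract(r'(\d{4})', expand=False)
    # Values without a 4-digit run become missing instead of raising
    return pd.to_numeric(digits, errors='coerce').astype('Int64')


# Made-up data, including one row with no usable year
df = pd.DataFrame({
    'Movie': ['Seven', 'The Godfather', 'Jaws', 'Lawrence of Arabia', 'Unknown'],
    'Year': ['1995', '1972 end', '1975', '1962', 'n/a'],
})

df['Year'] = clean_year(df['Year'])

# Treat rows with a missing year as "not before 1970"
mask = (df['Year'] < 1970).fillna(False)
before_1970 = df[mask]
print(before_1970)
```

Whether to drop, keep or manually fix the rows with a missing year depends on what the analysis needs.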
