Dec-30-2020, 03:53 AM
I loaded a CSV file called 'IMDb movies.csv' from Kaggle. Here is the link to the file: https://www.kaggle.com/stefanoleone992/i...ve-dataset. I am just trying to reinforce what I learned about DataFrames and some basic commands. My code is pasted below. The problem is that I am getting "TypeError: '>=' not supported between instances of 'str' and 'int'", and I do not know why. It comes from this line:
year_before_1970 = movies_df[movies_df['year'] >= 1970]
However, when I do something similar on another column, print(movies_df[movies_df['avg_vote'] >= 8.6].head(3)), it does not give me an error. Can someone please help me figure out what went wrong?
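A likely cause, assuming at least one value in the 'year' column is not a plain number: pandas then cannot parse that column as integers and keeps it as text, while 'avg_vote' is parsed as float64. Comparing text with the integer 1970 raises this TypeError, whereas the float comparison works. The sketch below reproduces the situation with a small hypothetical CSV; the titles, years, and the "unknown" entry are made up and not taken from the Kaggle file.

```python
import io

import pandas as pd

# Hypothetical CSV (made-up values): one 'year' cell is text, so pandas
# cannot parse that column as numbers and stores it as strings instead.
CSV_TEXT = """title,year,avg_vote
Film A,1911,5.9
Film B,1965,8.6
Film C,1994,8.8
Film D,unknown,7.1
"""

movies = pd.read_csv(io.StringIO(CSV_TEXT))
print(movies.dtypes)  # avg_vote is float64; year is not numeric

# Works: avg_vote is numeric, so it can be compared with a float.
top_rated = movies[movies["avg_vote"] >= 8.6]
print(top_rated)

# Fails: the year values are strings, which cannot be ordered against an int.
try:
    movies[movies["year"] >= 1970]
    comparison_failed = False
except TypeError as exc:
    comparison_failed = True
    print("TypeError:", exc)
```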
import pandas as pd
movies_df = pd.read_csv('IMDb movies.csv')
# Print the first 5 rows of the dataset
print('***********************************\nFirst 5 rows\n',movies_df.head(), '\n***********************************')
# Get the column headers of the dataset
col = movies_df.columns
print('Headers of Dataset\n', col, '\n***********************************')
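Printing the headers does not show how pandas typed each column. A hedged sketch of a diagnostic step: check `dtypes`, then use `pd.to_numeric(..., errors="coerce")` to locate the 'year' values that fail to parse as numbers. The small DataFrame here is a hypothetical stand-in for `movies_df`, with a made-up "unknown" entry.

```python
import pandas as pd

# Hypothetical stand-in for movies_df: 'year' is text because one entry
# is not a plain number.
movies_df = pd.DataFrame(
    {
        "imdb_title_id": ["tt01", "tt02", "tt03", "tt04"],
        "year": ["1911", "1965", "unknown", "1994"],
    }
)

# dtypes shows how each column was parsed (object/str means text, not numbers).
print(movies_df.dtypes)

# Values that cannot be parsed as numbers become NaN under errors="coerce",
# so the NaN positions point at the offending rows.
parsed_year = pd.to_numeric(movies_df["year"], errors="coerce")
bad_rows = movies_df[parsed_year.isna()]
print(bad_rows)
```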
# Count the movies produced in each year (to find which year produced the most)
most_movies_yearly = movies_df.groupby('year').imdb_title_id.count().reset_index()
print(most_movies_yearly)
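The groupby above gives a count per year but does not yet pick out the top year. One possible way, sketched on made-up counts shaped like `most_movies_yearly` after `reset_index()` (columns 'year' and 'imdb_title_id'), is `idxmax()` on the count column.

```python
import pandas as pd

# Hypothetical per-year counts (toy numbers), same columns as
# most_movies_yearly after reset_index().
most_movies_yearly = pd.DataFrame(
    {"year": [2016, 2017, 2018], "imdb_title_id": [10, 25, 17]}
)

# idxmax() returns the row label of the largest count; .loc fetches that row.
top_row = most_movies_yearly.loc[most_movies_yearly["imdb_title_id"].idxmax()]
print(top_row["year"], top_row["imdb_title_id"])
```

`sort_values("imdb_title_id", ascending=False).head()` would work too if several top years are of interest.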
# Movies produced before the year 1970
year_before_1970 = movies_df[movies_df['year'] >= 1970]
print(year_before_1970)
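A minimal sketch of one possible fix, assuming the non-numeric 'year' entries can simply be treated as missing: convert the column with `pd.to_numeric(..., errors="coerce")` before comparing. Note also that the comment asks for movies *before* 1970, while `>= 1970` keeps 1970 and later, so the sketch uses a strict `<`. The DataFrame is a hypothetical stand-in with made-up values.

```python
import pandas as pd

# Hypothetical stand-in for movies_df with a text 'year' column.
movies_df = pd.DataFrame(
    {
        "title": ["Film A", "Film B", "Film C", "Film D"],
        "year": ["1911", "1965", "unknown", "1994"],
    }
)

# Convert 'year' to numbers; unparseable entries become NaN instead of raising.
movies_df["year"] = pd.to_numeric(movies_df["year"], errors="coerce")

# "Before 1970" needs a strict less-than; NaN compares as False, so rows
# with an unknown year are excluded.
movies_before_1970 = movies_df[movies_df["year"] < 1970]
print(movies_before_1970)
```

Whether to drop or repair the unparseable rows depends on the data; inspecting them first is safer than discarding them silently.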
