Hi,
I am reading a CSV file and applying a function to remove unnecessary data.
If I apply it to the first 174 rows, "dict = (dc_data['Description'].head(174).apply(process_text))", I get the error below.
If I specify 100 rows, it works.
The requirement is to apply it to all rows.
Any help is appreciated.
Error:
Traceback (most recent call last):
  File "C:\Python\test\DC\dc_mar2020.py", line 26, in <module>
    dict = (ec_data['Description'].head(174).apply(process_text))
  File "C:\Python\lib\site-packages\pandas\core\series.py", line 3848, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas\_libs\lib.pyx", line 2329, in pandas._libs.lib.map_infer
  File "C:\Python\test\DC\dc_mar2020.py", line 16, in process_text
    nopunc = [char for char in text if char not in string.punctuation]
TypeError: 'float' object is not iterable

Code:
import pandas as pd
from textblob import TextBlob
import string
import nltk
from nltk.corpus import stopwords
dc_data = pd.read_csv('dc.csv', encoding="ISO-8859-1", index_col=False)
print(dc_data.head())
desc = dc_data['Description']
print(desc.shape)
def process_text(text):
    # 1: remove punctuation characters
    nopunc = [char for char in text if char not in string.punctuation]
    nopunc = ''.join(nopunc)
    # 2: remove English stopwords
    clean_words = [word for word in nopunc.split() if word.lower() not in stopwords.words('english')]
    # 3: return the list of clean words
    return clean_words
# Show the tokenization (a list of tokens)
dict = (dc_data['Description'].head(174).apply(process_text))
print("Dict: ", dict)
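
One possible cause, offered as a guess and not a confirmed diagnosis: the TypeError says process_text received a float, and pandas represents an empty CSV cell as NaN, which is a float. If a row between 101 and 174 has a blank Description, that would explain why 100 rows work and 174 do not. Below is a minimal sketch of a guard against such values. The name process_text_safe, the small STOP_WORDS set, and the sample rows are all hypothetical; they are there only so the sketch runs without the CSV file or the NLTK corpus.

```python
import string

# Assumption: a tiny stand-in stopword set so the sketch runs without the
# NLTK corpus. In the real script this could be set(stopwords.words('english')).
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "in"}

def process_text_safe(text):
    """Hypothetical variant of process_text that tolerates missing values."""
    # A blank CSV cell is read by pandas as NaN, which is a float, not a str.
    if not isinstance(text, str):
        return []
    # 1: drop punctuation characters
    nopunc = ''.join(char for char in text if char not in string.punctuation)
    # 2: drop stopwords (case-insensitive)
    return [word for word in nopunc.split() if word.lower() not in STOP_WORDS]

# Made-up sample values; float("nan") stands in for an empty Description cell.
rows = ["The server is down, again!", float("nan"), "Disk of node-3 is full."]
tokens = [process_text_safe(value) for value in rows]
print(tokens)
```

With pandas, the same function could be passed to dc_data['Description'].apply(...), and dc_data['Description'].isna().sum() would show how many cells are actually missing. Another option is to call dc_data['Description'].fillna('') before the apply so every value is a string. Building the stopword set once, instead of calling stopwords.words('english') for every word, should also make the full-column run noticeably faster.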
