Unable to understand a statement in an existing code

ateestructural · (This post was last modified: Aug-01-2020, 08:54 PM by ateestructural.)

I have the following code:

import nltk

nltk.download('stopwords')

import nltk.corpus
import re
import string

# turn a doc into clean tokens
from load_file_with_function import load_doc


def clean_doc(doc):
    # split the tokens by white space
    tokens = doc.split()
    # prepare regex for char filtering
    re_punc = re.compile('[%s]' % re.escape((string.punctuation)))
    # remove punctuation from each word
    tokens = [re_punc.sub('', w) for w in tokens]
    # remove remaining tokens that are not alphabetic
    tokens = [word for word in tokens if word.isalpha()]
    # filter out stop-words
    stop_words = set(nltk.corpus.stopwords.words('english'))
    tokens = [word for word in tokens if word not in stop_words]
    # filter out short tokens
    tokens = [word for word in tokens if len(word) > 1]
    print(tokens)

It works, but it is someone else's code and I have to build on it.

I'm unable to understand how the statement below filters the non-alphabetic entries out of my list of words (tokens):

tokens = [word for word in tokens if word.isalpha()]

I know about the string method isalpha(), but I do not follow how the "new" tokens list gets rid of the non-alphabetic entries in a single statement like this. Can anyone please explain?

**deanhystad** · (This post was last modified: Aug-01-2020, 09:44 PM by deanhystad.)

This is a list comprehension. It is a compact way of writing this:

temp = []
for word in tokens:
    if word.isalpha():
        temp.append(word)
tokens = temp

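tokens = [...] says the resulting list is assigned to "tokens".
[word for word in tokens] says the list is going to be made up of words from the original "tokens".
if word.isalpha() says only include words for which isalpha() returns True.
The new list is built completely before the name "tokens" is rebound to it, so the original list is never modified while it is being read.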
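
To check the equivalence yourself, here is a small sketch that runs both forms on the same input. The sample words are made up for illustration and are not from your document; the empty string stands in for a token that was pure punctuation before re_punc.sub('', w) stripped it.

```python
# Hypothetical sample tokens (an assumption for illustration only).
# "" mimics a token that was all punctuation before it was stripped.
tokens = ["hello", "world42", "", "2020", "good", "a"]

# Explicit loop form
kept_loop = []
for word in tokens:
    if word.isalpha():          # True only if every character is a letter
        kept_loop.append(word)  # and the string is not empty

# List comprehension form: same filter, written as one expression
kept_comp = [word for word in tokens if word.isalpha()]

print(kept_loop)
print(kept_comp)
print(kept_loop == kept_comp)
```

Both lists should come out identical. Note that isalpha() is False for an empty string, which is why tokens that consisted only of punctuation also disappear at this step.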
