Hi,
Using the '|' character within a Regex is giving me an undesirable result that I have been unable to avoid. For example, consider a 2-page file with the following text in each page:
Page 1:
111A111 #red.
Page 2:
AAA1AAA #green.
for i in range(0, 2):
    text = doc.getPage(i).extract_text()
    color_re = re.compile(r'#\w+\.')
    color = color_re.findall(text)
    print(color)

Output:
['red.']
['green.']

    pattern_re = re.compile(r'(\w+\d+\w+)|(\d+\w+\d+)')
    pattern = pattern_re.findall(text)
    print(pattern)

Output:
('', 'AAA1AAA')
('111A111', '')

If I do:

    color = [item.strip('.') for item in color]

I get rid of '.' so, all is good. But if I do:

    pattern = [item.strip(' , ') for item in pattern]

I get the error:

Output:
AttributeError: 'tuple' object has no attribute 'strip'

Is there a way to avoid this error? I need to get rid of the spaces and commas in 'pattern'.

Thanks and apologies in advance if the question is not properly formulated. I'm a beginner.
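
For reference, here is a minimal sketch of what seems to be going on, assuming the page text is just the sample strings above (no PDF involved). When a regex contains more than one capturing group, re.findall returns one tuple per match with one slot per group, and the group that did not take part in the match is an empty string. The helper names clean_matches and clean_matches_noncapturing are made up for this illustration.

```python
import re

# Same pattern as in the question: two capturing groups joined by '|'.
pattern_re = re.compile(r'(\w+\d+\w+)|(\d+\w+\d+)')

# Variant with non-capturing groups: findall then returns plain strings.
pattern_nc_re = re.compile(r'(?:\w+\d+\w+)|(?:\d+\w+\d+)')


def clean_matches(text):
    """Flatten the tuples from findall, drop empty slots, strip spaces and commas."""
    matches = pattern_re.findall(text)  # e.g. [('111A111', '')]
    return [group.strip(' ,') for match in matches for group in match if group]


def clean_matches_noncapturing(text):
    """Same result, but no tuples are produced in the first place."""
    return [item.strip(' ,') for item in pattern_nc_re.findall(text)]


# Sample strings standing in for the text extracted from each page (assumption).
for text in ["111A111 #red.", "AAA1AAA #green."]:
    print(clean_matches(text), clean_matches_noncapturing(text))
```

Either approach gives a list of strings, so strip works on each item. The non-capturing version is probably the simpler choice if the separate groups are not needed afterwards.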
