The goal is to remove erroneous full stops (".") from a string of OCR output, i.e. to go from this:
"I a.m so pl.ea.sed to me.et y.ou. I ho.pe .tha.t th.is is t.he st.a.rt of a lo.n.g fri.en.dsh.ip."
to this:
"I am so pleased to meet you. I hope that this is the start of a long friendship."
The current code is:
import re

# create function to find, evaluate, and remove random "."
def dot_hunt(text):
    # find full stop "." marks and note their relative location in the string
    full_stop_locations = []
    for i in range(len(text)):
        lttr = text[i]
        if lttr == ".":
            full_stop_locations.append(i)
        else:
            continue
    print("Full stop locations are: " + str(full_stop_locations))
    text2 = text.replace('.', '')
    print(text2)
    for i in range(len(text2)):
        substring = text2[i-1:i+2]
        if re.match(r"^[a-z]\s[A-Z]+$", substring):
            re.sub(r"\\s", r".\\s", text2[i])
            print(substring)
        else:
            continue
    print(text2)

# create variable to contain the string of output text from the OCR
text = input("Please insert the output text from the OCR: \n>")
# create a variable which makes a function call and receives the returned text
new_text = dot_hunt(text)

Thank you for the help in advance.
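For comparison, here is a minimal sketch of what the code above appears to be aiming for. It is an assumption about the intended rule, not a definitive fix: strip every full stop, then put one back wherever a lowercase letter is followed by whitespace and a capital letter, and at the very end if the input ended with one. The name `remove_stray_full_stops` is hypothetical. Note that `re.sub` returns a new string rather than changing its argument, so the result has to be kept and returned.

```python
import re


def remove_stray_full_stops(text: str) -> str:
    """Drop every '.', then restore the ones that look like sentence ends."""
    # Remember whether the input ended with a full stop before stripping them all.
    ended_with_stop = text.rstrip().endswith(".")
    stripped = text.replace(".", "")
    # Assumed heuristic: lowercase letter, whitespace, then a capital letter
    # marks a sentence boundary, so a full stop goes back after the lowercase letter.
    restored = re.sub(r"([a-z])(\s+)(?=[A-Z])", r"\1.\2", stripped)
    if ended_with_stop:
        restored = restored.rstrip() + "."
    return restored


sample = ("I a.m so pl.ea.sed to me.et y.ou. I ho.pe .tha.t th.is is t.he "
          "st.a.rt of a lo.n.g fri.en.dsh.ip.")
print(remove_stray_full_stops(sample))
```

This heuristic would also insert a full stop before a capitalised word in the middle of a sentence (a name, for instance), so it probably needs refining for real OCR output.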
