Python Forum
replace or remove text from many text files
#1
I'm looking to remove quotes (") (artifacts from a 90s ASCII file transfer program) from many thousands of text files (source code), and this seemed like something right up Python's alley. I seem to have the walk through the directory structure down (I know because I've botched a copy of all the files a couple of times). Generally, when something gives me this much trouble, it means I'm approaching it wrong. I've been trying the str.replace('"', '') method, then chasing the errors that arise from it. I have some experience with regular expressions, though it's old and in need of dusting off. Does anyone have any suggestions?
Reply
#2
If you give an example of what you mean, I'm sure people here can help.

You want to remove " from the file name? Or you want to remove all " from the file text content?
Reply
#3
Pedroski55, thanks, and sorry about my vagueness. Yes, every occurrence of (") in the file contents: the old ASCII file transfer program wrapped each transferred text line in quotes. Removing either every occurrence, or just the ones at the beginning and end of each line, would work.

here is a small example:
""
";       .OFINI                          ;field definitions"
";       .OFDEF  Fd.Nxt  ,  2.           ;00 offset from PC to next field"
";       .OFDEF  Fd.Max  ,  1.           ;01 maximum input characters"
";       .OFDEF  Fd.Min  ,  1.           ;02 minimum input characters"
";       .OFDEF  Fd.Col  ,  1.           ;10 column position"
";       .OFDEF  Fd.Row  ,  1.           ;11 row position"
";       .OFDEF  Fd.FFg  ,  2.           ;12 field flags"
";               Ff%XX1  = 00.           ;00000 unknown"
";               Ff%NoC  = 01.           ;00002 Non-changable"
";               Ff%Dis  = 02.           ;00004 Display-Only"
";               Ff%ACr  = 03.           ;00008 Auto-Carriage Return"
";               Ff%HE   = 04.           ;00016 Hidden-Entry"
";               Ff%XX2  = 05.           ;00032 unknown"
";               Ff%EEI  = 06.           ;00064 Pre-Edit Interrupt"
";               Ff%OEI  = 07.           ;00128 Post-Edit Interrupt"
";               Ff%CoE  = 08.           ;00256 Clear on Entry"
";               Ff%RPD  = 09.           ;00512 Right Prompt Display"
";               Ff%RTB  = 10.           ;01024 Retian Trailing Blanks"
";               Ff%CVI  = 11.           ;02048 Changed Value Post-Edit Interrupt"
""
";               Ff%NDB  = 12.           ;04096 No Display if Blank"
";               Ff%NPF  = 13.           ;08192 No Prompt Flash"
";               Ff%NEV  = 14.           ;16384 No Expression Value"
";               Ff%XX3  = 15.           ;32768 Unknown"
";       .OFDEF  Fd.Typ  ,  1.           ;14 field type"
";       .OFDEF  Fd.Mod  ,  1.           ;15 field modifiers"
";       .OFDEF  Fd.Spc  ,  1.           ;16 special field modifiers"
";       .OFDEF  Fd.XX2  ,  1.           ;17"
";       .OFDEF  Fd.SDC  ,  1.           ;20 starting display code"
";       .OFDEF  Fd.EDC  ,  1.           ;21 starting display code"
";       .OFDEF  Fd.RdS  ,  1.           ;22 read security"
";       .OFDEF  Fd.WtS  ,  1.           ;23 write security"
";       .OFDEF  Fd.F24  ,  1.           ;24 ???? (cleared by ESP)"
";       .OFDEF  Fd.F24  ,  1.           ;45 ???? (cleared by ESP)"
";       .OFDEF  Fd.BgC  ,  1.           ;26 background color"
";       .OFDEF  Fd.FgC  ,  1.           ;27 foreground color"
";       .OFDEF  Fd.Cls  ,  1.           ;30 field class"
";               Fc%Tit  = 00.           ;00001 Title"
";               Fc%DH   = 01.           ;00002 Detail Heading"
";               Fc%DF   = 02.           ;00004 Detail Field"
";               Fc%ST1  = 03.           ;00008 1st Sub-Total Level"
";               Fc%ST2  = 04.           ;00016 2nd Sub-Total Level"
";               Fc%ST3  = 05.           ;00032 3rd Sub-Total Level"
";               Fc%GT   = 06.           ;00064 Grand-Total"
";               Fc%ROF  = 07.           ;00128 Report Only Field"
";       .OFDEF  Fd.F31  ,  1.           ;31 ???? (cleared by ESP)"
";       .OFDEF  Fd.F32  ,  1.           ;32 ????"
";       .OFDEF  Fd.FN   ,  1.           ;36 field number (loaded by ESP)"
";       .OFDEF  Fd.F42  ,  1.           ;42 ???? (cleared by ESP)"
";       .OFDEF  Fd.F43  ,  1.           ;43 ???? (cleared by ESP)"
Reply
#4
Python nowadays (3.9 and later) has string methods such as removeprefix and removesuffix; you should be able to apply them here.
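
As a minimal sketch of that idea, assuming each line is wrapped in exactly one pair of double quotes as in the sample above (the function name strip_wrapping_quotes is made up for illustration):

```python
def strip_wrapping_quotes(line: str) -> str:
    """Remove one leading and one trailing double quote, if present."""
    # str.removeprefix / str.removesuffix exist in Python 3.9 and later.
    # Each call removes at most one occurrence and leaves the string
    # unchanged when the prefix/suffix is absent.
    return line.removeprefix('"').removesuffix('"')


sample = '";       .OFINI                          ;field definitions"'
print(strip_wrapping_quotes(sample))
```

Quotes in the middle of a line are left alone. A line read from a file still carries its trailing newline, so that would need to be stripped first, otherwise removesuffix will not see the closing quote.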
Reply
#5
If you're on Unix/Linux, you can use the following tools, here targeting .py files and also searching subdirectories:
find . -type f -name "*.py" -exec sed -i 's/text2replace/newText/g' {} +
Be careful with special characters in sed such as "/", "[", "]", etc.: you must place a backslash in front of them, as in "\[".

In a Python script, you can run it with os.system or, better, with subprocess.
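
A rough sketch of the subprocess route, assuming GNU find and GNU sed are on the PATH (BSD/macOS sed expects an argument after -i, so the command would need adjusting there). The function names and the default "*.py" pattern are only illustrative:

```python
import subprocess


def build_sed_command(root: str, pattern: str = "*.py") -> list[str]:
    """Build a find/sed argument list that deletes every double quote in place."""
    # Passing a list (no shell=True) means the quote inside the sed
    # expression needs no extra shell escaping.
    return [
        "find", root, "-type", "f", "-name", pattern,
        "-exec", "sed", "-i", 's/"//g', "{}", "+",
    ]


def strip_quotes_with_sed(root: str, pattern: str = "*.py") -> None:
    # check=True raises CalledProcessError if find or sed exits non-zero.
    subprocess.run(build_sed_command(root, pattern), check=True)


# Example call (modifies files in place, so try it on a copy first):
# strip_quotes_with_sed("/path/to/copy", "*.txt")
```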
Reply
#6
with open(filename, "r") as file:
    text = file.read().replace('"', '')
with open(filename, "w") as file:
    file.write(text)
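
To apply the same read/replace/write step to a whole directory tree, something along these lines might work. It is only a sketch: the function name, the "*.txt" pattern and the utf-8 default are assumptions, not details from the thread, and newline="" is used so existing line endings are written back unchanged.

```python
from pathlib import Path


def remove_quotes_in_tree(root, pattern="*.txt", encoding="utf-8"):
    """Remove every double quote from files under root; return how many files changed."""
    changed = 0
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        # newline="" turns off newline translation, so CRLF files stay CRLF.
        with open(path, "r", encoding=encoding, newline="") as f:
            text = f.read()
        cleaned = text.replace('"', "")
        if cleaned != text:
            with open(path, "w", encoding=encoding, newline="") as f:
                f.write(cleaned)
            changed += 1
    return changed
```

Running it on a copy of the tree first is the safer choice, since files are overwritten in place.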
Reply
#7
woefdram thanks,

That did it. All files in 12 seconds. Nice.
Reply
#8
Deanhystad,

My recollection of yesterday, when I chased this method, is that the only difference between your example code and what I did was:
tmp = file.read()
text = tmp.replace('"', '')
The rest was basically the same.
I chased its error: https://stackoverflow.com/questions/7721...-codec-can

then its redirect: https://stackoverflow.com/questions/4233...ition-0-in

then the errors stemming from that suggestion:
with open(path, 'rb') as f:
    contents = f.read()

After spending the whole day chasing my tail and turning those files into alphabet soup, the programming ceased being fun, which resulted in this thread.

After today's success I may try replacing line by line instead of the whole file.
At least the errors are easier for me to debug.
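
A line-by-line version could look roughly like this. It is a sketch under some assumptions: the function names are made up, the ".tmp" suffix is a hypothetical scratch-file name, and latin-1 is chosen only because it accepts any byte value, not because the files are known to use it.

```python
import os


def unquote_line(line: str) -> str:
    """Strip one pair of wrapping double quotes, keeping the line ending."""
    body = line.rstrip("\r\n")
    ending = line[len(body):]
    # Only strip when the line is quoted at both ends.
    if len(body) >= 2 and body.startswith('"') and body.endswith('"'):
        body = body[1:-1]
    return body + ending


def unquote_file(path, encoding="latin-1"):
    """Rewrite path line by line through a scratch file, then swap it in."""
    tmp_path = str(path) + ".tmp"  # hypothetical scratch-file name
    # newline="" keeps the original line endings untouched.
    with open(path, "r", encoding=encoding, newline="") as src, \
         open(tmp_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            dst.write(unquote_line(line))
    os.replace(tmp_path, path)
```

Because the original is only replaced after the whole scratch file has been written, an error partway through leaves the source file as it was.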
Reply
#9
You said they were text files. Why are you using rb to read?
Reply
#10
Deanhystad,

Because text = tmp.replace('"', '') raised a "File <frozen codecs>, line 322, in decode UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte [duplicate]" error, and searching for that error led to:

https://stackoverflow.com/questions/4233...ition-0-in

where the solution with the highest rating was
with open(path, 'rb') as f:
  contents = f.read()
then I had to chase errors produced by that.

I tried loading a string variable from a string literal of the file contents, and that worked fine, but for me at least, loading the variable from file.read() barfed the above <frozen codecs> error.
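
For what it's worth, the 'rb' route can work if everything stays as bytes: read() then returns bytes with no decoding step, so no UnicodeDecodeError is possible, but replace() must be given bytes literals as well. A minimal sketch, assuming the files use a single-byte, ASCII-compatible encoding (if they turned out to be UTF-16, which a leading 0xff 0xfe would suggest, this would not be the right approach); the function name is made up:

```python
def remove_quote_bytes(path):
    """Delete every 0x22 (double quote) byte from the file; return how many were removed."""
    with open(path, "rb") as f:
        contents = f.read()  # bytes object, nothing is decoded
    # Both arguments must be bytes; passing str here raises TypeError.
    cleaned = contents.replace(b'"', b"")
    with open(path, "wb") as f:
        f.write(cleaned)
    return len(contents) - len(cleaned)
```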

I'm still learning Python and I learn best by doing. It's not that I need these files unquoted, or that I can't unquote them some other way, as paul18fr pointed out; after all, those files have been this way for 30 years. This just seemed like a good opportunity to learn more Python.
Reply

