Python Forum
My regex function is not good
#1
I'm a novice at programming. One small-scale project I'm working on is an apt history log reader that uses regex to make the output easier to read. At some point I'll incorporate the output into a Tkinter UI.

The regex itself works, but I think the code could be better. The current code:
import re

LOG_PATH = "/var/log/apt/history.log"  # Hardcoded, as the filename and location never change.

colon_regex = r': '  # colon followed by a single space
comma_regex = r', '  # comma followed by a single space

def text_replace(file_path=LOG_PATH):
    with open(file_path, "r") as log:
        text = log.read()
    ntext = re.sub(comma_regex, "\n\t", text)    # Replace comma + space with newline + tab.
    ptext = re.sub(colon_regex, ":\n\t", ntext)  # Replace colon + space with colon + newline + tab.
    print(ptext)

text_replace()  # This will be a function call in a separate file later on.
I notice that I have two separate variables, each for a different pattern:

ntext
for replacing "comma + single blank space" with "newline + tab".

ptext
for replacing "colon + single blank space" with "colon + newline + tab".

Major rule in programming: don't repeat yourself, right? But I'm searching for two different patterns and formatting one pattern in one way, and formatting another pattern in a different way.

I'm not at all familiar with Python's regex functionality and everything I've searched for simply confuses me.
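A minimal sketch for trying the two substitutions on a string instead of the real log file. `format_entry` is a hypothetical helper name, and the sample line is made up, only loosely modeled on an apt history "Install:" line. Returning the text instead of printing it should also make it easier to hand the result to a Tkinter widget later.

```python
import re

def format_entry(text: str) -> str:
    """Apply the same two substitutions as text_replace(), but return the result."""
    text = re.sub(r", ", "\n\t", text)   # comma + space -> newline + tab
    text = re.sub(r": ", ":\n\t", text)  # colon + space -> colon + newline + tab
    return text

# Made-up sample, loosely modeled on an apt history "Install:" line.
sample = "Install: foo:amd64 (1.0), bar:amd64 (2.0)"
print(format_entry(sample))
```

Note that `foo:amd64` is left alone, because the pattern requires a space after the colon.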
#2
Something like this perhaps? (I just borrowed and modified a line from a csv in an old question.)

import re

line = '"id";"An,rede";"Vor,name";"Nach,name";"E,mail";"Stra,sse";"Nu,mmer";"Ort";"PLZ";"Bundes,land";"Kommen,tare";"Unter,stuetzer";"reg_date"\n"'
newline = re.sub('[;,]', 'XY', line)
newline looks like:

Output:
'"id"XY"AnXYrede"XY"VorXYname"XY"NachXYname"XY"EXYmail"XY"StraXYsse"XY"NuXYmmer"XY"Ort"XY"PLZ"XY"BundesXYland"XY"KommenXYtare"XY"UnterXYstuetzer"XY"reg_date"\n"'
In a regex, square brackets define a character class: [;,] matches any single character listed inside them, so here either a semicolon or a comma.
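A small illustration of a character class on its own (the sample string is arbitrary):

```python
import re

text = "a;b,c d"

# [;,] matches exactly one character: either ';' or ','.
print(re.findall(r"[;,]", text))    # [';', ',']

# Every match gets the same replacement, whichever character matched.
print(re.sub(r"[;,]", "XY", text))  # aXYbXYc d
```

Because every match receives the same replacement string, a character class by itself can't turn ", " and ": " into two different outputs.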

Look here for info.
#3
Regex is great, but it has some sharp edges, and I've had to debug production problems caused by it. So I prefer plain string replacement when possible. My first thought would be:

document = "some, things to look; for with semi;colons; and com,mas, inside."
replacements = {
    ", ": "\n\t",
    "; ": ";\n\t",
}

for old, new in replacements.items():
    document = document.replace(old, new)

print(document)
Output:
some
	things to look;
	for with semi;colons;
	and com,mas
	inside.
I think this is better as long as the number of replacements is small (and the document is too). If straight substring matching isn't enough and you need the power of re, the same loop works almost unchanged. Just make sure your patterns and documents can't cause pathological behavior (for example, catastrophic backtracking).

import re
document = "some, things to look; for with semi;colons; and com,mas, inside."
replacements = {
    r", ": "\n\t",
    r"; ": ";\n\t",
}

for pattern, replacement in replacements.items():
    document = re.sub(pattern, replacement, document)

print(document)
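Both loops apply the replacements one full pass at a time, in dictionary order, so a later pass can see text that an earlier pass produced. The pair below is hypothetical and chosen only to show the effect; with the ", " and "; " pair above, the order doesn't matter, because neither replacement produces text that the other pattern matches.

```python
document = "a, b"
# Hypothetical pair: the first replacement creates "; ", which the second then matches.
replacements = {
    ", ": "; ",
    "; ": ";\n\t",
}

for old, new in replacements.items():  # dicts keep insertion order (Python 3.7+)
    document = document.replace(old, new)

print(repr(document))  # 'a;\n\tb'
```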
(There are ways to do different substitutions with a single call to re.sub(), but the added complexity usually isn't worth it.)
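For reference, one way to do it in a single call is to pass re.sub() a function instead of a replacement string; re calls it once per match and uses its return value. This sketch uses the colon pattern from the original code, and `REPLACEMENTS` and `replace_all` are hypothetical names.

```python
import re

# Hypothetical mapping from each matched separator to its replacement.
REPLACEMENTS = {", ": "\n\t", ": ": ":\n\t"}

# One alternation built from the keys; re.escape guards against
# keys that contain regex metacharacters.
PATTERN = re.compile("|".join(re.escape(key) for key in REPLACEMENTS))

def replace_all(text: str) -> str:
    # m.group(0) is the exact text that matched, so it doubles as the dict key.
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], text)

print(replace_all("Install: foo (1.0), bar (2.0)"))
```

The string is scanned once, so one replacement's output is never re-matched by another pattern, but the two-line version is still easier to read.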
#4
You can also chain str.replace() calls (it's implemented in C and should be highly optimized, especially compared to re):

>>> text = "a b cdeab"
>>> text.replace("a ", "XX").replace("b ", "YY")
'XXYYcdeab'
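To check the speed claim on your own machine, a rough timeit comparison could look like the sketch below. The input is synthetic and the function names are made up; timings vary by machine, so no numbers are shown.

```python
import re
import timeit

text = "Install: foo (1.0), bar (2.0)\n" * 10_000  # synthetic input

def with_str_replace(s: str) -> str:
    return s.replace(": ", ":\n\t").replace(", ", "\n\t")

def with_re_sub(s: str) -> str:
    s = re.sub(r", ", "\n\t", s)
    return re.sub(r": ", ":\n\t", s)

# For these literal, non-overlapping patterns both versions give the same result.
assert with_str_replace(text) == with_re_sub(text)

for func in (with_str_replace, with_re_sub):
    seconds = timeit.timeit(lambda: func(text), number=20)
    print(f"{func.__name__}: {seconds:.3f} s")
```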
#5
(Nov-26-2025, 08:47 AM) perfringo wrote: There is also the possibility to chain replace (it's implemented in C and should be highly optimized, especially compared to re).

OP here (after an email mishap). I decided to go with this solution.

My working program:

def text_replace(file_path="history_test.log"):
    with open(file_path, "r") as log:
        text = log.read()
    new_text = text.replace(": ", ":\n\t").replace(", ", "\n\t")
    print(new_text)

text_replace()
Much better.

Thank you all for the help!
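Because neither ": " nor ", " can span a line break, the same chained replace can also be applied line by line instead of reading the whole file at once. A sketch; `format_lines` is a hypothetical helper name.

```python
from typing import Iterable, Iterator

def format_lines(lines: Iterable[str]) -> Iterator[str]:
    # Each line is handled on its own; the patterns never cross a newline.
    for line in lines:
        yield line.replace(": ", ":\n\t").replace(", ", "\n\t")

def text_replace(file_path="history_test.log"):
    with open(file_path) as log:  # iterating a file yields one line at a time
        for formatted in format_lines(log):
            print(formatted, end="")  # lines keep their trailing "\n"

print("".join(format_lines(["Install: a, b\n", "End-Date: x\n"])), end="")
```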
#6
I don't think you're doing anything wrong. Using two regex replacements is fine and doesn't violate DRY, because they are two different patterns with two different outputs.

If you want it shorter, a simple version is:

import re

def text_replace(file_path="/var/log/apt/history.log"):
    with open(file_path) as log:
        text = log.read()
        text = re.sub(r', ', '\n\t', text)
        text = re.sub(r': ', ':\n\t', text)
        print(text)

text_replace()
Trying to force this into one regex just makes the code harder to read.

