My regex function is not good

Moltar1997 · (This post was last modified: Nov-25-2025, 04:34 PM by Moltar1997.)

I'm a novice at programming. One small-scale project I'm working on is an apt history log reader that uses regex to make the output easier to read. At some point I'll incorporate the output into a Tkinter UI.

The regex itself works, but I think the code could be better. The current code:

import re

file_path = "/var/log/apt/history.log" # Hardcoded, as the filename and location never change.

colon_regex = r'(: )'
comma_regex = r'(, )'

def text_replace(file_path = "/var/log/apt/history.log"):
    with open(file_path, "r") as log:
        text = log.read()
        ntext = re.sub(comma_regex, "\n\t", text) # Replace comma + space with newline + tab.
        ptext = re.sub(colon_regex, ":\n\t", ntext) # Replace colon + space with colon + newline + tab.
        print(ptext)

text_replace() # This will be a function call in a separate file later on.

I notice that I have two separate variables, each for a different pattern:

ntext

for replacing "comma + single blank space" with "newline + tab".

ptext

for replacing "colon + single blank space" with "colon + newline + tab".

Major rule in programming: don't repeat yourself, right? But I'm searching for two different patterns and formatting one pattern in one way, and formatting another pattern in a different way.

I'm not at all familiar with Python's regex functionality and everything I've searched for simply confuses me.

Pedroski55 · (This post was last modified: Nov-25-2025, 11:52 PM by Pedroski55.)

Something like this perhaps? (I just borrowed and modified a line from a csv in an old question.)

import re

line = '"id";"An,rede";"Vor,name";"Nach,name";"E,mail";"Stra,sse";"Nu,mmer";"Ort";"PLZ";"Bundes,land";"Kommen,tare";"Unter,stuetzer";"reg_date"\n"'
newline = re.sub('[;,]', 'XY', line)

newline looks like:

Output:
'"id"XY"AnXYrede"XY"VorXYname"XY"NachXYname"XY"EXYmail"XY"StraXYsse"XY"NuXYmmer"XY"Ort"XY"PLZ"XY"BundesXYland"XY"KommenXYtare"XY"UnterXYstuetzer"XY"reg_date"\n"'

In a regex you can put characters in square brackets to form a character class. The pattern then matches any single one of the characters inside the brackets.

Look here for info.
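
To make the square-bracket idea concrete, here is a small sketch. The sample string is made up for illustration and is not taken from a real log.

```python
import re

# Made-up sample text, only for illustration.
sample = "Install: foo, bar; baz"

# [;,] matches exactly one character that is either ';' or ','.
# Every match is replaced by the same string.
result = re.sub(r"[;,]", " |", sample)
print(result)  # Install: foo | bar | baz
```

Note that a character class gives every match the same replacement, so on its own it does not cover the case in the first post, where the colon and the comma each need a different replacement.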

bowlofred · (This post was last modified: Nov-26-2025, 05:21 AM by bowlofred.)

Regex is great, but it does have some sharp edges and I've had to debug problems in production code caused by it. So I'd prefer to use regular string replacement when possible. So my first thought would be:

document = "some, things to look; for with semi;colons; and com,mas, inside."
replacements = {
    ", ": "\n\t",
    "; ": ";\n\t",
}

for old, new in replacements.items():
    document = document.replace(old, new)

print(document)

Output:
some
        things to look;
        for with semi;colons;
        and com,mas
        inside.

I think this is better as long as the number of replacements and the size of the document both stay small. If plain string matching isn't enough and you need the power of re, you can do almost the same thing with re.sub(). Just make sure that your patterns and documents can't cause pathological behavior (catastrophic backtracking, for example).

import re
document = "some, things to look; for with semi;colons; and com,mas, inside."
replacements = {
    r", ": "\n\t",
    r"; ": ";\n\t",
}

for pattern, replacement in replacements.items():
    document = re.sub(pattern, replacement, document)

print(document)

(There are ways of doing different substitutions in a document with a single call to re.sub(), but you usually don't want the added complexity for that....)
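
For the curious, one possible shape of the single-call approach is sketched below. It is only an illustration, not something recommended in this thread: it joins the literal keys into one alternation and passes a callable to re.sub() that looks up the replacement for each match.

```python
import re

document = "some, things to look; for with semi;colons; and com,mas, inside."
replacements = {
    ", ": "\n\t",
    "; ": ";\n\t",
}

# Build one alternation from the literal keys. re.escape guards against
# keys that contain regex metacharacters such as "." or "(".
pattern = re.compile("|".join(re.escape(key) for key in replacements))

# The callable receives each match object and returns its replacement.
result = pattern.sub(lambda match: replacements[match.group(0)], document)
print(result)
```

Because the document is scanned only once, text produced by one replacement is never matched again by another. If one key were a prefix of another, the longer key would need to come first in the alternation, which is part of the added complexity mentioned above.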

perfringo · Nov-26-2025, 08:47 AM

It is also possible to chain replace calls (str.replace is implemented in C and should be highly optimized, especially compared to re):

>>> text = "a b cdeab"
>>> text.replace("a ", "XX").replace("b ", "YY")
'XXYYcdeab'
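
If you want to check the speed claim on your own data, a rough sketch with timeit could look like this. The sample text and the helper names with_replace and with_re are made up for illustration, and the timings will vary with the input and the machine.

```python
import re
import timeit

# Made-up sample text; real timings depend on the input and the machine.
text = "Install: foo (1.0), bar (2.0), baz (3.0)\n" * 1000

def with_replace(s):
    # Chained str.replace calls.
    return s.replace(": ", ":\n\t").replace(", ", "\n\t")

def with_re(s):
    # The same two substitutions done with re.sub.
    s = re.sub(r", ", "\n\t", s)
    return re.sub(r": ", ":\n\t", s)

# Both approaches should produce the same string for this input.
assert with_replace(text) == with_re(text)

print("str.replace:", timeit.timeit(lambda: with_replace(text), number=100))
print("re.sub:     ", timeit.timeit(lambda: with_re(text), number=100))
```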

Mt1997 · Dec-01-2025, 06:04 PM

(Nov-26-2025, 08:47 AM)perfringo Wrote: It is also possible to chain replace calls (str.replace is implemented in C and should be highly optimized, especially compared to re):
>>> text = "a b cdeab"
>>> text.replace("a ", "XX").replace("b ", "YY")
'XXYYcdeab'

OP here (after an email mishap). I decided to go with this solution.

My working program:

def text_replace(file_path = "history_test.log"):
    with open(file_path, "r") as log:
        text = log.read()
        newText = text.replace(": ", ":\n\t").replace(", ", "\n\t")
        print(newText)

text_replace()

Much better.

Thank you all for the help!

SledgeNE · Dec-15-2025, 06:42 AM

I think you're not doing anything wrong. Using two regex replacements is fine and does not break DRY, because they are two different patterns with two different outputs.

If you want it short, the simplest good version is:

import re

def text_replace(file_path="/var/log/apt/history.log"):
    with open(file_path) as log:
        text = log.read()
        text = re.sub(r', ', '\n\t', text)
        text = re.sub(r': ', ':\n\t', text)
        print(text)

text_replace()

Trying to force one regex here just makes the code hard to read.
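
Since the first post mentions calling this from a separate file and feeding a Tkinter UI later, one possible variation is to return the formatted string instead of printing it. This is only a sketch: the names format_history and read_history and the sample line are made up, and it uses the chained str.replace from earlier in the thread.

```python
from pathlib import Path

def format_history(text):
    """Put each colon- or comma-separated item on its own indented line."""
    return text.replace(": ", ":\n\t").replace(", ", "\n\t")

def read_history(file_path="/var/log/apt/history.log"):
    """Read the log file and return the formatted text."""
    return format_history(Path(file_path).read_text())

# Made-up sample line standing in for real log content.
sample = "Install: foo (1.0), bar (2.0)"
print(format_history(sample))
```

Keeping the formatting separate from the file reading means the caller decides what to do with the result, whether that is print() or a text widget.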
