[solved] re.split issue

paul18fr · (This post was last modified: Nov-13-2025, 05:06 PM by paul18fr.)

Hi,
It is a basic question but I'm stuck:

one backslash is ignored and I do not understand why (see output):
the backslash followed by a number (\15 here)

As you can imagine, paths can come from any OS and have been recorded using os.walk.

Thanks for your contribution

import re

line = 'C:\XXXX\YYYY/./OOOOO\mlmlml\15xxx\TTTT'
reg = re.split('/|\\\\', line)
print(reg)

Output:
['C:', 'XXXX', 'YYYY', '.', 'OOOOO', 'mlmlml\rxxx', 'TTTT']

Axel_Erfurt · Nov-13-2025, 01:47 PM

You can try a raw string literal:

import re

line = r'C:\XXXX\YYYY/./OOOOO\mlmlml\15xxx\TTTT'
reg = re.split('/|\\\\', line) 
print(reg)

Output:
['C:', 'XXXX', 'YYYY', '.', 'OOOOO', 'mlmlml', '15xxx', 'TTTT']

paul18fr · Nov-13-2025, 02:01 PM

Thanks Axel

However, since I'm using a list, how do you proceed with the following?

import re

line = 'C:\XXXX\YYYY/./OOOOO\mlmlml\15xxx\TTTT'
myList = [line, line]
for arg in myList:
    arg = r'%s' % arg
    reg = re.split('/|\\\\', arg)
    print(reg)

Output:
['C:', 'XXXX', 'YYYY', '.', 'OOOOO', 'mlmlml\rxxx', 'TTTT']
['C:', 'XXXX', 'YYYY', '.', 'OOOOO', 'mlmlml\rxxx', 'TTTT']

noisefloor · Nov-13-2025, 02:58 PM

Hi,

General advice: os.walk is the older, string-based approach; you may want to take a look at the newer pathlib module, which is the standard library's object-oriented way of dealing with directory paths and filenames. pathlib has more and better options, especially for handling Windows and Unix paths and for splitting full paths into their components.

Regards, noisefloor
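
As a rough illustration of the pathlib suggestion (a sketch, not necessarily what noisefloor had in mind): PureWindowsPath accepts both / and \ as separators on any OS and exposes the components through .parts. The path below is the test string from the first post, written as a raw string literal.

```python
from pathlib import PureWindowsPath

# Raw string literal, so \15 stays a backslash followed by "15".
line = r'C:\XXXX\YYYY/./OOOOO\mlmlml\15xxx\TTTT'

# PureWindowsPath does pure string handling; it never touches the file system.
parts = PureWindowsPath(line).parts
print(parts)
```

Note that, unlike the re.split result, the drive and root come back together as the first element ('C:\\') and the '.' component is dropped.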

paul18fr · Nov-13-2025, 05:04 PM

@noisefloor: OK, thanks for the feedback; after some investigation and tests, the following snippet does exactly what I'm looking for, and no regex is needed.

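from pathlib import Path

# workingDir and workingsFile are defined elsewhere in the program.
# Path.walk() requires Python 3.12 or newer.
tree = []
for pathObj, dirnames, filenames in Path(workingDir).walk():
    for file in filenames:
        if file == workingsFile:
            tree.append(pathObj.parts)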
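
For Python versions older than 3.12, where Path.walk() is not available, roughly the same result can be had by combining os.walk with Path(...).parts. This is only a sketch: the function name find_parts and its parameters working_dir and target_name are made up for illustration and do not come from the thread.

```python
import os
from pathlib import Path


def find_parts(working_dir, target_name):
    """Return the .parts tuple of every directory that contains target_name."""
    tree = []
    for dirpath, dirnames, filenames in os.walk(working_dir):
        if target_name in filenames:
            # dirpath is a plain str; Path splits it on the OS separators.
            tree.append(Path(dirpath).parts)
    return tree
```

The backslashes in dirpath are ordinary characters at this point, so no raw-string handling is involved.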

Axel_Erfurt · Nov-13-2025, 06:39 PM

(Nov-13-2025, 02:01 PM)paul18fr Wrote: Thanks Axel

However, since I'm using a list, how do you proceed with the following?

import re

line = 'C:\XXXX\YYYY/./OOOOO\mlmlml\15xxx\TTTT'
myList = [line, line]
for arg in myList:
    arg = r'%s' % arg
    reg = re.split('/|\\\\', arg)
    print(reg)

Output:
['C:', 'XXXX', 'YYYY', '.', 'OOOOO', 'mlmlml\rxxx', 'TTTT']
['C:', 'XXXX', 'YYYY', '.', 'OOOOO', 'mlmlml\rxxx', 'TTTT']

Use r'{}'.format:

import re

line = r'C:\XXXX\YYYY/./OOOOO\mlmlml\15xxx\TTTT'
myList = [line, line]
for arg in myList:
    reg = re.split('/|\\\\', r'{}'.format(arg))
    print(reg)

Output:
['C:', 'XXXX', 'YYYY', '.', 'OOOOO', 'mlmlml', '15xxx', 'TTTT']
['C:', 'XXXX', 'YYYY', '.', 'OOOOO', 'mlmlml', '15xxx', 'TTTT']

deanhystad · Nov-13-2025, 06:42 PM

Python does not have "raw" strings; Python programs may have raw string literals. When Python parses:

line = "C:\XXXX\YYYY/./OOOOO\mlmlml\15xxx\TTTT"

It creates a str with any escape sequences in the string literal converted to characters. In your example, \15 is interpreted as an octal escape sequence and is replaced with ASCII character 13 (octal 15), which is a non-printable character, Carriage Return. It is interesting that this shows up as \r in your output, but even that is a lie. There is no \r in line either, just a Carriage Return that gets displayed as "\r" when the list is printed.
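
A quick way to check this in an interpreter (a minimal sketch; the variable names s and r_s are arbitrary):

```python
# In a normal (non-raw) string literal, \15 is an octal escape.
s = '\15'
print(len(s))       # a single character, not three
print(ord(s))       # its code point, 13 decimal (octal 15)
print(s == '\r')    # compares equal to carriage return
print(repr(s))      # repr() is what shows it as '\r'

# In a raw string literal the backslash is kept as an ordinary character.
r_s = r'\15'
print(len(r_s))     # backslash, '1' and '5'
```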

The only time you need to worry about raw is when typing string literals into a program. A backslash in a file path returned by os.walk is just a backslash. To hopefully make this clear, I modified your program to create a str that contains single backslashes.

import re

line = "C:\\XXXX\\YYYY/./OOOOO\\mlmlml\\15xxx\\TTTT"
print(line)
reg = re.split("/|\\\\", line)
print(reg)

Output:
C:\XXXX\YYYY/./OOOOO\mlmlml\15xxx\TTTT
['C:', 'XXXX', 'YYYY', '.', 'OOOOO', 'mlmlml', '15xxx', 'TTTT']

You can see in the output that the '\\' in the string literal are converted to '\' in line. Split works as expected, splitting line at each / or \.
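
The same point can be shown with a path that is built at runtime instead of typed as a literal. The following is only a sketch: ntpath.join stands in for whatever os.walk would return on Windows (ntpath is the Windows flavour of os.path and can be imported on any OS), and the directory names are taken from the test string.

```python
import ntpath
import re

# Build a Windows-style path at runtime; no string literal with \15 involved.
path = ntpath.join('C:\\', 'mlmlml', '15xxx', 'TTTT')
print(path)

# r'{}' is just the two characters {}, so formatting through it
# returns an identical string.
print(r'{}'.format(path) == path)

# Splitting on either separator works with no "raw" conversion.
pieces = re.split(r'[/\\]', path)
print(pieces)
```

In other words, a str that already exists in memory needs no conversion; only the literal typed into source code decides whether \15 becomes a carriage return.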

Pedroski55 · (This post was last modified: Nov-14-2025, 01:47 AM by Pedroski55.)

Moral of the story is: Do not create paths that have escape sequences in them. That is asking for trouble!

import regex  # third-party module; the standard re module works the same way here

line = 'C:\XXXX\YYYY/./OOOOO\mlmlml\15xxx\TTTT'
# manually altering the string to give this works as deanhystad showed, but you don't want to do that for many paths
# we want to automatically alter the string, but that is très difficile!
line = "C:\\XXXX\\YYYY/./OOOOO\\mlmlml\\15xxx\\TTTT"

# Confucius say: "Wise geek not make path with escape sequence in."
# put anything in [] that might be found in a file or folder name
f = regex.compile(r'[\w\.-]+')
# The next line is commented out because \x still causes problems:
# line_good = 'C:\XXXX\YYYY/./OOOOO\mlmlml\xxx15\TTTT'
# SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 27-28: truncated \xXX escape
# no SyntaxError here (recent Python versions still warn about invalid escapes such as \X)
line_better = 'C:\XXXX\YYYY/./OOOOO\mlmlml\yxx15\TTTT'

res = f.findall(line_better)
print(res)

Gives:

Output:
['C', 'XXXX', 'YYYY', '.', 'OOOOO', 'mlmlml', 'yxx15', 'TTTT']

paul18fr · Nov-14-2025, 07:25 AM

Thanks all for the feedback and the explanations that helped me understand the escape problem.

However, keep in mind that the code I provided is a (very basic) test case I made to reproduce the issue; as I said, the paths come from os.walk and I have no control over them. The best solution is to use pathlib, as mentioned by noisefloor.

DeaD_EyE · Nov-14-2025, 11:44 AM

I don't know why my post is gone, but you want to use pathlib.
Often regex is not the best choice to solve a problem. Sometimes there are already well-tested, mature functions or classes that do the same thing.
