Posts: 327
Threads: 82
Joined: Apr 2019
Nov-13-2025, 01:28 PM
(This post was last modified: Nov-13-2025, 05:06 PM by paul18fr.)
Hi,
It is a basic question, but I'm stuck:
- one backslash is ignored and I do not understand why (see output)
- it is the backslash followed by a number (\15 here)
As you can imagine, the paths can come from any OS and are all recorded using os.walk.
Thanks for your contribution
import re
line = 'C:\XXXX\YYYY/./OOOOO\mlmlml\15xxx\TTTT'
reg = re.split('/|\\\\', line)
print(reg)
Output: ['C:', 'XXXX', 'YYYY', '.', 'OOOOO', 'mlmlml\rxxx', 'TTTT']
Posts: 1,066
Threads: 17
Joined: Dec 2016
You can try a raw string:
import re
line = r'C:\XXXX\YYYY/./OOOOO\mlmlml\15xxx\TTTT'
reg = re.split('/|\\\\', line)
print(reg)
Output: ['C:', 'XXXX', 'YYYY', '.', 'OOOOO', 'mlmlml', '15xxx', 'TTTT']
Posts: 327
Threads: 82
Joined: Apr 2019
Thanks Axel
However, since I'm using a list, how would you proceed with the following?
line = 'C:\XXXX\YYYY/./OOOOO\mlmlml\15xxx\TTTT'
myList = [line, line]
for arg in myList:
    arg = r'%s' % arg
    reg = re.split('/|\\\\', arg)
    print(reg)
Output: ['C:', 'XXXX', 'YYYY', '.', 'OOOOO', 'mlmlml\rxxx', 'TTTT']
['C:', 'XXXX', 'YYYY', '.', 'OOOOO', 'mlmlml\rxxx', 'TTTT']
Posts: 232
Threads: 0
Joined: Jun 2019
Hi,
General advice: os.walk is the older approach; you may want to take a look at the newer pathlib module, which is Python's standard object-oriented way of dealing with directory paths and filenames. pathlib has more and better options, especially for handling Windows and Unix paths and for splitting full paths into their components.
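As a minimal sketch of that splitting, pathlib's pure path classes can take a Windows-style path apart on any OS without touching the file system. The sample path is the one from this thread; the variable names are only illustrative.

```python
from pathlib import PureWindowsPath

# Raw string literal so that "\15" stays a backslash followed by "15".
line = r'C:\XXXX\YYYY/./OOOOO\mlmlml\15xxx\TTTT'

# PureWindowsPath accepts both "\" and "/" as separators and drops "." components.
parts = PureWindowsPath(line).parts
print(parts)
```

Note that the first element is the drive together with the root, and that the "." component disappears, which differs from what a plain regex split returns.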
Regards, noisefloor
Posts: 327
Threads: 82
Joined: Apr 2019
@noisefloor: OK, thanks for the feedback. After some investigation and tests, the following snippet does exactly what I'm looking for, and no regex is needed.
from pathlib import Path
tree = []
# Path.walk() requires Python 3.12 or newer
for pathObj, dirnames, filenames in Path(f"{workingDir}").walk():
    for file in filenames:
        if file == workingsFile:
            tree.append(pathObj.parts)
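For Python versions older than 3.12, where Path.walk() is not available, roughly the same result can be sketched with os.walk plus Path(...).parts. The function name find_parts, the file name data.txt and the temporary directory layout are assumptions made up for this illustration.

```python
import os
import tempfile
from pathlib import Path

def find_parts(working_dir, target_name):
    """Return the path components of every directory that contains target_name."""
    tree = []
    for dirpath, dirnames, filenames in os.walk(working_dir):
        if target_name in filenames:
            tree.append(Path(dirpath).parts)
    return tree

# Small throwaway directory tree to try it out.
with tempfile.TemporaryDirectory() as tmp:
    sub = Path(tmp) / "mlmlml" / "15xxx"
    sub.mkdir(parents=True)
    (sub / "data.txt").write_text("demo")
    result = find_parts(tmp, "data.txt")
    print(result[0][-2:])
```

The directory named 15xxx causes no trouble here because its name never appears after a backslash inside a string literal.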
Posts: 1,066
Threads: 17
Joined: Dec 2016
(Nov-13-2025, 02:01 PM) paul18fr wrote: Thanks Axel
However, since I'm using a list, how would you proceed with the following?
line = 'C:\XXXX\YYYY/./OOOOO\mlmlml\15xxx\TTTT'
myList = [line, line]
for arg in myList:
    arg = r'%s' % arg
    reg = re.split('/|\\\\', arg)
    print(reg)
Output: ['C:', 'XXXX', 'YYYY', '.', 'OOOOO', 'mlmlml\rxxx', 'TTTT']
['C:', 'XXXX', 'YYYY', '.', 'OOOOO', 'mlmlml\rxxx', 'TTTT']
Use r'{}'.format (note that line is also written as a raw string literal here):
import re
line = r'C:\XXXX\YYYY/./OOOOO\mlmlml\15xxx\TTTT'
myList = [line, line]
for arg in myList:
    reg = re.split('/|\\\\', r'{}'.format(arg))
    print(reg)
Output: ['C:', 'XXXX', 'YYYY', '.', 'OOOOO', 'mlmlml', '15xxx', 'TTTT']
['C:', 'XXXX', 'YYYY', '.', 'OOOOO', 'mlmlml', '15xxx', 'TTTT']
Posts: 6,981
Threads: 22
Joined: Feb 2020
Python does not have "raw" strings; Python programs may have raw string literals. When Python parses:
line = "C:\XXXX\YYYY/./OOOOO\mlmlml\15xxx\TTTT"
it creates a str with any escape sequences in the string literal converted to characters. In your example, \15 is interpreted as an escape sequence and replaced with ASCII character 13 (octal 15), which is a non-printable character, Carriage Return. It is interesting that this shows up as \r in your output, but even that is a lie. There is no backslash followed by r in line either, just a Carriage Return that gets printed out as "\r".
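A short sketch of that escape processing, using a shortened piece of the path (the names normal and raw are only illustrative):

```python
# "\15" in a normal literal is an octal escape: octal 15 == decimal 13 == carriage return.
normal = 'mlmlml\15xxx'
raw = r'mlmlml\15xxx'

print(len(normal), len(raw))      # the raw literal keeps "\", "1" and "5" as three characters
print(normal == 'mlmlml\rxxx')    # same character as the "\r" escape
print(ord(normal[6]))             # code point of the character that replaced "\15"
print(repr(normal))               # repr() displays the carriage return as \r
```

The two literals differ in length by two characters, which is why one path component seems to vanish from the split result.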
The only time you need to worry about raw is when typing string literals into a program. A backslash in a file path returned by os.walk is just a backslash. To hopefully make this clear, I modified your program to create a str that contains single backslashes.
import re
line = "C:\\XXXX\\YYYY/./OOOOO\\mlmlml\\15xxx\\TTTT"
print(line)
reg = re.split("/|\\\\", line)
print(reg)
Output: C:\XXXX\YYYY/./OOOOO\mlmlml\15xxx\TTTT
['C:', 'XXXX', 'YYYY', '.', 'OOOOO', 'mlmlml', '15xxx', 'TTTT']
You can see in the output that each '\\' in the string literal is converted to a single '\' in line. Split works as expected, splitting line at each / or \.
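To make the same point about strings that already exist at run time, here is a small sketch. The name path_from_os stands in for a value that os.walk might return on Windows and is an assumption for illustration only.

```python
import re

# A str that really contains single backslashes, as os.walk would return on Windows.
path_from_os = "C:\\XXXX\\mlmlml\\15xxx"

# r'{}' is just the two-character template "{}"; formatting returns an equal string.
same = r'{}'.format(path_from_os)
print(same == path_from_os)

# Splitting on "/" or "\" therefore works without any conversion step.
parts = re.split(r'[/\\]', path_from_os)
print(parts)
```

In other words, wrapping an existing str in r'{}'.format or r'%s' changes nothing; the raw prefix only affects how a literal typed in source code is parsed.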
Posts: 1,300
Threads: 151
Joined: Jul 2017
Nov-14-2025, 01:47 AM
(This post was last modified: Nov-14-2025, 01:47 AM by Pedroski55.)
Moral of the story is: Do not create paths that have escape sequences in them. That is asking for trouble!
import regex
line = 'C:\XXXX\YYYY/./OOOOO\mlmlml\15xxx\TTTT'
# manually altering the string as below works, as deanhystad showed, but you don't want to do that for many paths
# we want to automatically alter the string, but that is très difficile!
line = "C:\\XXXX\\YYYY/./OOOOO\\mlmlml\\15xxx\\TTTT"
# Confucius say: "Wise geek not make path with escape sequence in."
# put anything in [] that might be found in a file or folder name
f = regex.compile(r'[\w\.-]+')
# line_good = 'C:\XXXX\YYYY/./OOOOO\mlmlml\xxx15\TTTT'  # commented out: \x still causes problems
# SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 27-28: truncated \xXX escape
line_better = 'C:\XXXX\YYYY/./OOOOO\mlmlml\yxx15\TTTT' # no problem
res = f.findall(line_better)
print(res)
Gives:
Output: ['C', 'XXXX', 'YYYY', '.', 'OOOOO', 'mlmlml', 'yxx15', 'TTTT']
Posts: 327
Threads: 82
Joined: Apr 2019
Thanks, all, for the feedback and the explanations that helped me figure out the escape problem.
However, keep in mind that the code I provided is a (very basic) test case I made to reproduce the issue; as I said, the paths come from os.walk and I have no control over them. The best solution is to use pathlib, as mentioned by noisefloor.
Posts: 2,198
Threads: 12
Joined: May 2017
I don't know why my post is gone, but you want to use pathlib.
Often regex is not the best choice for solving a problem. Sometimes there are already well-tested, mature functions or classes that do the same job.