Python Forum
sorting a list of file paths
#1
there are different ways to do sorting in Python. but which way is correct? i have experienced order errors in the past but did not have time to explore why.

i will be sorting file paths in which i expect the '/' that delimits directory names, plus a few control characters, but specifically NOT the 4 characters with ASCII codes 28 to 31 (decimal) inclusive. so what i do is translate '/' (decimal code 47) to decimal code 31 (octal 037, hexadecimal 1f) before sorting (using the "sort" command) and translate back afterwards. this should be straightforward in Python.

the above method with the sort command in Linux does work and gets the correct order. i would like to find a command/script in Python that carries out sorting and achieves the same order. if it does not have the translation for '/', then i can add that easily enough. i will look and see what method it uses and apply that to my future code.
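
A minimal sketch of that translate-then-sort idea in Python, assuming the paths are ordinary str objects that never contain the character with code 31. The names sort_paths, SEP and LOW are made up for illustration. Python compares strings by code point, which for ASCII data should match what sort does in the C locale.

```python
SEP = "/"
LOW = "\x1f"  # ASCII 31 (unit separator); assumed never to occur in a path


def sort_paths(paths):
    """Return paths sorted as if every '/' were the character with code 31."""
    # The translated string is used only as the comparison key, so the
    # original paths come back untouched and nothing is translated back.
    return sorted(paths, key=lambda p: p.replace(SEP, LOW))


paths = ["foo/bar", "foo-bar/x", "foo/abc", "foo.txt"]
print(sort_paths(paths))  # '/' now orders below '-' and '.'
print(sorted(paths))      # plain code point order, for comparison
```

With a key function the translation back is not needed, because the key is thrown away after the comparison.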
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
(Nov-03-2025, 03:36 AM)Skaperen Wrote: the above method through the sort command in Linux does work and gets the correct order. i would like to find a command/script in Python that carries out sorting and achieves the same order.

So how does Linux sort? What is the "correct" order of these:

dir2/file10
dir100/file5
dir100/file20

Python has subprocess, and by using run or Popen it is possible to invoke the Linux sort command.
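
A rough sketch of what that could look like with subprocess.run, assuming a sort command is on the PATH. The helper name linux_sort is made up, and forcing LC_ALL=C is an assumption made here so that the result does not depend on the locale.

```python
import os
import subprocess


def linux_sort(lines):
    """Sort a list of strings by piping them through the external sort command."""
    # Assumption: LC_ALL=C forces plain byte-order comparison in sort.
    env = {**os.environ, "LC_ALL": "C"}
    result = subprocess.run(
        ["sort"],
        input="".join(line + "\n" for line in lines),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if sort exits with an error
        env=env,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    print(linux_sort(["dir2/file10", "dir100/file5", "dir100/file20"]))
```

This only works for entries that contain no newline characters, since sort treats each line as one record.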
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
#3
I use natsort for sorting filepaths

from pathlib import Path
from natsort import natsorted

# Define the target directory
directory = Path("/path/to/directory")

# Get and naturally sort directory contents
sorted_items = natsorted(directory.iterdir(), key=lambda x: x.name)

# Print sorted items
for item in sorted_items:
    print(item)
#4
You should use pathlib, because it respects the differences between Windows and Linux/Mac. Windows paths are case-insensitive, which means the file "Aa" == "aa". On Linux/Mac, pathlib compares paths case-sensitively.

The abstraction uses the member _parts_normcase. On Windows, the path parts are case-normalized before they are compared. This also affects the sorting.
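
A small sketch of that difference, using the pure path classes so that it runs the same on any platform. The list of names is made up for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows-flavoured paths compare without regard to case, POSIX ones do not.
print(PureWindowsPath("Aa") == PureWindowsPath("aa"))
print(PurePosixPath("Aa") == PurePosixPath("aa"))

names = ["b", "A", "a", "B"]  # hypothetical file names
win_sorted = [str(p) for p in sorted(PureWindowsPath(n) for n in names)]
posix_sorted = [str(p) for p in sorted(PurePosixPath(n) for n in names)]

print(win_sorted)    # "A" and "a" compare equal and keep their input order
print(posix_sorted)  # all uppercase names sort before the lowercase ones
```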
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
#5
i forgot about the sorting difference needed in a Windows environment.

the 3 lines perfringo posted would be reversed in a Linux sort

dir100/file20
dir100/file5
dir2/file10

because numbers are just treated as a sequence of characters by sort.

i have implemented code that would sort them numerically, up to a limit. the limit i implemented was 10**20. what the code did was expand each sequence of digits by padding it with leading zeros to form a 20-digit number. if the original data had leading zeros, they would be lost in the post-filter that removed the padding. i came up with a way to preserve the original prefix but never tried to implement it. i also never tried to work out how to handle scientific notation mixed in there (that could be fun).

the issue of the '/' in my question is actually moot. while comparing for the sort, a '/' is either compared to a '/' or it isn't. where the leading name segments are the same length, the '/' is only compared to a '/', and then either a later name segment differs or the entry is a duplicate. if '/' (as translated to something lower than any printable character) is compared to a non-'/', then whatever precedes it dominates the order at that point. so my whole question is moot.
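
A sketch of the padding idea in Python, not the original code. It assumes that no run of digits is longer than 20 characters; the names WIDTH and padded_key are made up. Because the padded string is used only as a sort key, the original text (leading zeros included) comes back unchanged and no post-filter is needed.

```python
import re

WIDTH = 20  # assumed upper bound on the length of any run of digits


def padded_key(path):
    """Pad every run of ASCII digits with leading zeros to WIDTH characters."""
    return re.sub(r"[0-9]+", lambda m: m.group().zfill(WIDTH), path)


lines = ["dir100/file20", "dir100/file5", "dir2/file10", "dir2/file010"]
for line in sorted(lines, key=padded_key):
    print(line)
```

Entries such as file10 and file010 get the same key, so they stay in their input order relative to each other.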
#6
The nice thing about Python is that you can inspect everything that is written in Python.

>>> import pathlib
>>> import inspect
>>> p = pathlib.Path("/foo/bar/file")
>>> print(inspect.getsource(p.__lt__))
    def __lt__(self, other):
        if not isinstance(other, PurePath) or self.parser is not other.parser:
            return NotImplemented
        return self._parts_normcase < other._parts_normcase

>>> p._parts_normcase
['', 'foo', 'bar', 'file']
If you sort an iterable, the comparison method __lt__ (less than) is used.
The path separators / or \ are not compared at all, because they don't appear in the list that is used for the comparison. If you want to dig deeper into how pathlib works internally, you could look here: https://github.com/python/cpython/blob/m..._init__.py
#7
If you want to keep the code a bit more 'Pythonic' without the manual translation, you might want to look into pathlib. Using sorted(paths, key=lambda p: Path(p).parts) often handles directory-level sorting naturally because it treats the path as a sequence of components rather than just a raw string.
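
A short sketch comparing the two orders. It uses PurePosixPath rather than Path (an assumption made here so the result is the same on every platform), and the sample paths are made up.

```python
from pathlib import PurePosixPath

paths = ["a/b/c", "a-b/c", "a/b", "a.txt"]  # hypothetical sample paths

# Raw string comparison: '-' (45) and '.' (46) order before '/' (47).
by_string = sorted(paths)

# Component-wise comparison: the separator itself is never compared.
by_parts = sorted(paths, key=lambda p: PurePosixPath(p).parts)

print(by_string)
print(by_parts)
```

In the second result everything under the directory a stays together, ahead of a-b and a.txt.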