[Python-Dev] Unicode strings as filenames
Martin v. Loewis
martin@v.loewis.de
Thu, 3 Jan 2002 22:52:19 +0100
> What's the correct way to deal with filenames in a Unicode environment?
> Consider this:
>
> >>> import site
> >>> site.encoding
> 'latin-1'
Setting site.encoding is certainly the wrong thing to do. How can you
know all users of your system use latin-1?
> If I change my site's default encoding back to ascii, the second open fails:
>
> >>> import site
> >>> site.encoding
> 'ascii'
> >>> a = "abc\xe4\xfc\xdf.txt"
> >>> u = unicode (a, "latin-1")
On my system, the following works fine
>>> import locale
>>> locale.setlocale(locale.LC_ALL,"")
'LC_CTYPE=de_DE;LC_NUMERIC=de_DE;LC_TIME=de_DE;LC_COLLATE=C;LC_MONETARY=de_DE;LC_MESSAGES=de_DE;LC_PAPER=de_DE;LC_NAME=de_DE;LC_ADDRESS=de_DE;LC_TELEPHONE=de_DE;LC_MEASUREMENT=de_DE;LC_IDENTIFICATION=de_DE'
>>> a = "abc\xe4\xfc\xdf.txt"
>>> u = unicode (a, "latin-1")
>>> open(u, "w")
<open file 'abcäüß.txt', mode 'w' at 0x8173e88>
On Unix, your best bet for file names is to trust the user's locale
settings. If you do that, open will accept Unicode objects.
What is your locale?
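A minimal sketch of what "trusting the locale" amounts to, assuming a
current Python 3 interpreter rather than the Python 2 session above.
encode_filename is a hypothetical helper, not something Python provides;
it only makes explicit the conversion from a Unicode name to the bytes
the file system stores.

```python
import locale


def encode_filename(name, encoding=None):
    """Encode a Unicode file name to bytes (hypothetical helper).

    With no explicit encoding, fall back to the encoding implied by the
    user's locale settings instead of a site-wide default.
    """
    if encoding is None:
        # False: report the current locale's encoding without changing it.
        encoding = locale.getpreferredencoding(False)
    return name.encode(encoding)


name = "abc\xe4\xfc\xdf.txt"
print(encode_filename(name, "latin-1"))  # b'abc\xe4\xfc\xdf.txt'
print(encode_filename(name, "utf-8"))    # b'abc\xc3\xa4\xc3\xbc\xc3\x9f.txt'
```

The same name gives different byte sequences under different locales,
which is why a single site-wide encoding cannot be right for every user.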
> Is that the correct approach? Apparently Python's file object doesn't do
> this under the covers. Should it?
No. There is no established convention on Unix for handling non-ASCII
file names. If anything, following the user's locale setting is the
most reasonable thing to do; this should be in sync with how the user's
terminal displays characters. The Python installation's default
encoding is almost useless, and shouldn't be changed.
On Windows, things are much better, since there is a notion of Unicode
file names in the system.
Regards,
Martin