Some information about locale (was Re: [Python-Dev] repr vs. str and locales again)
Peter Funk
pf@artcom-gmbh.de
Mon, 22 May 2000 15:02:18 +0200 (MEST)
Hi!
[...]
[me]:
> > So this simply works well as intended without having to add calls
> > to 'setlocale' to all application programs using these C library functions.
[Guido van Rossum]:
> I don't believe that. According to the ANSI standard, a C program
> *must* call setlocale(LC_..., "") if it wants the environment
> variables to be honored; without this call, the locale is always the
> "C" locale, which should *not* honor the environment variables.
pf@pefunbk> python
Python 1.5.2 (#1, Jul 23 1999, 06:38:16) [GCC egcs-2.91.66 19990314/Linux (egcs- on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import string
>>> print string.upper("ä")
Ä
>>>
This was the vanilla Python 1.5.2 as originally delivered by SuSE Linux.
But yes, you are right. :-( My memory was misled by this practical
experience. Let me quote from the man pages here:
man toupper:
[...]
BUGS
The details of what constitutes an uppercase or lowercase
letter depend on the current locale. For example, the
default "C" locale does not know about umlauts, so no
conversion is done for them.
In some non-English locales, there are lowercase letters
with no corresponding uppercase equivalent; the German
sharp s is one example.
man setlocale:
[...]
A program may be made portable to all locales by calling
setlocale(LC_ALL, "" ) after program initialization, by
using the values returned from a localeconv() call for
locale-dependent information and by using strcoll() or
strxfrm() to compare strings.
[...]
CONFORMING TO
ANSI C, POSIX.1
Linux (that is, libc) supports the portable locales "C"
and "POSIX". In the good old days there used to be support
for the European Latin-1 "ISO-8859-1" locale (e.g. in
libc-4.5.21 and libc-4.6.27), and the Russian "KOI-8"
(more precisely, "koi-8r") locale (e.g. in libc-4.6.27),
so that having an environment variable LC_CTYPE=ISO-8859-1
sufficed to make isprint() return the right answer. These
days non-English speaking Europeans have to work a bit
harder, and must install actual locale files.
[...]
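For illustration, here is a minimal sketch of the rule Guido describes, written against the 'locale' module of a current Python rather than the 1.5.2 interpreter shown above (that choice, and the helper name 'locale_before_and_after', are my assumptions and not part of the original exchange). It resets one category to the portable "C" locale, queries it, and then asks the C library to honour the environment by passing an empty string. What the second value looks like depends entirely on the locale files installed on the machine.

```python
import locale


def locale_before_and_after(category=locale.LC_CTYPE):
    """Return (locale name before, locale name after) adopting the environment.

    Hypothetical helper for illustration only.
    """
    # Start from the portable "C" locale, the state the ANSI standard
    # prescribes for a program that has not yet called setlocale().
    locale.setlocale(category, "C")
    before = locale.setlocale(category)  # no second argument: query only
    try:
        # An empty string asks the C library to consult LC_ALL, LC_* and LANG.
        after = locale.setlocale(category, "")
    except locale.Error:
        # The environment names a locale that is not installed here.
        after = before
    return before, after


if __name__ == "__main__":
    print(locale_before_and_after())
```

Note that the call changes the locale of the whole process, which is why a library should normally leave it to the application.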
In recent Linux distributions almost every C program seems to
contain this obligatory 'setlocale(LC_ALL, "");' line, so it's easy
to forget about it. However, the core Python interpreter does not.
So it seems the Linux C library is not fully ANSI compliant in this case:
it appears to honour the setting of $LANG regardless of whether a program
calls 'setlocale' or not.
Regards, Peter