[Python-Dev] Python-3.0, unicode, and os.environ

Adam Olsen rhamph at gmail.com
Sun Dec 7 18:35:53 CET 2008


On Sun, Dec 7, 2008 at 2:35 AM, Hagen Fürstenau <hfuerstenau at gmx.net> wrote:
>>> As far as I can see all Python Unicode strings can be encoded to UTF-8,
>>> even things like lone surrogates because Python doesn't care about them.
>>> So both the Unicode API and the binary API would be fail-safe on Windows.
>>
>> Python is broken and needs to be fixed.
>>
>> http://bugs.python.org/issue3672
>> http://bugs.python.org/issue3297
>
> But the question of whether Python should care about lone surrogates or
> not is at best tangential to the issue at hand.  If you have lone
> surrogates in the Unicode API (and didn't raise an exception on the way
> getting there), then the sensible thing is to encode them into lone
> UTF-8 surrogates.  Even if you wanted to prevent lone surrogates,
> encoding to UTF-8 for the binary API would not be the place to enforce it.

No.  Unicode *requires* them to be treated as errors.  If you want to
pass them through then you're creating a custom encoding... which you
might argue for in this case, but it needs to be clearly separate from
the real UTF-8.
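
As a rough illustration of that separation, here is a minimal sketch. It assumes a current CPython 3 release, where the strict "utf-8" codec rejects lone surrogates and the pass-through behaviour sits behind the separately named "surrogatepass" error handler; that is not how 3.0 behaves today. The helper names encode_strict and encode_passthrough are made up for this example.

```python
# A lone high surrogate: a code point, but not a valid Unicode scalar value.
lone = "\ud800"


def encode_strict(text):
    """Encode with the real UTF-8 codec; return None if it refuses the text."""
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError:
        return None


def encode_passthrough(text):
    """Encode with the explicitly named pass-through variant (not real UTF-8)."""
    return text.encode("utf-8", errors="surrogatepass")


strict_result = encode_strict(lone)            # strict codec treats it as an error
passthrough_result = encode_passthrough(lone)  # three-byte surrogate sequence

# The pass-through bytes only decode again if the same variant is requested.
try:
    passthrough_result.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

round_trip = passthrough_result.decode("utf-8", errors="surrogatepass")
```

The point of the sketch is only that the caller has to opt in by name, so the output of the pass-through variant is never mistaken for real UTF-8.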


-- 
Adam Olsen, aka Rhamphoryncus

