Message 274222 - Python tracker



Author	vstinner
Recipients	ezio.melotti, nnnnnn, serhiy.storchaka, vstinner
Date	2016-09-02.10:19:33
Message-id	<1472811574.16.0.472581628747.issue27938@psf.upfronthosting.co.za>
In-reply-to

Content

The "us-ascii" encoding is an alias of the Python ASCII encoding. The PyUnicode_AsEncodedString() and PyUnicode_Decode() functions have a fast path for the "ascii" string, but not for "us-ascii".

The attached patch also uses the fast path for "us-ascii". It is a more generic change than the one in issue #27915. The "us-ascii" name is common in the email and xml.etree modules.
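To illustrate the alias relationship described above, here is a small Python check that uses only the public codecs API. It shows that the different spellings resolve to the same codec; it does not show which C code path is taken, since that is not observable from Python.

```python
import codecs

# Spellings that should all resolve to the same ASCII codec.
names = ("ascii", "us-ascii", "US-ASCII")

# codecs.lookup() returns a CodecInfo whose .name is the canonical codec name.
resolved = {name: codecs.lookup(name).name for name in names}

for name, canonical in resolved.items():
    print(f"{name!r} -> {canonical!r}")

# Encoding and decoding give the same result under either name.
encoded = "hello".encode("us-ascii")
decoded = encoded.decode("ascii")
```

Whether a given spelling hits the fast path or goes through the codec registry only affects speed, not the result.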

Other changes in the patch:

* Rewrite _Py_normalize_encoding() as a C implementation of encodings.normalize_encoding(). For example, " utf-8 " is now normalized to "utf_8". So the fast path is now used for more name variants of the same encoding.
* Avoid strcpy() when encoding is NULL: call the UTF-8 codec directly
* Reorder encodings: UTF-8, ASCII, MBCS, Latin1, UTF-16
* Remove fast-path for UTF-32: seriously, nobody uses this codec. Latin9 is much faster but has no fast-path.
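The normalization mentioned in the first bullet can be tried from Python with encodings.normalize_encoding(). The sketch below is only an illustration: FAST_PATH_NAMES and takes_fast_path() are hypothetical names, the table is not the actual list from the C code, and the lowercasing step is an assumption about how the C side compares names.

```python
from encodings import normalize_encoding

# Hypothetical table of normalized names that would take a fast path.
# The real list lives in the C code and is not reproduced here.
FAST_PATH_NAMES = {
    "utf_8", "utf8",
    "ascii", "us_ascii",
    "latin_1", "latin1", "iso_8859_1",
    "utf_16", "utf16",
}


def takes_fast_path(name):
    """Return True if the normalized, lowercased name is in the table."""
    # normalize_encoding() collapses runs of punctuation and spaces into
    # a single underscore and drops leading/trailing ones. It does not
    # change case, so lowercase explicitly (assumption, see lead-in).
    return normalize_encoding(name).lower() in FAST_PATH_NAMES


print(normalize_encoding(" utf-8 "))
print(takes_fast_path("US-ASCII"))
print(takes_fast_path("utf-32"))
```

With a table keyed on normalized names, one entry covers every spelling variant of an encoding, which is the point of normalizing before the comparison.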

History
Date	User	Action	Args
2016-09-02 10:19:34	vstinner	set	recipients: + vstinner, nnnnnn, ezio.melotti, serhiy.storchaka
2016-09-02 10:19:34	vstinner	set	messageid: <1472811574.16.0.472581628747.issue27938@psf.upfronthosting.co.za>
2016-09-02 10:19:34	vstinner	link	issue27938 messages
2016-09-02 10:19:33	vstinner	create