[Python-Dev] Make re.compile faster
Barry Warsaw
barry at python.org
Tue Oct 3 10:14:58 EDT 2017
On Oct 3, 2017, at 01:35, Serhiy Storchaka <storchaka at gmail.com> wrote:
>
>> diff --git a/Lib/string.py b/Lib/string.py
>> index b46e60c38f..fedd92246d 100644
>> --- a/Lib/string.py
>> +++ b/Lib/string.py
>> @@ -81,7 +81,7 @@ class Template(metaclass=_TemplateMetaclass):
>>      delimiter = '$'
>>      idpattern = r'[_a-z][_a-z0-9]*'
>>      braceidpattern = None
>> -    flags = _re.IGNORECASE
>> +    flags = _re.IGNORECASE | _re.ASCII
>>
>>      def __init__(self, template):
>>          self.template = template
>> patched:
>> import time: 1191 | 8479 | string
>> Of course, this patch is not backward compatible. [a-z] doesn't match with 'ı' or 'ſ' anymore.
>> But who cares?
>
> This looks like a bug fix. I'm wondering whether it is worth backporting to 3.6. But the change itself can break user code that changes idpattern without touching flags. There is another way, but it should be discussed on the bug tracker.
It’s definitely an API change, as I mention in the bug tracker. It’s *probably* safe in practice given that the documentation does say that identifiers are ASCII by default, but it also means that a client who wants to use Unicode previously didn’t have to touch flags, and after this change would now have to do so. `flags` is part of the public API.
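To make the compatibility concern concrete, here is a minimal sketch using only the `re` module. `IDPATTERN` is copied from the quoted diff; the helper `matches` and the `r'\w+'` pattern are hypothetical, standing in for a client that overrides idpattern to accept Unicode identifiers and leaves flags alone.

```python
import re

# Default idpattern, copied from the quoted diff.
IDPATTERN = r'[_a-z][_a-z0-9]*'

OLD_FLAGS = re.IGNORECASE             # flags before the patch
NEW_FLAGS = re.IGNORECASE | re.ASCII  # flags after the patch


def matches(pattern, flags, text):
    """Hypothetical helper: True if `text` matches `pattern` in full."""
    return re.fullmatch(pattern, text, flags) is not None


# With IGNORECASE alone, [a-z] also accepts a few non-ASCII letters
# (such as 'ı' and 'ſ') through Unicode case folding; ASCII turns that off.
for name in ('name', 'NAME', 'ı', 'ſ'):
    print(repr(name), matches(IDPATTERN, OLD_FLAGS, name),
          matches(IDPATTERN, NEW_FLAGS, name))

# Assumed client override: a Unicode-friendly idpattern, flags untouched.
# Under the patched flags, \w is restricted to ASCII word characters.
print(matches(r'\w+', OLD_FLAGS, 'größe'), matches(r'\w+', NEW_FLAGS, 'größe'))
```

The last line is the case I am worried about: the subclass did nothing wrong, yet its pattern stops matching once re.ASCII is added to the inherited flags.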
Maybe for subclasses you could say that if delimiter, idpattern, or braceidpattern are anything but the defaults, fall back to just re.IGNORECASE.
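A rough sketch of that fallback idea, not a proposed patch: `pick_flags`, `Base`, and `UnicodeIds` are hypothetical names, and the stand-in classes only mirror the three class attributes shown in the diff rather than touching string.Template itself.

```python
import re

# Default attribute values, as shown in the quoted diff.
_DEFAULTS = {
    'delimiter': '$',
    'idpattern': r'[_a-z][_a-z0-9]*',
    'braceidpattern': None,
}


def pick_flags(cls):
    """Hypothetical helper: pick regex flags for a Template-like class.

    If any pattern-related attribute differs from its default, keep the
    old behaviour (IGNORECASE only); otherwise also apply ASCII.
    """
    customized = any(
        getattr(cls, name) != default for name, default in _DEFAULTS.items()
    )
    if customized:
        return re.IGNORECASE
    return re.IGNORECASE | re.ASCII


class Base:
    """Stand-in for a Template-like class using the defaults."""
    delimiter = '$'
    idpattern = r'[_a-z][_a-z0-9]*'
    braceidpattern = None


class UnicodeIds(Base):
    """Stand-in subclass that overrides idpattern but not flags."""
    idpattern = r'[^\W\d]\w*'


print(pick_flags(Base))
print(pick_flags(UnicodeIds))
```

The sketch leaves open what should happen when a subclass sets flags explicitly; presumably an explicit value would win over any fallback.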
Cheers,
-Barry