This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author mscuthbert
Recipients mscuthbert
Date 2016-05-02.04:29:09
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1462163353.47.0.300392305122.issue26904@psf.upfronthosting.co.za>
In-reply-to
Content
The implementation used in difflib.SequenceMatcher().quick_ratio() counts how often each member of the sequence (character, list entry, etc.) appears in order to calculate its lower bound.

Counting how often an entry appears in an iterable has been sped up in cPython w/ the _count_elements() c function in _collections.

This patch uses the collections uses collections.Counter() as a way of getting this speedup (and any future speedups in cPython or other implementations); code is somewhat simplified also.

Performance:
There is a slight overhead to creating two collections.Counter() objects rather than simple dicts.  On two Mac systems (Python 3.5 on stock Macbook Air, and Py 3.6a0 latest on recent Mac Pro) the new implementation passes the speed of the previous when the length of the iterable is around 60 items.  As the number of items increases, the performance gains increase significantly. By 400 items, the new implementation's speed is about 3x the old, and seems to approach 3.6x asymptotically.

Below 60 items, the older implementation is faster; reaching a max of 8x the speed of the new when comparing a string of one element against a string of one element.  (The degenerate case of comparing one or two empty iterables is checked for in this implementation and is thus faster than the old implementation). I believe that users using quick_ratio() are more likely to be looking for speed improvements on larger 

Like the previous implementation, the count of the items in seq2 (self.b) is cached; if run again, it is about 41% faster (compared to 47% faster before).

This is my first patch submission, so any suggestions would be appreciated.
History
Date User Action Args
2016-05-02 04:29:13mscuthbertsetrecipients: + mscuthbert
2016-05-02 04:29:13mscuthbertsetmessageid: <1462163353.47.0.300392305122.issue26904@psf.upfronthosting.co.za>
2016-05-02 04:29:13mscuthbertlinkissue26904 messages
2016-05-02 04:29:11mscuthbertcreate