
Author tim.peters
Recipients JelleZijlstra, tim.peters
Date 2022-03-12.03:24:49
Message-id <1647055489.9.0.896345569782.issue46990@roundup.psfhosted.org>
In-reply-to
Content
Well, that's annoying ;-) In context, the OP was saving a list of 10 million splits. So each overallocation by a single element burned 80 million bytes of RAM. Overallocating by 7 burned 560 million bytes.
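
To make the arithmetic above concrete, here is a minimal sketch. It assumes each unused list slot costs one 8-byte pointer (a 64-bit build), which is what the 80-million figure implies; `wasted_bytes` is a hypothetical helper, not anything in CPython.

```python
POINTER_SIZE = 8  # bytes per list slot; assumes a 64-bit build


def wasted_bytes(num_lists, spare_slots, pointer_size=POINTER_SIZE):
    """Bytes held by unused slots across num_lists overallocated lists."""
    return num_lists * spare_slots * pointer_size


print(wasted_bytes(10_000_000, 1))  # 80000000
print(wasted_bytes(10_000_000, 7))  # 560000000
```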

Which is unusual. Usually a split result is short-lived, consumed once then thrown away.

OTOH, the overwhelming motivation for overallocating at all is to achieve O(1) amortized time after a long _sequence_ of appends, and split results typically aren't appended to at all. split() appears to be using it as a timing micro-optimization for tiny lists instead.
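
One rough way to see the amortization at work is to count how often a list's reported size changes during a long run of appends. This is only a sketch: it assumes CPython, where `sys.getsizeof` reflects the list's allocated capacity, and `count_resizes` is a hypothetical helper. The exact counts depend on the interpreter version, so none are shown.

```python
import sys


def count_resizes(n):
    """Count how many times the list's reported size changes over n appends."""
    items = []
    last_size = sys.getsizeof(items)
    resizes = 0
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:
            # The underlying array was reallocated with spare room.
            resizes += 1
            last_size = size
    return resizes


n = 10_000
print(f"{count_resizes(n)} resizes for {n} appends")
```

Far fewer reallocations than appends is the payoff; a list that is never appended to gets none of it.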

So, like I said, it's annoying ;-) For "small" lists, split() really shouldn't overallocate at all (because, as before, split results are rarely appended to). A compromise could be to save pointers to the first N (12, whatever) instances of the splitting string in a stack ("auto") vector, before any list object (or result string object) is created. If it's out of stuff to do before reaching N, fine, build a result out of exactly what was found. If there's more to do, build a result from the first N, and go on as currently (letting PyList_Append deal with it - overallocation is huge in percentage terms when the list is short, but not so much as the list gets longer).
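
The compromise described above could be modeled roughly as follows. This is a Python sketch of the idea, not the C code: a small Python list stands in for the stack vector, `[None] * n` stands in for creating a list of exactly the needed size, and `split_sketch` and `max_buffered` are hypothetical names. It handles only an explicit, non-empty separator with no maxsplit.

```python
def split_sketch(s, sep, max_buffered=12):
    """Model of the proposed strategy; returns the same pieces as s.split(sep)."""
    if not sep:
        raise ValueError("empty separator")

    # Phase 1: record up to max_buffered separator positions before any
    # result list exists (this buffer stands in for the stack vector).
    positions = []
    start = 0
    exhausted = False
    while len(positions) < max_buffered:
        i = s.find(sep, start)
        if i == -1:
            exhausted = True
            break
        positions.append(i)
        start = i + len(sep)

    # Phase 2: build a result of exactly the size known so far.
    result = [None] * (len(positions) + (1 if exhausted else 0))
    prev = 0
    for k, i in enumerate(positions):
        result[k] = s[prev:i]
        prev = i + len(sep)
    if exhausted:
        result[-1] = s[prev:]  # the tail after the last separator
        return result

    # Phase 3: more to do, so carry on with ordinary appends.
    while True:
        i = s.find(sep, prev)
        if i == -1:
            break
        result.append(s[prev:i])
        prev = i + len(sep)
    result.append(s[prev:])
    return result


print(split_sketch("a,b,c", ","))  # ['a', 'b', 'c']
```

With the default `max_buffered`, a short input like the one above never reaches the append phase, so its result is sized exactly; only inputs with 12 or more separators fall through to appends.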
History
Date                 User        Action  Args
2022-03-12 03:24:49  tim.peters  set     recipients: + tim.peters, JelleZijlstra
2022-03-12 03:24:49  tim.peters  set     messageid: <1647055489.9.0.896345569782.issue46990@roundup.psfhosted.org>
2022-03-12 03:24:49  tim.peters  link    issue46990 messages
2022-03-12 03:24:49  tim.peters  create