Message304169
Well, the problem in the regexp is this part: "\d+,? ?". You're not _requiring_ that strings of digits be separated by a comma or blank, you're only _allowing_ them to be so separated. A solid string of digits is matched by this, and so the enclosing + requires the engine, when backtracking, to consider every possible way of breaking the solid string of digits into one or more solid strings of digits too. If there are n digits in the digit string, there are 2**(n-1) ways to do so.
Overall the regexp can never match because "NULL" never gets matched. So all possibilities will be tried. We can compute how many attempts will be made like so:
prod = 1
for c in re.findall("\d+|NULL", statement):
if c == "NULL":
break
prod *= 2**(len(c) - 1)
print(format(prod, ","))
Attempting test_01
256
Attempting test_02
4,096
Attempting test_03
65,536
Attempting test_04
1,048,576
Attempting test_05
2,097,152
Attempting test_06
4,194,304
Attempting test_07
8,388,608
Attempting test_08
16,777,216
Attempting test_09
33,554,432
Attempting test_10
67,108,864
Attempting test_11
16,777,216
Attempting test_12
536,870,912
Attempting test_13
17,179,869,184
Attempting test_14
549,755,813,888
Note, e.g., this "predicts" test_08 will take about the same time as test_11 (both require 16,777,216 attempts). Which is what you actually saw happen.
And that's why you gave up on test_14: it will require over half a trillion failing attempts before it ends, which will take roughly 30 times longer than it failed to match test_13.
If we were building a regexp _tester_, we'd spin off a new process to run the regexp, and kill the process if it took "too long". But that's not we're doing - library functions do what you tell them to do ;-)
In this case, it's easily repaired. For example, replace
"\d+,? ?"
by
"\d+(?=[ ,)]),? ?"
This uses a lookahead assertion to _require_ that a digit string is followed by a blank, comma, or right parenthesis. Which prevents exponential-time backtracking in failing cases. |
|
| Date |
User |
Action |
Args |
| 2017-10-11 18:48:56 | tim.peters | set | recipients:
+ tim.peters, ezio.melotti, mrabarnett, Raphaël Riel |
| 2017-10-11 18:48:56 | tim.peters | set | messageid: <1507747736.37.0.213398074469.issue31759@psf.upfronthosting.co.za> |
| 2017-10-11 18:48:56 | tim.peters | link | issue31759 messages |
| 2017-10-11 18:48:55 | tim.peters | create | |
|