Issue2179
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2008-02-24 23:10 by jyasskin, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| faster_with.patch | jyasskin, 2008-03-03 00:29 | proof of concept | ||
| faster_with.patch | jyasskin, 2008-03-03 02:34 | complete | ||
| Messages (10) | |||
|---|---|---|---|
| msg62951 - (view) | Author: Jeffrey Yasskin (jyasskin) * ![]() |
Date: 2008-02-24 23:10 | |
Currently, 'with' costs about .2us over try/finally:
$ ./python.exe -m timeit -s 'import thread; lock =
thread.allocate_lock()' 'lock.acquire()' 'try: pass' 'finally:
lock.release()'
1000000 loops, best of 3: 0.617 usec per loop
$ ./python.exe -m timeit -s 'import thread; lock =
thread.allocate_lock()' 'with lock: pass'
1000000 loops, best of 3: 0.774 usec per loop
Since it's doing the same thing (and calling the same C functions to do
the lock manipulation), it shouldn't take more time. The bytecodes
associated with the two options look like:
2) with lock:
3) pass
2 0 LOAD_GLOBAL 0 (lock)
3 DUP_TOP
4 LOAD_ATTR 1 (__exit__)
7 STORE_FAST 0 (_[1])
10 LOAD_ATTR 2 (__enter__)
13 CALL_FUNCTION 0
16 POP_TOP
17 SETUP_FINALLY 4 (to 24)
3 20 POP_BLOCK
21 LOAD_CONST 0 (None)
>> 24 LOAD_FAST 0 (_[1])
27 DELETE_FAST 0 (_[1])
30 WITH_CLEANUP
31 END_FINALLY
32 LOAD_CONST 0 (None)
35 RETURN_VALUE
6) lock.acquire()
7) try:
8) pass
9) finally:
10) lock.release()
6 0 LOAD_GLOBAL 0 (lock)
3 LOAD_ATTR 1 (acquire)
6 CALL_FUNCTION 0
9 POP_TOP
7 10 SETUP_FINALLY 4 (to 17)
8 13 POP_BLOCK
14 LOAD_CONST 0 (None)
10 >> 17 LOAD_GLOBAL 0 (lock)
20 LOAD_ATTR 2 (release)
23 CALL_FUNCTION 0
26 POP_TOP
27 END_FINALLY
28 LOAD_CONST 0 (None)
31 RETURN_VALUE
The major difference I see is the extra local variable (_[1]) used by
with. It looks like the compiler ought to be able to save that on the
stack instead, and save 3 opcodes. Neal Norwitz suggested that, if that
optimization is impossible, WITH_CLEANUP could be modified to take the
variable as a parameter, which would let it absorb the LOAD_FAST and
DELETE_FAST instructions.
I've added everyone on the previous bug to the nosy list. Sorry if you
don't care. :)
|
|||
| msg63165 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2008-03-01 16:24 | |
A closer approximation of what the with statement is doing would be:
exit = lock.release()
lock.acquire()
try:
pass
finally:
exit()
The problem with trying to store the result of the retrieval of __exit__
on the stack is that we need to leave the context manager itself on top
of the stack for the next LOAD_ATTR opcode (when we're retrieving the
__enter__ method).
However, changing WITH_CLEANUP to take an argument indicating which
local variable holds the bound __exit__ method sounds like it might work.
|
|||
| msg63167 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2008-03-01 16:34 | |
Scratch the parentheses on that first line of sample code in my previous comment (isn't copy and paste wonderful?) |
|||
| msg63197 - (view) | Author: Jeffrey Yasskin (jyasskin) * ![]() |
Date: 2008-03-03 00:29 | |
Here's a proof-of-concept patch that keeps the __exit__ method on the
stack. It uses ROT_TWO to stuff it under the context object instead of
storing it into a temporary. (Thanks Nick for pointing out that problem
before I had to waste time on it.) test_with passes, although I need to
update several more things and maybe fix a refleak.
The patch changes the compilation of:
def with_(l):
with l:
pass
from
4 0 LOAD_FAST 0 (l)
3 DUP_TOP
4 LOAD_ATTR 0 (__exit__)
7 STORE_FAST 1 (_[1])
10 LOAD_ATTR 1 (__enter__)
13 CALL_FUNCTION 0
16 POP_TOP
17 SETUP_FINALLY 4 (to 24)
5 20 POP_BLOCK
21 LOAD_CONST 0 (None)
>> 24 LOAD_FAST 1 (_[1])
27 DELETE_FAST 1 (_[1])
30 WITH_CLEANUP
31 END_FINALLY
32 LOAD_CONST 0 (None)
35 RETURN_VALUE
to
4 0 LOAD_FAST 0 (l)
3 DUP_TOP
4 LOAD_ATTR 0 (__exit__)
7 ROT_TWO
8 LOAD_ATTR 1 (__enter__)
11 CALL_FUNCTION 0
14 POP_TOP
15 SETUP_FINALLY 4 (to 22)
5 18 POP_BLOCK
19 LOAD_CONST 0 (None)
>> 22 WITH_CLEANUP
23 END_FINALLY
24 LOAD_CONST 0 (None)
27 RETURN_VALUE
And speeds it up from:
$ ./python.exe -m timeit -s 'import thread; lock =
thread.allocate_lock()' 'with lock: pass'
1000000 loops, best of 3: 0.832 usec per loop
to:
$ ./python.exe -m timeit -s 'import thread; lock =
thread.allocate_lock()' 'with lock: pass'
1000000 loops, best of 3: 0.762 usec per loop
That's only half of the way to parity with try/finally:
$ ./python.exe -m timeit -s 'import thread; lock =
thread.allocate_lock()' 'lock.acquire()' 'try: pass' 'finally:
lock.release()'
1000000 loops, best of 3: 0.638 usec per loop
What's strange is that calling __enter__ and __exit__ in a try/finally
block brings the speed back to the faster 'with' speed, even though they
call the same C functions:
$ ./python.exe -m timeit -s 'import thread; lock =
thread.allocate_lock()' 'lock.__enter__()' 'try: pass' 'finally:
lock.__exit__()'
1000000 loops, best of 3: 0.754 usec per loop
Any ideas?
|
|||
| msg63200 - (view) | Author: Jeffrey Yasskin (jyasskin) * ![]() |
Date: 2008-03-03 02:34 | |
Now with documentation, a working test_compile, and one less refleak. |
|||
| msg63323 - (view) | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * ![]() |
Date: 2008-03-06 15:53 | |
> What's strange is that calling __enter__ and __exit__ in a > try/finally block brings the speed back to the faster 'with' speed, > even though they call the same C functions Looking carefully at the code, there are two reasons for this: - LockType has no methods! try "dir(thread.LockType)". Instead, LockType defines a tp_getattr which does a *linear* search in a PyMethodDef array. First items are served faster... - After converting this to use the usual tp_methods slot, there is still a performance difference, due to the fact that release() is a METH_NOARGS method, while __exit__() uses METH_VARARGS. Phew. |
|||
| msg63353 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2008-03-07 12:01 | |
Patch applied cleanly for me and all tests pass. It also looked good on a visual scan over the diff text. |
|||
| msg63355 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2008-03-07 14:17 | |
I went ahead and committed the change to the bytecode generation as r61290. The deficiencies in the lock implementation should probably be raised as a separate issue, so I'm closing this one. |
|||
| msg63356 - (view) | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * ![]() |
Date: 2008-03-07 14:43 | |
Hm, my tests do not see any speedup with this patch. I used VS2005 on win2K, and VS2008 on winXP. Timings are very similar before and after this patch. Maybe the optimization is only useful with gcc? |
|||
| msg63425 - (view) | Author: Jeffrey Yasskin (jyasskin) * ![]() |
Date: 2008-03-09 17:03 | |
Thanks Nick and Amaury!
Amaury, what times are you seeing? It could be a just-gcc speedup, but I
wouldn't have thought so since the patch eliminates 2 times around the
eval loop. It could be that the overhead of WITH_CLEANUP is high enough
to drown out those iterations. You had mentioned optimizing the
PyObject_CallFunctionObjArgs() call before. If you're still seeing with
significantly slower than try, that's probably the right place to look.
Here are my current timings. To avoid the lock issues, I wrote
simple_cm.py containing:
class CM(object):
def __enter__(self):
pass
def __exit__(self, *args):
pass
$ ./python.exe -m timeit -s 'import simple_cm; cm = simple_cm.CM()'
'with cm: pass'
1000000 loops, best of 3: 0.885 usec per loop
$ ./python.exe -m timeit -s 'import simple_cm; cm = simple_cm.CM()'
'cm.__enter__()' 'try: pass' 'finally: cm.__exit__()'
1000000 loops, best of 3: 0.858 usec per loop
If __exit__ doesn't contain *args (making it not a context manager), the
try/finally time goes down to:
$ ./python.exe -m timeit -s 'import simple_cm; cm = simple_cm.CM()'
'cm.__enter__()' 'try: pass' 'finally: cm.__exit__()'
1000000 loops, best of 3: 0.619 usec per loop
I think in theory, with could be slightly faster than finally with the
same argument list, but it's pretty close now.
|
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:31 | admin | set | github: 46432 |
| 2008-03-09 17:03:40 | jyasskin | set | messages: + msg63425 |
| 2008-03-08 00:08:03 | jcea | set | nosy: + jcea |
| 2008-03-07 14:43:02 | amaury.forgeotdarc | set | messages: + msg63356 |
| 2008-03-07 14:17:51 | ncoghlan | set | status: open -> closed resolution: accepted messages: + msg63355 |
| 2008-03-07 12:01:48 | ncoghlan | set | messages: + msg63353 |
| 2008-03-06 15:54:00 | amaury.forgeotdarc | set | messages: + msg63323 |
| 2008-03-03 15:27:57 | belopolsky | set | nosy: + belopolsky |
| 2008-03-03 02:34:12 | jyasskin | set | files:
+ faster_with.patch messages: + msg63200 |
| 2008-03-03 00:29:46 | jyasskin | set | files:
+ faster_with.patch keywords: + patch messages: + msg63197 |
| 2008-03-01 16:34:03 | ncoghlan | set | messages: + msg63167 |
| 2008-03-01 16:24:17 | ncoghlan | set | nosy:
+ ncoghlan messages: + msg63165 |
| 2008-02-25 00:56:30 | collinwinter | set | nosy: + collinwinter |
| 2008-02-24 23:10:40 | jyasskin | create | |
