[Python-Dev] Adding bytes.frombuffer() constructor to PEP 467
Serhiy Storchaka
storchaka at gmail.com
Fri Jan 6 16:43:28 EST 2017
On 06.01.17 21:31, Alexander Belopolsky wrote:
> On Thu, Jan 5, 2017 at 5:54 PM, Serhiy Storchaka <storchaka at gmail.com> wrote:
>
> On 05.01.17 22:37, Alexander Belopolsky wrote:
>
> 2. For 3.7, I would like to see a drastically simplified bytes(x):
> 2.1. Accept only objects with a __bytes__ method or a sequence
> of ints
> in range(256).
> 2.2. Expand __bytes__ definition to accept optional encoding
> and errors
> parameters. Implement str.__bytes__(self, [encoding[, errors]]).
>
>
> I think it is better to use the encode() method if you want to
> encode from non-strings.
>
>
> Possibly, but the goal of my proposal is to lighten the logic in the
> bytes(x, [encoding[, errors]])
> constructor. If it detects x.__bytes__, it should just call it with
> whatever arguments are given.
I think this would complicate the __bytes__ protocol. I don't know of any
precedent for passing additional optional arguments to a special
method. int() doesn't pass the base argument to __int__, str() doesn't
pass encoding and errors to __str__, and pickle.dumps() passes the
protocol argument to a new special method, __reduce_ex__, instead of __reduce__.
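A minimal sketch of the int() and str() precedents mentioned above. The Number class and its return values are hypothetical, chosen only for illustration; the point is that the constructors reject the extra arguments rather than forwarding them to the special method.

```python
class Number:
    """Hypothetical type that defines only the plain special methods."""

    def __int__(self):
        return 255

    def __str__(self):
        return "ff"


n = Number()
print(int(n))   # calls n.__int__() with no extra arguments
print(str(n))   # calls n.__str__() with no extra arguments

# The optional arguments are not forwarded to the special method;
# the constructors raise TypeError for this combination instead.
errors = []
for call in (lambda: int(n, 16), lambda: str(n, "ascii", "strict")):
    try:
        call()
    except TypeError as exc:
        errors.append(exc)
        print("TypeError:", exc)
```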
> bytes.frombuffer(x) is bytes(memoryview(x)) or memoryview(x).tobytes().
>
>
> I've just tried Inada's patch <http://bugs.python.org/issue29178>:
>
> $ ./python.exe -m timeit -s "from array import array; x=array('f', [0])"
> "bytes.frombuffer(x)"
> 2000000 loops, best of 5: 134 nsec per loop
>
> $ ./python.exe -m timeit -s "from array import array; x=array('f', [0])"
> "with memoryview(x) as m: bytes(m)"
> 500000 loops, best of 5: 436 nsec per loop
>
> A 3x speed-up seems to be worth it.
There is a constant overhead for calling functions. It is dwarfed by
memory copying for large arrays. I'm not sure that saving 300 ns is worth
adding a new method.
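A rough way to check this on your own machine, as a sketch only. The copy_time helper and the array sizes are my own choices for illustration, and the lambda adds a little call overhead of its own, so treat the numbers as indicative. It uses only the existing bytes(memoryview(x)) spelling, since bytes.frombuffer() exists only in the patch.

```python
from array import array
from timeit import timeit


def copy_time(n_items, number=1000):
    """Average seconds per bytes(memoryview(x)) call for an n_items float array."""
    x = array('f', [0.0]) * n_items
    return timeit(lambda: bytes(memoryview(x)), number=number) / number


small = copy_time(1)          # dominated by the fixed call overhead
large = copy_time(100_000)    # dominated by copying the buffer
print(f"1 item:       {small * 1e9:.0f} ns per call")
print(f"100000 items: {large * 1e9:.0f} ns per call")

# The two spellings quoted above produce the same bytes.
x = array('f', [0.0, 1.0])
same = bytes(memoryview(x)) == memoryview(x).tobytes() == x.tobytes()
print("same result:", same)
```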
> 2.4. Implement memoryview.__bytes__ method so that
> bytes(memoryview(x))
> works as before.
> 2.5. Implement a fast bytearray.__bytes__ method.
>
>
> This wouldn't help for the bytearray constructor. And it wouldn't allow
> avoiding double copying in the constructor of a bytes subclass.
>
>
> I don't see why bytearray constructor should behave differently from bytes.
The bytes constructor can just return the result of __bytes__. The bytearray
constructor would need to copy twice if it supported __bytes__: first
copy the data to the bytes object returned by __bytes__, then copy its
content to the newly created bytearray object. Creating a bytearray
object using the buffer protocol needs only one copy.
Perhaps this is the reason why support for __bytes__ was not added to
the bytearray constructor after all.
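A small sketch of the difference described above. Packet is a hypothetical class that exposes its data only through __bytes__. Whether bytearray(p) is accepted is recorded rather than assumed; in the CPython versions I know of it raises TypeError because the object neither supports the buffer protocol nor is iterable.

```python
class Packet:
    """Hypothetical type exposing its data only through __bytes__."""

    def __init__(self, payload):
        self._payload = bytes(payload)

    def __bytes__(self):
        return self._payload


p = Packet([1, 2, 3])
b = bytes(p)                 # bytes() calls p.__bytes__()

try:
    bytearray(p)             # no buffer protocol, not iterable
    direct = True
except TypeError:
    direct = False
print("bytearray(p) accepted:", direct)

# Going through __bytes__ means two steps: p -> bytes -> bytearray.
ba = bytearray(bytes(p))

# A buffer-protocol source is copied once, straight into the bytearray.
ba_from_buffer = bytearray(memoryview(b))
print(ba == ba_from_buffer)
```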
> Compare these two calls:
>
> >>> from array import array
> >>> bytes(array('h', [1, 2, 3]))
> b'\x01\x00\x02\x00\x03\x00'
>
> and
>
> >>> bytes(array('f', [1, 2, 3]))
> b'\x00\x00\x80?\x00\x00\x00@\x00\x00@@'
I don't see a difference.
> For me the __bytes__ method is a way for types to specify their bytes
> representation that may or may not be the same as memoryview(x).tobytes().
It would be confusing if some type that supports the buffer protocol
implemented __bytes__ returning a result different from
memoryview(x).tobytes(). If you want to get b'\1\2\3' from array('h',
[1, 2, 3]), use bytes(list(x)).
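For illustration, the two conversions side by side. The variable names are mine; the byte order inside raw depends on the platform, so only its length and its equality with memoryview(x).tobytes() are checked.

```python
from array import array

x = array('h', [1, 2, 3])

raw = bytes(x)             # buffer contents: x.itemsize bytes per element
values = bytes(list(x))    # one byte per element value

assert raw == memoryview(x).tobytes()
assert len(raw) == len(x) * x.itemsize
assert values == b'\x01\x02\x03'
```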