[Python-Dev] Adding bytes.frombuffer() constructor to PEP 467

Serhiy Storchaka storchaka at gmail.com
Fri Jan 6 16:43:28 EST 2017

Previous message (by thread): [Python-Dev] Adding bytes.frombuffer() constructor to PEP 467
Next message (by thread): [Python-Dev] Adding bytes.frombuffer() constructor to PEP 467
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 06.01.17 21:31, Alexander Belopolsky wrote:
> On Thu, Jan 5, 2017 at 5:54 PM, Serhiy Storchaka <storchaka at gmail.com> wrote:
>
>     On 05.01.17 22:37, Alexander Belopolsky wrote:
>
>         2. For 3.7, I would like to see a drastically simplified bytes(x):
>         2.1.  Accept only objects with a __bytes__ method or a sequence of
>         ints in range(256).
>         2.2.  Expand __bytes__ definition to accept optional encoding and
>         errors parameters.  Implement str.__bytes__(self, [encoding[, errors]]).
>
>
>     I think it is better to use the encode() method if you want to
>     encode from non-strings.
>
>
> Possibly, but the goal of my proposal is to simplify the logic in the
> bytes(x, [encoding[, errors]]) constructor.  If it detects x.__bytes__,
> it should just call it with whatever arguments are given.
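To make the proposal concrete, here is a minimal sketch of the dispatch it describes, written as a plain function. Everything in it is hypothetical: `proposed_bytes`, `_proposed_str_bytes`, `Greeting` and the extended `__bytes__(self, encoding, errors)` signature illustrate the idea and are not anything CPython does. The UTF-8 default for str is an assumption, since the proposal names no default.

```python
def _proposed_str_bytes(s, encoding="utf-8", errors="strict"):
    # Stand-in for the proposed str.__bytes__(self, [encoding[, errors]]).
    # ASSUMPTION: the "utf-8" default; the proposal does not name one.
    return s.encode(encoding, errors)


# Real str has no __bytes__, so the sketch supplies one for it here.
_PROPOSED_OVERRIDES = {str: _proposed_str_bytes}


def proposed_bytes(x, *args):
    """Hypothetical bytes(x, [encoding[, errors]]) from the proposal."""
    if len(args) > 2:
        raise TypeError("expected at most encoding and errors")
    dunder = _PROPOSED_OVERRIDES.get(type(x)) or getattr(type(x), "__bytes__", None)
    if dunder is not None:
        # 2.2: forward whatever extra arguments were given to __bytes__.
        return dunder(x, *args)
    if args:
        raise TypeError("encoding or errors given, but x has no __bytes__")
    # 2.1: otherwise accept only a sequence of ints in range(256);
    # bytes() of a list raises TypeError/ValueError for anything else.
    return bytes(list(x))


class Greeting:
    """Toy type whose __bytes__ accepts the extra (hypothetical) arguments."""

    def __init__(self, text):
        self.text = text

    def __bytes__(self, encoding="ascii", errors="strict"):
        return self.text.encode(encoding, errors)
```

With this scheme the constructor itself never has to know about encodings; for example `proposed_bytes("é", "latin-1")` returns `b'\xe9'`, and `proposed_bytes(Greeting("hi"))` returns `b'hi'`.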

I think this would complicate the __bytes__ protocol. I don't know of any
precedent for passing additional optional arguments to a special method.
int() doesn't pass the base argument to __int__, str() doesn't pass
encoding and errors to __str__, and pickle.dumps() passes the protocol
argument to a new special method, __reduce_ex__, rather than to __reduce__.
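The existing precedent is easy to check. In this sketch `Celsius` and `rejects` are made-up names; the point is that int() and str() reject their extra arguments for such a type instead of forwarding them to __int__ or __str__.

```python
class Celsius:
    """Toy type that defines the __int__ and __str__ conversion methods."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __int__(self):
        # __int__ takes no extra parameters, so int()'s base can't reach it.
        return int(self.degrees)

    def __str__(self):
        # Likewise __str__ never sees str()'s encoding/errors arguments.
        return f"{self.degrees} C"


def rejects(func, *args):
    """Return True if func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


t = Celsius(21.5)
print(int(t), str(t))             # 21 21.5 C
# base is only accepted for str, bytes and bytearray arguments:
print(rejects(int, t, 16))        # True
# encoding is only accepted for bytes-like arguments:
print(rejects(str, t, "utf-8"))   # True
```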

>     bytes.frombuffer(x) is bytes(memoryview(x)) or memoryview(x).tobytes().
>
>
> I've just tried Inada's patch <http://bugs.python.org/issue29178>:
>
> $ ./python.exe -m timeit -s "from array import array; x=array('f', [0])" "bytes.frombuffer(x)"
> 2000000 loops, best of 5: 134 nsec per loop
>
> $ ./python.exe -m timeit -s "from array import array; x=array('f', [0])" "with memoryview(x) as m: bytes(m)"
> 500000 loops, best of 5: 436 nsec per loop
>
> A 3x speed-up seems to be worth it.

There is a constant per-call overhead for calling functions, and it is
dwarfed by memory copying for large arrays. I'm not sure that saving
300 ns per call is worth adding a new method.
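A rough way to see the fixed overhead shrink relative to the copy is to time both a tiny and a large array. Since bytes.frombuffer() exists only in the patch, this sketch uses array.tobytes() as a stand-in for a single-call API; `copy_with_block` and `per_call_seconds` are illustrative names, and the absolute numbers will vary by machine.

```python
import timeit
from array import array


def copy_with_block(x):
    # The spelling from the benchmark above: one copy of the data, plus the
    # cost of creating the memoryview and entering/exiting the with block.
    with memoryview(x) as m:
        return bytes(m)


def per_call_seconds(n, number):
    """Average seconds per copy of an n-element 'f' array, two spellings."""
    x = array("f", [0.0]) * n
    with_block = timeit.timeit(lambda: copy_with_block(x), number=number)
    direct = timeit.timeit(lambda: x.tobytes(), number=number)
    return with_block / number, direct / number


for n in (1, 1_000_000):
    number = 100_000 if n == 1 else 100
    slow, fast = per_call_seconds(n, number)
    print(f"n={n:>9}: with-block {slow * 1e9:,.0f} ns, "
          f"tobytes {fast * 1e9:,.0f} ns, ratio {slow / fast:.2f}")
```

Expect the ratio to be largest for the one-element array and to approach 1 for the large one, since the fixed per-call cost is spread over a much larger copy.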

>         2.4. Implement memoryview.__bytes__ method so that
>         bytes(memoryview(x)) works as before.
>         2.5.  Implement a fast bytearray.__bytes__ method.
>
>
>     This wouldn't help for the bytearray constructor, nor would it
>     avoid double copying in the constructor of a bytes subclass.
>
>
> I don't see why the bytearray constructor should behave differently from bytes.

The bytes constructor can just return the result of __bytes__. The
bytearray constructor would have to copy twice if it supported
__bytes__: first __bytes__ copies the data into a new bytes object, then
that object's content is copied into the newly created bytearray.
Creating a bytearray object through the buffer protocol needs only one
copy.

Perhaps this is why support for __bytes__ was not added to the
bytearray constructor after all.
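The double copy can be observed with tracemalloc. In this sketch the __bytes__-aware bytearray constructor is hypothetical, emulated by `hypothetical_bytearray_via_dunder`; `Samples` is a made-up array subclass whose __bytes__ returns the same data as its buffer.

```python
import tracemalloc
from array import array


class Samples(array):
    """Array subclass whose __bytes__ returns the same data as its buffer."""

    def __bytes__(self):
        return self.tobytes()


def hypothetical_bytearray_via_dunder(x):
    # HYPOTHETICAL: what a __bytes__-aware bytearray() would have to do.
    tmp = x.__bytes__()     # copy 1: data copied into a new bytes object
    return bytearray(tmp)   # copy 2: that bytes object copied again


def bytearray_via_buffer(x):
    # What bytearray() does today for buffer exporters: a single copy.
    return bytearray(x)


def peak_traced(func, x):
    """Peak memory (in bytes) traced by tracemalloc while func(x) runs."""
    tracemalloc.start()
    try:
        func(x)
        return tracemalloc.get_traced_memory()[1]
    finally:
        tracemalloc.stop()


data = Samples("B", bytes(1_000_000))
print(peak_traced(hypothetical_bytearray_via_dunder, data))
print(peak_traced(bytearray_via_buffer, data))
```

Expect the first peak to be about twice the array size and the second about the array size itself.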

> Compare these two calls:
>
>>>> from array import array
>>>> bytes(array('h', [1, 2, 3]))
> b'\x01\x00\x02\x00\x03\x00'
>
> and
>
>>>> bytes(array('f', [1, 2, 3]))
> b'\x00\x00\x80?\x00\x00\x00@\x00\x00@@'

I don't see a difference: in both cases bytes() returns a copy of the
array's memory, the same as memoryview(x).tobytes().
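Checking the two examples directly shows they behave the same way: for both typecodes bytes(x) is just a copy of the array's memory. This is a small sketch; the exact bytes depend on the machine's byte order.

```python
from array import array

for typecode in ("h", "f"):
    x = array(typecode, [1, 2, 3])
    # In both cases bytes() simply copies the array's memory; the result
    # depends on the item format and the machine's byte order.
    assert bytes(x) == memoryview(x).tobytes() == x.tobytes()
    print(typecode, bytes(x))
```

On a little-endian machine this prints the same two values quoted above.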

> For me the __bytes__ method is a way for types to specify their bytes
> representation that may or may not be the same as memoryview(x).tobytes().

It would be confusing if a type that supports the buffer protocol
implemented __bytes__ to return something different from
memoryview(x).tobytes(). If you want to get b'\1\2\3' from
array('h', [1, 2, 3]), use bytes(list(x)).
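A sketch of that confusion: `Tagged` is a made-up array subclass whose __bytes__ disagrees with its buffer, so bytes() (which consults __bytes__ first) and bytearray() (which only uses the buffer) return different data. The bytes(list(x)) spelling for one byte per item is shown for comparison.

```python
from array import array


class Tagged(array):
    """Array subclass whose __bytes__ disagrees with its buffer (a bad idea)."""

    def __bytes__(self):
        return b"tag:" + self.tobytes()


x = array("h", [1, 2, 3])
print(bytes(list(x)))            # b'\x01\x02\x03': one byte per item

t = Tagged("B", [1, 2, 3])
print(bytes(t))                  # b'tag:\x01\x02\x03' (uses __bytes__)
print(bytearray(t))              # bytearray(b'\x01\x02\x03') (uses the buffer)
print(memoryview(t).tobytes())   # b'\x01\x02\x03'
```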


More information about the Python-Dev mailing list