Message326680
For reference in future discussions, Python's base64 module implements RFC 3548 (https://tools.ietf.org/html/rfc3548), whose section 2.3 (https://tools.ietf.org/html/rfc3548#section-2.3) discusses the "Interpretation of non-alphabet characters in encoded data".
The section's content is:
Base encodings use a specific, reduced, alphabet to encode binary
data. Non alphabet characters could exist within base encoded data,
caused by data corruption or by design. Non alphabet characters may
be exploited as a "covert channel", where non-protocol data can be
sent for nefarious purposes. Non alphabet characters might also be
sent in order to exploit implementation errors leading to, e.g.,
buffer overflow attacks.
Implementations MUST reject the encoding if it contains characters
outside the base alphabet when interpreting base encoded data, unless
the specification referring to this document explicitly states
otherwise. Such specifications may, as MIME does, instead state that
characters outside the base encoding alphabet should simply be
ignored when interpreting data ("be liberal in what you accept").
Note that this means that any CRLF constitute "non alphabet
characters" and are ignored. Furthermore, such specifications may
consider the pad character, "=", as not part of the base alphabet
until the end of the string. If more than the allowed number of pad
characters are found at the end of the string, e.g., a base 64 string
terminated with "===", the excess pad characters could be ignored.
In my opinion, the RFC is rather permissive about strange characters in the encoded data. It points to the MIME specification, which ignores such characters, and hints at the possibility of not accepting the pad symbol '=' as part of the alphabet unless it is found at the end of the string.
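For comparison, both interpretations the RFC describes can already be observed through the existing `validate` parameter of `base64.b64decode`. The snippet below is only an illustration; the input value is made up for the example.

```python
import base64
import binascii

# "hello" encoded as base64, with a newline embedded in the middle.
data = b"aGVs\nbG8="

# Default (validate=False): characters outside the alphabet are
# discarded before decoding, in the spirit of MIME.
print(base64.b64decode(data))  # b'hello'

# validate=True: characters outside the alphabet cause a rejection.
try:
    base64.b64decode(data, validate=True)
except binascii.Error as exc:
    print("rejected:", exc)
```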
I think that our best option, if we want to address this issue, is to add an 'errors' argument. Its default value would keep the current behavior for backwards compatibility, and it would accept further options: one to ignore the strange characters and carry on with the processing, like bytes.decode's errors='ignore', and one to raise an error in such situations, like bytes.decode's errors='strict'.
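A rough sketch of how such an argument might behave, written as a wrapper around the existing function. The name `b64decode_with_errors` and the option value `"default"` are hypothetical and exist only for this illustration; nothing like this is part of the base64 module today.

```python
import base64
import re

# Anything that is not part of the standard base64 alphabet or the pad character.
_NON_ALPHABET = re.compile(rb"[^A-Za-z0-9+/=]")


def b64decode_with_errors(data, errors="default"):
    """Hypothetical wrapper illustrating the proposed 'errors' argument."""
    if isinstance(data, str):
        data = data.encode("ascii")
    if errors == "default":
        # Whatever base64.b64decode does today, unchanged.
        return base64.b64decode(data)
    if errors == "strict":
        # Raise binascii.Error on any character outside the alphabet.
        return base64.b64decode(data, validate=True)
    if errors == "ignore":
        # Drop non-alphabet characters first, then decode what is left.
        cleaned = _NON_ALPHABET.sub(b"", data)
        return base64.b64decode(cleaned, validate=True)
    raise ValueError(f"unknown errors value: {errors!r}")
```

With this sketch, `b64decode_with_errors(b"aGVs\nbG8=", errors="ignore")` returns `b"hello"`, while the same input with `errors="strict"` raises `binascii.Error`. How misplaced '=' characters should be handled under each option is left open here.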
| Date | User | Action | Args |
| 2018-09-29 16:05:02 | fbidu | set | recipients: + fbidu, pw.michael.harris |
| 2018-09-29 16:05:02 | fbidu | set | messageid: <1538237102.91.0.545547206417.issue34832@psf.upfronthosting.co.za> |
| 2018-09-29 16:05:02 | fbidu | link | issue34832 messages |
| 2018-09-29 16:05:02 | fbidu | create | |