Message 167760 - Python tracker

Author	monson
Recipients	monson
Date	2012-08-09.06:20:43
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1344493246.11.0.837013497182.issue15602@psf.upfronthosting.co.za>

Content

In /cpython/Lib/zipfile.py, there is code like this:

            if flags & 0x800:
                # UTF-8 file names extension
                filename = filename.decode('utf-8')
            else:
                # Historical ZIP filename encoding
                filename = filename.decode('cp437')


But there is actually no "Historical ZIP filename encoding", because ZIP files carry no charset information for their filenames.
In English-speaking countries this is usually not a big deal. But if the files were zipped on a system that is not cp437-based (especially in China or Japan), the filenames are encoded in a charset such as gb18030, yet ZipFile decodes the bytes as cp437. The names then come out garbled, and it is hard for people to find the reason.
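To make the failure concrete, here is a minimal sketch with a hypothetical filename. It uses only the str/bytes codecs and assumes the archiver stored the name as gb18030 bytes without setting flag 0x800, so zipfile takes the cp437 branch shown in the excerpt. It also shows that the cp437 decoding is reversible.

```python
# Hypothetical name, created on a system whose filename charset is gb18030.
original = "中文.txt"
raw = original.encode("gb18030")  # the bytes stored in the archive (flag 0x800 unset)
shown = raw.decode("cp437")       # what zipfile reports, per the excerpt above
print(shown)                      # garbled characters instead of the Chinese name

# cp437 assigns a character to every byte 0x00-0xFF, so decoding never raises
# (the mistake goes unnoticed) and re-encoding gives back the exact stored bytes.
recovered = shown.encode("cp437").decode("gb18030")
print(recovered == original)      # True
```

Because no exception is raised, the only symptom a user sees is a wrong-looking name, which is why the cause is hard to track down.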

This problem is new in py3k; I found it on Python 3.2 and Python 3.4.
I suggest either returning filenames as bytes objects or adding a decoding parameter when opening a ZipFile.
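Until something like that exists, a caller can work around the problem on their own side. The sketch below is one possible approach, not a zipfile API: `raw_name` and `decoded_names` are hypothetical helpers. It assumes the ZipFile was opened with default settings, so names without the UTF-8 flag (0x800) were decoded as cp437, and that the caller knows the archive's real charset. It relies on the existing `ZipInfo.flag_bits` and `ZipInfo.orig_filename` attributes. Because zipfile always writes non-ASCII names as UTF-8, the test archive is built with an ASCII placeholder name of the same byte length, which is then patched with gb18030 bytes.

```python
import io
import zipfile


def raw_name(info: zipfile.ZipInfo) -> bytes:
    """Hypothetical helper: recover the filename bytes stored in the archive."""
    if info.flag_bits & 0x800:
        # UTF-8 flag set: zipfile decoded the name as UTF-8.
        return info.orig_filename.encode("utf-8")
    # Otherwise zipfile decoded it as cp437, which maps all 256 byte values,
    # so encoding back to cp437 restores the original bytes exactly.
    return info.orig_filename.encode("cp437")


def decoded_names(zf: zipfile.ZipFile, encoding: str) -> list:
    """Hypothetical helper: decode legacy (non-UTF-8) names with `encoding`."""
    names = []
    for info in zf.infolist():
        if info.flag_bits & 0x800:
            names.append(info.filename)  # UTF-8 names are already correct
        else:
            names.append(raw_name(info).decode(encoding))
    return names


# Build a test archive in memory, imitating an archiver on a Chinese-locale system:
# "XXXX.txt" and the gb18030 encoding of "中文.txt" are both 8 bytes long.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("XXXX.txt", b"hello")  # ASCII name, so flag 0x800 stays unset
data = buf.getvalue().replace(b"XXXX.txt", "中文.txt".encode("gb18030"))

with zipfile.ZipFile(io.BytesIO(data)) as zf:
    garbled = zf.namelist()               # cp437-decoded names
    fixed = decoded_names(zf, "gb18030")
print(garbled)
print(fixed)  # ['中文.txt']
```

The `raw_name` half corresponds to the "return bytes" suggestion, and `decoded_names` to the "decoding parameter" one. The caller still has to guess the right charset, since the archive does not record it.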

History
Date	User	Action	Args
2012-08-09 06:20:46	monson	set	recipients: + monson
2012-08-09 06:20:46	monson	set	messageid: <1344493246.11.0.837013497182.issue15602@psf.upfronthosting.co.za>
2012-08-09 06:20:45	monson	link	issue15602 messages
2012-08-09 06:20:44	monson	create