unzip should choose the filename encoding according to the locale, not UTF-8
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| unzip (Debian) | Confirmed | Unknown | | |
| unzip (Ubuntu) | Confirmed | Wishlist | Unassigned | |
Bug Description
Because ZIP files don't record the encoding of their filenames, most ZIP archivers use the native (system) encoding. This is why ZIP files created on Windows can't be extracted correctly on Linux. For example, the Korean version of Windows uses 'cp949' (extended EUC-KR) to zip and unzip files, the Japanese version uses 'shift-jis', and so on.
Encoding selection options were recently added to unzip. They can also be set through the UNZIP and ZIPINFO environment variables:
export UNZIP='-O cp949'
export ZIPINFO='-O cp949'
These settings make unzip and zipinfo use cp949 instead of UTF-8, the native Linux encoding, which improves compatibility with archives created on Windows.
So I propose that Ubuntu set these variables according to the system locale. If the system uses ko_KR.UTF-8, cp949 should be selected; for ja_JP.UTF-8, shift-jis should be used. zh_CN and other locales can be configured the same way.
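A minimal sketch of the proposal, assuming it would ship as a shell snippet sourced at login (for example from a file under /etc/profile.d/; the file name and the `zip_charset_for_locale` function are hypothetical). It only uses the `-O` option and the UNZIP/ZIPINFO variables described above. The `gbk` mapping for zh_CN is an assumption based on the Simplified Chinese Windows code page (CP936); the report itself does not name one.

```shell
# Hypothetical helper: map a locale name to the legacy Windows code page
# that archivers on that localized Windows use for ZIP filenames.
zip_charset_for_locale() {
    case "$1" in
        ko_KR*) echo cp949 ;;      # Korean Windows (extended EUC-KR)
        ja_JP*) echo shift-jis ;;  # Japanese Windows
        zh_CN*) echo gbk ;;        # assumption: Simplified Chinese Windows (CP936/GBK)
        *)      echo "" ;;         # no mapping: keep unzip's default behaviour
    esac
}

# Effective character-type locale: LC_ALL overrides LC_CTYPE, which overrides LANG.
current_locale="${LC_ALL:-${LC_CTYPE:-$LANG}}"
zip_charset="$(zip_charset_for_locale "$current_locale")"

# Export the settings only when the locale has a known mapping.
if [ -n "$zip_charset" ]; then
    export UNZIP="-O $zip_charset"
    export ZIPINFO="-O $zip_charset"
fi
```

With LANG=ko_KR.UTF-8 and no LC_ALL or LC_CTYPE override, this would leave UNZIP and ZIPINFO set to `-O cp949`. Note that the sketch overwrites any UNZIP or ZIPINFO value the user already had; a packaged version would probably need to merge with existing settings instead.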
Changed in unzip:
importance: Undecided → Wishlist
status: New → Confirmed
Changed in unzip (Debian):
status: Unknown → Confirmed
Here is a sample file: '한국어.zip', archived on Windows, which contains '한국어.txt' stored in cp949 encoding.