unzip should use encoding according to locale, not utf-8

Bug #203609 reported by Barosl LEE on 2008-03-18

This bug report is a duplicate of: Bug #580961: unzip fails to deal correctly with filename encodings.

This bug affects 9 people

Affects          Status      Importance   Assigned to       Milestone
unzip (Debian)   Confirmed   Unknown      debbugs #483290
unzip (Ubuntu)   Confirmed   Wishlist     Unassigned

Bug Description

Because ZIP files do not include information about the encoding of their filenames, most ZIP archivers use the native (system) encoding. This is why ZIP files archived on Windows cannot be unarchived properly on Linux. For example, the Korean version of Windows uses the 'cp949' (extended euc-kr) encoding to zip and unzip files, the Japanese version of Windows uses 'shift-jis', and so on.
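The mismatch can be reproduced without any archive at all. The following is a minimal sketch, assuming a UTF-8 terminal and an iconv that knows the CP949 charset; `misread` is a hypothetical helper name, not part of unzip.

```shell
# Hypothetical helper: show how a filename looks when it was written in
# one encoding and is read back as another.
#   $1 = text (typed in a UTF-8 terminal)
#   $2 = encoding the archiver wrote the name in
#   $3 = encoding the reader assumes
misread() {
    printf '%s' "$1" | iconv -f UTF-8 -t "$2" | iconv -f "$3" -t UTF-8
}

# Written as CP949 and read as CP949: the name survives the round trip.
misread '한국어' CP949 CP949; echo

# Written as CP949 but read as ISO-8859-1: the name comes out garbled.
misread '한국어' CP949 ISO-8859-1; echo
```

The second call stands in for any reader that guesses the wrong encoding; the stored bytes are the same in both cases, only the interpretation differs.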

Recently, encoding selection options were added to unzip. Two of them can be controlled by environment variables:

export UNZIP='-O cp949'
export ZIPINFO='-O cp949'

These settings make unzip use cp949 instead of UTF-8, the native Linux encoding, which improves compatibility with Windows.

So I propose that Ubuntu include the settings above according to its locale. If the system uses ko_KR.UTF-8, cp949 should be selected; for ja_JP.UTF-8, shift-jis should be used. zh_CN and other locales can be configured in the same way.
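A rough sketch of what such a locale-based default might look like, for example in a shell profile snippet. The function name `zip_encoding_for_locale` is hypothetical, and the zh_CN entry is an assumption (Simplified Chinese Windows uses code page 936, i.e. GBK); only the ko_KR and ja_JP mappings come from this report.

```shell
# Hypothetical helper: map a locale name to the legacy Windows encoding
# most likely used for ZIP filenames in that region.
# Prints the encoding and returns 0, or returns 1 if no mapping is known.
zip_encoding_for_locale() {
    case "$1" in
        ko_KR*) echo cp949 ;;
        ja_JP*) echo shift-jis ;;
        zh_CN*) echo gbk ;;      # assumption, not taken from this report
        *)      return 1 ;;
    esac
}

# Resolve the effective locale in the usual precedence order.
locale_name="${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"

# Only set the variables when a mapping exists, so other locales keep
# unzip's default behaviour.
if enc=$(zip_encoding_for_locale "$locale_name"); then
    export UNZIP="-O $enc"
    export ZIPINFO="-O $enc"
fi
```

Leaving the variables unset for unmapped locales keeps the change opt-in per region, and a user can still override the guess by exporting UNZIP and ZIPINFO by hand.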

Barosl LEE (barosl) wrote on 2008-03-18:

한국어.zip (124 bytes, application/zip)

Here is a sample file: '한국어.zip', containing '한국어.txt' in cp949 encoding, archived on Windows.

Barosl LEE (barosl) wrote on 2008-03-18:

日本語.zip (126 bytes, application/zip)

Here is a sample file: '日本語.zip', containing '日本語.txt' in shift-jis encoding, archived on a Japanese version of Windows.

Emmet Hikory (persia) on 2008-03-20

Changed in unzip:
importance:	Undecided → Wishlist
status:	New → Confirmed

Emmet Hikory (persia) wrote on 2008-03-20:

Some work towards fixing this appears as part of the solution to bug #10979. Perhaps defining more encoding matches around line 1700 in unix/unix.c would help to increase the number of supported encodings.

Dmitry Agafonov (dmitry-agafonov) wrote on 2010-04-22:

I guess we should mark this bug as a duplicate of https://bugs.launchpad.net/ubuntu/+source/unzip/+bug/477755

Bug Watch Updater (bug-watch-updater) on 2010-05-06