Comment 25 for bug 10979

Vladimir Skvortsov (vskvortsoff) wrote :

Ubuntu 12.10 (UI with US English-UTF-8 codepage)

It seems that if you KNOW which platform a zip file comes from and which codepage it used, you can unzip the archive without losing non-ASCII filenames that are not encoded in UTF-8.

I just ran one experiment: unpacking a zip file that was created on Korean Windows 7 and has Korean characters in both the archive name and the names of the compressed files.

First, let's get the locale-specific info:

$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Let's check the version of the unzip utility:

$ unzip --help
UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.
...
Usage: unzip [-Z] [-opts[modifiers]] file[.zip] [list] [-x xlist] [-d exdir]
Default action is to extract files in list, except those in xlist, to exdir;
file[.zip] may be a wildcard. -Z => ZipInfo mode ("unzip -Z" for usage).
...
-O CHARSET specify a character encoding for DOS, Windows and OS/2 archives
-I CHARSET specify a character encoding for UNIX and other archives

Note this option in particular:

-O CHARSET specify a character encoding for DOS, Windows and OS/2 archives

It is not -0 (zero), it is -O (capital letter O)!

In my case, Korean Windows uses the EUC-KR codepage. The zip file is named "2013년 설날".

So my command line looks like this:

$ unzip -O EUC-KR "2013년 설날"

After checking the unpacked files: it works! All file names show the correct Korean characters, with no garbage characters.
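
For anyone curious why telling the tool the charset helps, here is a minimal Python sketch of the same idea. It assumes the garbling comes from EUC-KR name bytes being read as CP437, which is what Python's own zipfile module falls back to when an entry is not flagged as UTF-8. I have not checked that unzip does exactly this internally. The helper name recover_name is made up for this illustration.

```python
def recover_name(garbled: str, source_encoding: str = "euc-kr") -> str:
    """Undo a wrong CP437 decode of a legacy-encoded file name.

    Hypothetical helper for illustration only. Assumption: the name bytes
    were written in `source_encoding` but were decoded as CP437.
    """
    raw = garbled.encode("cp437")       # get back the bytes stored in the archive
    return raw.decode(source_encoding)  # decode them with the real codepage


if __name__ == "__main__":
    original = "2013년 설날"
    # Simulate what a CP437-only reader would show for an EUC-KR name.
    garbled = original.encode("euc-kr").decode("cp437")
    print(repr(garbled))
    print(recover_name(garbled))
```

Plain ASCII names pass through unchanged, because ASCII bytes mean the same thing in both codepages. If you guess the wrong source encoding, you will get either a UnicodeDecodeError or a different kind of garbage, so you still need to KNOW where the archive came from.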