Ubuntu
unzip package

unzip does not support UTF-8 filenames

Bug #10979 reported by Young-Ho Cha on 2004-12-06

This bug report is a duplicate of: Bug #580961: unzip fails to deal correctly with filename encodings.

This bug affects 20 people

	Status	Importance	Assigned to
unzip (Debian)	Confirmed	Unknown	debbugs #197428
unzip (Gentoo Linux)	Fix Released	Unknown	gentoo-bugs #69945
unzip (Mandriva)	Confirmed	Unknown	mandriva #57965
unzip (Ubuntu)	Confirmed	Medium	Matthias Klose

Bug Description

When unzip extracts files, it handles filenames as 7-bit,
so filenames with non-Latin-1 characters are broken.

I described this in Gentoo Bugzilla #69945 and reported it through the zip-bug form.

http://bugs.gentoo.org/show_bug.cgi?id=69945

Revision history for this message

In Debian Bug tracker #197428, Santiago Vila Doncel (sanvila-unex) wrote on 2003-06-15: Bug#197428: unzip: zipinfo (and unzip) can't deal with chinese filenames like miniunzip can (fwd)

Hello.

Another report from Dan Jacobson:

---------- Forwarded message ----------
From: Dan Jacobson <jidanni@jidanni.org>
Date: Sun, 15 Jun 2003 03:55:28 +0800
Subject: Bug#197428: unzip: zipinfo (and unzip) can't deal with chinese
    filenames like miniunzip can

Package: unzip
Version: 5.50-1
Severity: normal
File: /usr/bin/zipinfo
Tags: upstream

miniunzip extracts this archive from a windows system perfectly, but
as we can see, zipinfo gets the file names wrong.

07:37 liuyujiao$ miniunzip ~/kejia/liuyujiao/%AAF%B6%D5%A6r%A8%E5-1.zip
MiniUnz 0.15, demo of zLib + Unz package written by Gilles Vollant
more info at http://wwww.winimage.com/zLibDll/unzip.html

/home/jidanni/kejia/liuyujiao/%AAF%B6%D5%A6r%A8%E5-1.zip opened
 extracting: ªF¶Õ¦r¨å-1/G±Æ§Ç.doc
 extracting: ªF¶Õ¦r¨å-1/z±Æ§Ç.doc
 extracting: ªF¶Õ¦r¨å-1/M±Æ§Ç.doc
 extracting: ªF¶Õ¦r¨å-1/N±Æ§Ç.doc
 extracting: ªF¶Õ¦r¨å-1/S±Æ§Ç.doc
 extracting: ªF¶Õ¦r¨å-1/I±Æ§Ç.doc
 extracting: ªF¶Õ¦r¨å-1/O±Æ§Ç.doc
 extracting: ªF¶Õ¦r¨å-1/P±Æ§Ç.doc
 extracting: ªF¶Õ¦r¨å-1/H±Æ§Ç.doc
creating directory: ªF¶Õ¦r¨å-1/
07:38 liuyujiao$ zipinfo %AAF%B6%D5%A6r%A8%E5-1.zip
Archive:  %AAF%B6%D5%A6r%A8%E5-1.zip   1453161 bytes   10 files
-rw-rw-rw-  2.0 fat  2091008 b- defN 16-Mar-03 14:26 ¬FÂiªr¿Õ-1/G¦ãºÃ.doc
-rw-rw-rw-  2.0 fat  1638912 b- defN 14-May-03 17:57 ¬FÂiªr¿Õ-1/z¦ãºÃ.doc
-rw-rw-rw-  2.0 fat  1190912 b- defN 27-Apr-03 14:43 ¬FÂiªr¿Õ-1/M¦ãºÃ.doc
-rw-rw-rw-  2.0 fat  1346560 b- defN  3-May-03 11:25 ¬FÂiªr¿Õ-1/N¦ãºÃ.doc
-rw-rw-rw-  2.0 fat  2365952 b- defN 16-May-03 09:45 ¬FÂiªr¿Õ-1/S¦ãºÃ.doc
-rw-rw-rw-  2.0 fat  1546240 b- defN 21-Apr-03 18:42 ¬FÂiªr¿Õ-1/I¦ãºÃ.doc
-rw-rw-rw-  2.0 fat   154112 b- defN  7-May-03 09:27 ¬FÂiªr¿Õ-1/O¦ãºÃ.doc
-rw-rw-rw-  2.0 fat   729600 b- defN  7-May-03 10:39 ¬FÂiªr¿Õ-1/P¦ãºÃ.doc
-rw-rw-rw-  2.0 fat  1233408 b- defN 22-Apr-03 10:31 ¬FÂiªr¿Õ-1/H¦ãºÃ.doc
drwxrwxrwx  2.0 fat        0 b- stor 19-May-03 15:08 ¬FÂiªr¿Õ-1/
10 files, 12296704 bytes uncompressed, 1451997 bytes compressed:  88.2%

-- System Information:
Debian Release: testing/unstable
Architecture: i386
Kernel: Linux debian 2.4.20-k7 #1 Tue Jan 14 00:29:06 EST 2003 i686
Locale: LANG=zh_TW.Big5, LC_CTYPE=zh_TW.Big5

Versions of packages unzip depends on:
ii  libc6                         2.3.1-10   GNU C Library: Shared libraries an

-- no debconf information

Young-Ho Cha (ganadist-gmail) wrote on 2004-12-06:

When unzip extracts files, it handles filenames as 7-bit,
so filenames with non-Latin-1 characters are broken.

I described this in Gentoo Bugzilla #69945 and reported it through the zip-bug form.

In Debian Bug tracker #197428, Matthias Klose (doko-cs) wrote on 2005-02-12: tagging unzip reports + patch

tags 197427 + patch
tags 197428 + patch
thanks

patch proposal at http://ftp.mizi.com/~ganadist/unzip-locale.diff

Matthias Klose (doko) wrote on 2005-03-04:

patch found at http://ftp.mizi.com/~ganadist/unzip-locale.diff (6.3 KiB, text/plain)

Created an attachment (id=1511)
patch found at http://ftp.mizi.com/~ganadist/unzip-locale.diff

Matthias Klose (doko) wrote on 2005-03-04:

The patch takes the encoding of the filenames in the archive from the current
locale (LANG), which is most likely correct if the archives stay in the same
country. But you get wrong results when extracting an archive from Russia in Japan.

Are these cases uncommon enough that we should apply the patch anyway?
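
To make the concern concrete, here is a minimal Python sketch. It is not the patch itself. It assumes that a Russian tool stored the name in CP866 and that the Japanese system's locale charset is Shift_JIS; both charsets, the helper `decode_name`, and the sample file name are illustrative assumptions.

```python
# Illustrative only: the same stored bytes decoded with two different charsets.
# CP866 (Russian DOS) and Shift_JIS (Japanese) are assumed for the example.

def decode_name(raw: bytes, charset: str) -> str:
    """Decode a stored archive file name with the given charset."""
    return raw.decode(charset, errors="replace")

# Bytes as a Russian tool might store them (assumption: CP866).
stored = "Привет.txt".encode("cp866")

same_country = decode_name(stored, "cp866")       # locale matches the archive
other_country = decode_name(stored, "shift_jis")  # locale differs from the archive

print(same_country)
print(other_country)
```

The first line printed is the original name; the second is whatever the foreign charset makes of the same bytes, which is why a locale-derived guess only helps when archive and reader share a locale.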

Martin Bergner (martin-bergner) wrote on 2006-03-28:

What is the progress on this? Is the patch in there?

Rocco Stanzione (trappist) wrote on 2006-08-29:

Closing after several months without a reply. If this is still a problem, please reply or reopen the bug.

Changed in unzip:
status:	Needs Info → Rejected

Matthias Klose (doko) wrote on 2006-08-29:

reopening, bogus reason for rejection. did you even see the report as confirmed for upstream?

Changed in unzip:
status:	Rejected → Confirmed

Rocco Stanzione (trappist) wrote on 2006-08-29:

The policy is to reject a bug after more than a month of inactivity with a status of 'needs info'. Thank you for your attention.

Matthias Klose (doko) wrote on 2006-08-29:

#10

please add a reference to that policy.

Roland Ronquist (roland-ronquist) wrote on 2007-02-02:

#11

As far as I can see the bug is still there, and zip shows exactly the same behaviour as unzip, so maybe we should either file yet another bug or kill two birds with one stone here.

Maybe an even better suggestion is to go through all
packages and figure out what is needed for UTF-8
compatibility, since this lack of UTF-8 awareness
is displayed not only by simple command line utilities
(like zip and unzip) but also by user-oriented packages
like Abiword.

Everything may seem to work as long as you never
ever exchange files with someone on a different
character encoding (has anyone heard of Microsoft
Windows?!).

Pavel Volkovitskiy (int) wrote on 2007-03-27:

#12

I found a working patch in the ALT Linux Bugzilla:

https://bugzilla.altlinux.org/long_list.cgi?buglist=4871

Sorry, it is in Russian only. The most important part is:

A patch that works with UTF-8:
https://bugzilla.altlinux.org/attachment.cgi?id=1402

+ -O CHARSET specify a character encoding for DOS, Windows and OS/2 archives\n\
+ -I CHARSET specify a character encoding for UNIX and other archives\n\n";

It would also be possible to add the following to /etc/profile.d/unzip.sh:
# Set default encoding for filenames inside DOS/Windows Zip archives
export UNZIP="-O CP866"
export ZIPINFO="-O CP866"

so that this encoding is used by default.

Matthias Klose (doko) wrote on 2007-03-31:

#13

fixed in unzip_5.52-9ubuntu3

Changed in unzip:
status:	Confirmed → Fix Released

Bug Watch Updater (bug-watch-updater) on 2008-08-05

Changed in unzip:
status:	Unknown → Invalid

Bug Watch Updater (bug-watch-updater) on 2009-02-26

Changed in unzip:
status:	Invalid → Fix Released

Martin Bergner (martin-bergner) wrote on 2009-03-05:

#14

Hi, if there is a patch, why is the bug still open? It is resolved in gentoo and I don't know how to reproduce it, can someone who has experienced the problem comment on that?

Martin Bergner (martin-bergner) wrote on 2009-03-05:

#15

Ahh, I misread the "in Debian" line; it was fixed correctly in Ubuntu. Sorry for the noise!

Kjetil Torgrim Homme (kjetilho) wrote on 2009-05-24:

#16

The behaviour in 5.52-10ubuntu2 is still broken. I have a file with filenames in UTF-8, but if I try to unpack it in a UTF-8 locale, the filenames get corrupted.

Example:
ó is stored in the file as \303\263 (raw UTF-8), but is extracted as \342\224\234\342\224\202

Workaround:
  using a Latin1 environment will make a verbatim copy of the octets:
    env LC_CTYPE=en_US unzip ...
  (obviously you first need to generate a Latin1 environment)

My Ubuntu version is 8.04.2
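
The octets in the example are consistent with the stored UTF-8 bytes being read as code page 437 and then converted to UTF-8. The Python sketch below reproduces that, under the assumption (not confirmed in this report) that CP437 is the charset unzip applied; `octal_escape` is a hypothetical helper for printing.

```python
# Assumption: unzip treated the stored bytes as code page 437 (the ZIP
# format's traditional default) and converted the result to UTF-8.

def octal_escape(data: bytes) -> str:
    """Render bytes as backslash-octal escapes, e.g. \\303\\263."""
    return "".join(f"\\{b:03o}" for b in data)

stored = "ó".encode("utf-8")          # raw UTF-8 as stored in the archive
misread = stored.decode("cp437")      # the two bytes misread as CP437 characters
extracted = misread.encode("utf-8")   # what lands on a UTF-8 filesystem

print(octal_escape(stored))      # \303\263
print(octal_escape(extracted))   # \342\224\234\342\224\202
```

The second line matches the corrupted name reported above, which suggests a double conversion rather than random damage.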

Yinon Ehrlich (yinone) wrote on 2009-07-30:

#17

Hi,

On Ubuntu 8.10 with a Hebrew locale, I tried to extract a zip file that was created on Hebrew Windows.
The extracted files got Russian names...

Solved this way:
unzip -O 862 -x ...

However:
* I found no documentation for the -O option and arrived at this value only by trial and error (these failed: 1255, windows-1255 and "ISO-8859-8").
* Is there any way to (automatically) identify the original Zip file encoding and environment (e.g. Hebrew on Windows)?

Thanks in advance,
Yinon
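
A possible explanation for the Russian names, sketched in Python: CP862 (Hebrew DOS) and CP866 (Russian DOS) put their letters in the same byte range, so Hebrew bytes decoded as CP866 come out as Cyrillic. That this unzip build defaulted to CP866 is an assumption, and the sample name is made up.

```python
# Assumptions: the archive stores names in CP862 (Hebrew DOS), and this
# unzip build decoded them as CP866 (Russian DOS). Neither is confirmed
# by the report; the sample name is invented for illustration.

original = "שלום.txt"
stored = original.encode("cp862")   # bytes as written by the Windows tool

misread = stored.decode("cp866")    # wrong charset: Cyrillic letters appear
repaired = stored.decode("cp862")   # the effect that -O 862 is meant to have

print(misread)
print(repaired)
```

Decoding with the charset the archive was actually written in restores the name, which is consistent with `-O 862` working where the Windows (1255) and ISO (8859-8) names did not.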

Andreas Chatziagapiou (xagapiou) wrote on 2009-11-03:

#18

In 9.04, the following two lines in ~/.profile worked just fine (these are for the Greek locale):
export UNZIP="-O CP737"
export ZIPINFO="-O CP737"

but now in Karmic it is broken again.
My ~/.profile is unchanged, with the two lines above appended just as under Jaunty, but Karmic ignores them :(

Lev Abashkin (lev-abashkin) wrote on 2009-12-05:

#19

It looks like the Ubuntu maintainers forgot to include the patch from the ALT Linux folks in unzip 6.0. Here it is: http://sisyphus.ru/ru/srpm/Sisyphus/unzip/patches/0

proDOOMman (prodoomman) wrote on 2009-12-24:

#20

Please add natspec or any other patch to unzip!

Murz (murznn) wrote on 2010-01-11:

#21

Same problem on Karmic: unzip doesn't accept the '-O' parameter.
After setting:
export UNZIP="-O CP737"
export ZIPINFO="-O CP737"
the unzip command stops accepting arguments.

Bug Watch Updater (bug-watch-updater) on 2010-03-03

Changed in unzip (Mandriva):
status:	Unknown → Confirmed

Еггог (sergey-nr) wrote on 2010-04-09:

#22

Same problem on Lucid 10.04 beta 2: unzip doesn't know the '-O' and '-I' parameters! Please bring back the patch from ALT Linux! The old version, unzip 5.52 (Jaunty), did not have this problem.

Yannis Tsop (ogiannhs) on 2010-04-20

Changed in unzip (Ubuntu):
status:	Fix Released → Confirmed

proDOOMman (prodoomman) wrote on 2010-04-20:

#23

WTF? This bug affects not only file-roller, but every archiver on Linux that uses unzip.
This is not a duplicate of bug #177929!

Dmitry Agafonov (dmitry-agafonov) wrote on 2010-04-22:

#24

@proDOOMman Right
I guess we should mark this bug as a duplicate of https://bugs.launchpad.net/ubuntu/+source/unzip/+bug/477755

Vladimir Skvortsov (vskvortsoff) wrote on 2013-02-10:

#25

Ubuntu 12.10 (UI in US English, UTF-8 locale)

It seems that if you KNOW which platform a zip file comes from and which codepage it used, you can unzip the archive successfully without losing non-ASCII filenames that are not encoded in UTF-8.

I ran one experiment: unpacking a zip file that was created on Korean Windows 7 and contains Korean characters in both the archive name and the names of the compressed files.

First, let's get the locale-specific info:

$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Let's check the version of the unzip utility:

$ unzip --help
UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.
...
Usage: unzip [-Z] [-opts[modifiers]] file[.zip] [list] [-x xlist] [-d exdir]
Default action is to extract files in list, except those in xlist, to exdir;
file[.zip] may be a wildcard. -Z => ZipInfo mode ("unzip -Z" for usage).
...
-O CHARSET specify a character encoding for DOS, Windows and OS/2 archives
-I CHARSET specify a character encoding for UNIX and other archives

Note the following option:

-O CHARSET specify a character encoding for DOS, Windows and OS/2 archives

It is not -0 (zero), it is -O (capital letter O)!

In my case, Korean Windows uses the EUC-KR codepage. The zip file is named "2013년 설날".

This means my command line looks like this:

$ unzip -O EUC-KR "2013년 설날"

After checking the unpacked files: it works! All files have the correct Korean names, without strange characters.
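
Where a patched unzip is not available, the same idea can be sketched with Python's standard `zipfile` module. As far as I know, `zipfile` decodes names as CP437 unless the archive sets the UTF-8 flag (general-purpose bit 11), so re-encoding to CP437 recovers the stored bytes. The helpers `recode_name` and `list_names` are hypothetical names for this sketch, and EUC-KR is taken from the example above.

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: name is stored as UTF-8

def recode_name(name: str, flag_bits: int, charset: str) -> str:
    """Undo zipfile's CP437 fallback and decode the name with `charset`."""
    if flag_bits & UTF8_FLAG:
        return name  # already decoded correctly as UTF-8
    return name.encode("cp437").decode(charset)

def list_names(path, charset: str) -> list[str]:
    """Return the member names of the archive at `path`, re-decoded."""
    with zipfile.ZipFile(path) as archive:
        return [recode_name(info.filename, info.flag_bits, charset)
                for info in archive.infolist()]

# Simulate what zipfile would hand back for a legacy EUC-KR name.
legacy = "설날.txt".encode("euc_kr").decode("cp437")
print(recode_name(legacy, 0, "euc_kr"))
```

As with `-O`, the charset has to be supplied by whoever knows where the archive came from; nothing in this sketch detects it.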

sae.area (saearea-test) wrote on 2016-04-16:

#26

For anybody coming past here and looking for file-roller issues related to unzip, please check: https://bugzilla.gnome.org/show_bug.cgi?id=306403#c42

Thank you.