unzip fails to deal correctly with filename encodings

Bug #580961 reported by Rolf Leggewie
This bug affects 1461 people
Affects                          Status         Importance   Assigned to   Milestone
Ubuntu Japanese Kaizen Project   Fix Released   High         Unassigned
unzip (Ubuntu)                   Fix Released   Critical     Unassigned
  Precise                        Fix Released   High         Unassigned
  Quantal                        Fix Released   High         Unassigned
  Raring                         Fix Released   High         Unassigned
unzip (openSUSE)                 Fix Released   Medium

Bug Description

Binary package hint: unzip

This is a fairly annoying bug that's been around and known at least since 2005. It's very visible because, for example, it very often makes exchanging zip files with Windows users impossible. As such, it has gathered its fair share of "me too" and "how dare you haven't fixed this yet!!111!" comments.

Problem description:
zip/unzip and the specification fall short when dealing with non-ASCII filenames not encoded in UTF-8
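
One way to see which entries of an archive are affected is to look at general purpose flag bit 11 of each entry, which marks a filename as UTF-8; a non-ASCII name without that flag carries no declared encoding. The following is only a minimal Python sketch, assuming the standard zipfile module (which decodes unflagged names as cp437 by default). The helper name legacy_encoded_entries is made up for illustration.

```python
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: filename is stored as UTF-8


def legacy_encoded_entries(zf):
    """Return names of entries with non-ASCII bytes and no UTF-8 flag.

    Hypothetical helper: the returned strings are the cp437 decoding that
    zipfile applies by default, so they usually look like mojibake.
    """
    suspects = []
    for info in zf.infolist():
        declared_utf8 = bool(info.flag_bits & UTF8_FLAG)
        has_non_ascii = any(ord(ch) > 127 for ch in info.filename)
        if has_non_ascii and not declared_utf8:
            suspects.append(info.filename)
    return suspects
```

Calling legacy_encoded_entries(zipfile.ZipFile("some.zip")) on an archive like the test case should list exactly the names that unzip shows with question marks, though that depends on how the archive was written.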

test case:
do an "unzip -l" on the file http://tinyurl.com/2aofpxs and witness the question marks

affected programs:
the problem is in unzip itself, but it also affects GUI front ends such as xarchiver, file-roller, etc. that rely on unzip for decompression

suggested solutions (most are workarounds, not proper fixes):
 a) reintroduce patch for codepage-based zip filenames: bug 477755, http://tinyurl.com/2aqdbqg (Ubuntu blueprint)
 b) unzip filename according to locale: bug 203609
 c) Ubuntu JP has a patch, probably not generally applicable, bug 269482
 d) Russian altlinux distro uses natspec lib and patched zip binary

natspec was mentioned in bug 477755 comment #2 and may indeed be a proper fix, but it needs closer inspection (I haven't really looked yet). As discussed in https://bugzilla.gnome.org/show_bug.cgi?id=306403 there is no failsafe, straightforward way to fix this in all cases. Nonetheless, the current situation can and should be improved. There are some good ideas floating around. It needs somebody to pull them together and wrap them up.

It's unfortunate the FOSS community so far hasn't been able to fix this rather visible problem. I'm opening this ticket as a master bug and clean slate to document the issue and current status. Please don't ruin it by making above-mentioned unhelpful comments, they actually slow things down! Please don't nominate for a release.

Unless you're a dev and can provide a patch, you should think VERY carefully before doing anything other than the following:

1) subscribe yourself to this ticket
2) mark this bug as affecting you
3) tell me via mail about other bugs you think are a duplicate of this one, discussing the same problem

1) to 3) will showcase to the devs how many people are affected and that is the only real chance we have for somebody to take a serious look. "Me too" comments do the opposite, so again, please don't do it.

In , 5-pavel (5-pavel) wrote :

Created attachment 319015
An archive file with cyrillic file names included

User-Agent: Mozilla/5.0 (X11; U; Linux i686; ru; rv:1.9.0.13) Gecko/2009080200 SUSE/3.0.13-0.1.2 Firefox/3.0.13

There are several discussions about the problem with Cyrillic filenames in zip archives and the unzip package. Out of the box (compiled from source), unzip does not choose the filename encoding correctly.

The Ark developers tell me that the error lies entirely in the Info-ZIP project (https://bugs.kde.org/show_bug.cgi?id=204984).

There are some patches to Info-ZIP's unzip package that make unzip extract filenames with the correct encoding, but the maintainers of the Info-ZIP project rejected them (http://www.info-zip.org/board/board.pl?m-1248086794).

It would be nice to include this package in the main openSUSE distribution.

Reproducible: Always

Steps to Reproduce:
1. Under Windows, create a zip archive containing files with Cyrillic names.
2. Try to open it with unzip under SuSE.
Actual Results:
Filename encoding is incorrect. Example:

pavel@pavel:~/tmp> unzip ReportPacket_DBV90821CJ.zip
Archive: ReportPacket_DBV90821CJ.zip
  inflating: ???????? ????? (????????).pdf
  inflating: ???????? ????? (??????????).pdf

Expected Results:
Results, produced with natspec patch from sisyphus

pavel@rzn-sepak-bpa:~/temp> unzip ReportPacket_DBV90821CJ.zip
Archive: ReportPacket_DBV90821CJ.zip
  inflating: ????????? ????? (??????????).pdf
  inflating: ????????? ????? (??????????).pdf

In , Dvaleev (dvaleev) wrote :

We have found a solution, but it requires additional libraries to convert file names on the fly.

The library is librcc, created especially for handling file names that are not UTF-8 encoded.

How can we proceed from here? The RPM packages are built on OBS and tested. Should we create a submit request?

The librcc and patched unzip are here:
http://download.opensuse.org/repositories/home:/Lazy_Kent/openSUSE_11.2/

In , Pth-3 (pth-3) wrote :

*** Bug 575715 has been marked as a duplicate of this bug. ***

In , Anaumov (anaumov) wrote :
In , Stian Viskjer (stianvis) wrote :

This is also a problem with the letters 'æ ø å' used in some of the Scandinavian alphabets.

It's also an issue for tar archives created by 7zip on Windows.

Neither Unzip 6.0 nor the packages from home:/Lazy_Kent/openSUSE_11.2/ mentioned in comment 1 change anything on my system (11.2 x86_64).

In , Kyrill Detinov (lazy-kent) wrote :

I made a submit request to Factory:
https://build.opensuse.org/request/diff/39326

Confirmed, it works at least with Russian, Czech and Slovak.
http://lizards.opensuse.org/2010/04/07/call-for-testing-unzip-feature/

% LANG=cs_CZ.utf8 unzip -l test-cz.zip
Archive: test-cz.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
      117  03-18-10 15:24   aábcčdďeéěfghchiíjklmnňoópqrřsštťuúůvwxyýzžAÁBCČDĎEÉĚFGHCHIÍJKLMNŇOÓPQRŘSŠTŤUÚŮVWXYÝZŽ.txt
 --------                   -------
      117                   1 file

In , Pth-3 (pth-3) wrote :

I won't accept the patch for openSUSE because upstream doesn't accept it and openSUSE would have to maintain this patch indefinitely. If this or a similar patch gets accepted upstream, I'll help in backporting it.

In , Anixx (anixx) wrote :

What about switching the package to Sisyphus' patched version? If openSUSE cannot maintain it, let the ALT Linux team do the maintenance and regard them as the upstream of a forked version.

In , Anixx (anixx) wrote :

Well, it is really annoying: nobody can open archives made under Windows. Business people say Linux is buggy because it cannot even open archives properly. Government officials say the same.

In , Dvaleev (dvaleev) wrote :

@Philipp
The chances of pushing this patch upstream are very small, if it is possible at all. Other distributions have tried without success.

The upstream statement is: The trend in IT is to use UTF8.
That's why the patch is not accepted.

Then why can't we accept this patch as openSUSE-specific to close such an annoying bug, and maintain it until better times come? openSUSE maintains a number of specific patches for rpm and OpenOffice.org.

If you won't maintain the patch, please let the community do it.
The patch is small. It introduces a new header and changes a few lines of the main code.

We tested the patched unzip for two to three months and it just works. We also got positive feedback for Czech and Slovak in addition to Russian.
It also applies well to the latest unzip version, 6.0.

In , Pth-3 (pth-3) wrote :

OK, after thinking about this I have added the patch to our unzip and will keep it at least as long as the package builds and the patch doesn't need extra work. Kyrill, would you be willing to act as co-maintainer? Or, to ask more broadly, would any of you be willing to co-maintain zip/unzip?
I'll also try to get an update for 11.2 out of the door.

In , Kyrill Detinov (lazy-kent) wrote :

Philipp, I made sr#39767.

At the moment we have librcc0 in Factory only. So we may build patched unzip against Factory.
I added %if 0%{?suse_version} > 1120 for all the changes.

> would you be willing to act as co-maintainer?

Yes, I'd like to take this role.

Rolf Leggewie (r0lf)
Changed in unzip (Ubuntu):
importance: Undecided → High
status: New → Triaged
description: updated
Rolf Leggewie (r0lf)
description: updated
description: updated
Rolf Leggewie (r0lf)
description: updated
Rolf Leggewie (r0lf)
description: updated
Changed in unzip (Debian):
status: Unknown → Confirmed
Rolf Leggewie (r0lf) wrote :

Martin, do I read the Changelog correctly that you assumed the Ubuntu delta for unzip 5.52-12ubuntu1 only concerned UTF-8 and thus you dropped it when Debian released 6.0-1? In fact, I believe this then introduced a regression that caused bug 10979 (or this bug, whichever you prefer ;-)) to return. Actually, unicode is mostly not an issue and dealt with in 6.x. What remains problematic is non-UTF-8, non-ASCII filenames in zip files. It's my understanding that's what this has been about all along. I believe we will need to reintroduce a similar patch or look into natspec, which seems to be able to deal with the matter more elegantly and automatically.

Changed in ubuntu-jp-improvement:
status: New → In Progress
status: In Progress → Fix Committed
assignee: nobody → Rolf Leggewie (r0lf)
Martin Pitt (pitti) wrote :

Hm, the original patch from https://edge.launchpad.net/ubuntu/+source/unzip/5.52-9ubuntu3 talked about "UTF-8 file names" all along. Unfortunately I cannot read the upstream bug, it's in Russian.

So if this was about non-UTF-8, it was very misleading. Sorry for dropping it prematurely then.

Rolf Leggewie (r0lf) wrote :

Martin, thank you for your comment. The question is now how to deal with this regression. One way is of course to reintroduce the patch from bug 477755 (not sure if it still applies cleanly). The other option that seems to be a more general solution is the one offered by natspec. I'm currently in the process of working with Andrew Shadoura to try and get the libnatspec package into Debian.

I would think the following is the best course of action.

1) reintroduce (an adapted) patch from bug 477755 for lucid. don't close this ticket.
2) evaluate libnatspec for maverick

What do you think? Can you take care of part 1)?

Rolf Leggewie (r0lf) wrote :

failed to mention the reasoning behind my proposal.

unzip is in main. If unzip were to depend on libnatspec it would need to be in main, too. This can not happen before Maverick. But I think we should try to get a fix for Lucid as well, especially since this is a regression from an earlier version and the solution should be reasonably easy to implement.

Martin Pitt (pitti) wrote : Re: [Bug 580961] Re: unzip fails to deal correctly with filename encodings

Rolf Leggewie [2010-05-20 10:59 -0000]:
> 1) reintroduce (an adapted) patch from bug 477755 for lucid. don't close this ticket.
> 2) evaluate libnatspec for maverick
>
> What do you think? Can you take care of part 1)?

Sorry, I don't think I can. I'm on rotation for this cycle, and not
working in the platform team. But I'm happy to sponsor a fixed package
(maverick/lucid-proposed).

Thanks,

Martin

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

Rolf Leggewie (r0lf) wrote :

OK, understood.

But I think that's a bit lame. I'm not throwing around blame, but it seems your upload introduced the regression as part of your paid work and now you want others to fix it. I have no beef in this, am not being paid for it, yet I have already volunteered and done all the recent bug work, started to work on evaluating natspec for maverick, etc. Simply because I want the Japanese folks to drop one part of their delta to Ubuntu.

Frankly, I believe you should play your part, being currently on the platform team or not. Especially since it's probably nothing more than taking the old patch, maybe tweaking it slightly, testing and uploading.

In , Meissner-i (meissner-i) wrote :

do we really want to take 2 new libraries for 11.2? not sure.

In , Bruno-ioda-net (bruno-ioda-net) wrote :

In reply to comment 12:
More and more customers are receiving zip files in different encodings, and it's really a pain to explain that a given zip should be unpacked under Windows to get the right encoding. We look like clowns.

Since 11.2 has a long life ahead of it, yes, I'm voting for including it as fast as possible. The bug started under 11.2, so I feel it's better to close it in 11.2 and be sure the fix is integrated in 11.3.

Or (I'm only seeing my part of the world :-) ) there may be a much more complicated implication; if so, it should be explained.

Alexander (alexander-v-shinkarenko) wrote :

Problem with Cyrillic in archives

Nelson Benitez (gnel) wrote :

Hi Rolf, I think your first comment "Actually, unicode is mostly not an issue and dealt with in 6.x. " is not true.. if I zip utf8 filenames with non-ascii chars (with zip command or file-roller) then I can't unzip them with file-roller.. because 'unzip -l' lists them wrong.

See my comment on upstream bug https://bugzilla.gnome.org/show_bug.cgi?id=619116#c4

So I think this bug still applies for UTF-8 files (those with non-ascii chars).. the problem here is Info-zip doesn't even have a bug-tracker, but the file-roller problem could be solved just installing 7-zip package (again see my upstream comment..).

let me know if I'm wrong in my observations..

Matteo Rapone (nedanfor) wrote :

I agree with you, Nelson. Are you sure that 7-zip package solves the problem?

In , Cdengler-z (cdengler-z) wrote :

I'm not happy about adding two new libraries to a released product, but in this case I think it should be fine if someone will maintain them. (+1)

In , Meissner-i (meissner-i) wrote :

so lets do it. :)

In , Swamp-a (swamp-a) wrote :

The SWAMPID for this issue is 33540.
This issue was rated as low.
Please submit fixed packages as soon as possible.
Also create a patchinfo file using this link:
https://swamp.suse.de/webswamp/wf/33540

In , Cdengler-z (cdengler-z) wrote :

Update process started ... be so kind and submit fixed sources and a patchinfo.

Seung Soo, Ha (sungsuha-deactivatedaccount) wrote :

@Matteo

I can confirm that the p7zip package does not completely solve the problem in the following usage scenario:

Several files, each with a Korean filename (presumably in EUC-KR or CP949 encoding), compressed to .zip in a Windows environment.
File Roller shows invalid filenames when opening the archive.

Without the 7-zip package (p7zip-full): the filenames appear as invalid, including various illegal characters, which makes it impossible even to extract the files.

With the 7-zip package: the filenames appear as invalid (but different from above), but the files are extractable. Although I cannot say with certainty, the invalid filenames look like what one would expect when reading a Korean web page with the wrong encoding (e.g. Western-1252) selected in the browser.

I have not evaluated a UTF-8 usage case.

Rolf Leggewie (r0lf) wrote :

guys, thank you for your testing and comments. At this point in time, that's really not necessary, though. It's known there is a problem. I think it's probably fair to say one cannot expect this will be resolved in one single upload, but will need testing after a patch has been proposed. Currently, there is no such patch to be evaluated on the table. Until that changes the comments are mostly noise, make the ticket harder to read and thus will slow down the resolution rather than helping it.

I kindly ask you to please refrain from commenting unless you want to propose a specific patch (in that case, I think it's better to just attach it to this ticket).

Yannis Tsop (ogiannhs) wrote :

using 7-zip it can be solved!

eg:
wget http://www.ops.gr/Ergorama/fileUploads/ypiresiaops/prokirikseis/biografiko.zip
LANG=el_GR.CP737 7z x -oPATH file.zip
convmv -r --notest -f utf-8 -t CP737 PATH/*

see proposed patch for package unp, same problem there:
https://bugs.launchpad.net/ubuntu/+source/unp/+bug/583417

In , Pth-3 (pth-3) wrote :

@Marcus: which is the second new library? unzip only needs librcc0.

In , Meissner-i (meissner-i) wrote :

librcc however requires librcd

Changed in unzip (Ubuntu):
assignee: nobody → Seung Soo Ha (sungsuha)
tags: added: regression-update
tags: added: needs-reassignment
Rolf Leggewie (r0lf) wrote :

Seung, what is it you intend to do for unzip? Please don't assign to yourself without indicating what it is you want to do. What is it that makes you think this is not a bug in unzip or something that needs work in unzip? I have a pretty clear plan of action and unzip will need to be patched for that.

1) get libnatspec into Debian: http://git.debian.org/?p=collab-maint/libnatspec.git (help appreciated)
2) patch unzip (and others) to use libnatspec
3) evaluate results

Seung Soo, Ha (sungsuha-deactivatedaccount) wrote :

@Rolf : Did you get my email?
In short, I asserted my intent to contact the original unzip maintainer on this issue.

Meanwhile, I have reviewed your plan and I believe that libnatspec is a viable solution.
Even so, it is my opinion that a patch should be pushed upstream(unzip).
That would be a permanent solution.

Thus, I believe there is reason to try to contact the original unzip maintainer.
If you believe that a reassignment(or un-assignment) is appropriate, please let me know.

For the time being, I think we should concentrate on efforts to "1. get libnatspec into Debian".
How is progress in that regard? What can I do to help?

Changed in gentoo:
status: Unknown → Fix Released
Changed in unzip (Mandriva):
status: Unknown → Confirmed
Alkis Georgopoulos (alkisg) wrote :

@Yannis T:
the method that you described didn't work for me in Lucid, here's a variant that did work:

wget http://www.ops.gr/Ergorama/fileUploads/ypiresiaops/prokirikseis/biografiko.zip
LANG=C 7z x -oPATH biografiko.zip
convmv -r --notest -f cp737 -t utf-8 PATH/*

Of course this is only a desperate solution, as it temporarily creates files with wrong encoding on the file system. But a desperate solution is still better than no solution at all.
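
The renaming step can in principle be done before anything touches the disk, which avoids the wrongly named temporary files. What follows is a rough Python sketch, assuming the standard zipfile module and that the caller already knows the archive's real charset (cp737 for the file above). extract_with_charset is a hypothetical helper, not part of any existing tool.

```python
import os
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: filename is stored as UTF-8


def extract_with_charset(archive, dest, charset):
    """Extract `archive` into `dest`, decoding unflagged names as `charset`.

    Hypothetical helper. Returns the list of file names as written.
    """
    dest = os.path.realpath(dest)
    written = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            name = info.filename
            if not info.flag_bits & UTF8_FLAG:
                # zipfile decoded the raw bytes as cp437; undo that and
                # decode again with the charset chosen by the caller.
                name = name.encode("cp437").decode(charset)
            target = os.path.realpath(os.path.join(dest, name))
            # Refuse entries that would escape the destination directory.
            if os.path.commonpath([dest, target]) != dest:
                raise ValueError(f"unsafe path in archive: {name!r}")
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as out:
                out.write(src.read())
            written.append(name)
    return written
```

A call such as extract_with_charset("biografiko.zip", "PATH", "cp737") would then play the role of the 7z plus convmv pair. On Python 3.11 and later, zipfile.ZipFile also accepts a metadata_encoding argument that covers the decoding part.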

Whenever the issue is fixed, please don't just put it in Lucid proposed, but in updates, as it's a significant regression for non-English countries.

Rolf Leggewie (r0lf) wrote :

Alkis, what's so difficult to understand about "comments unwelcome"? Your workaround may be posted in good faith but I again ask everyone to refrain from such postings! Post this on your blog, but please spare this ticket.

@Seung: By assigning to yourself you discourage others (including real devs) from picking up the problem. I'll unassign you. The things you want to do can be done without you being officially assigned this ticket. Judging your skill level from your mail and comments, I'm not sure there is much you can actually do to help. Anybody with the necessary packaging or programming skills won't need to ask but can jump right in. Your willingness to help out is appreciated, but to move this ticket further you need either packaging or coding skills. If you are interested in learning about packaging, you're more than welcome. Google for "debian maintainer guide" to get you started and hang around #debian-maintainers on OFTC.

So, ONCE AGAIN everybody please refrain from commenting unless you can either code or package debs! Thanks.

Changed in unzip (Ubuntu):
assignee: Seung Soo Ha (sungsuha) → nobody
Rolf Leggewie (r0lf) wrote :

@Seung, I agree that eventually the patch should be pushed upstream if possible. There may be an acceptance issue for this kind of patch. But this is really step 3 or 4 and we're still not even half-way through with step 1 (URL is posted above)

Changed in unzip:
status: Unknown → Invalid
In , Swamp-a (swamp-a) wrote :

Update released for: librcc-devel, librcc0, librcc0-debuginfo, librcc0-debugsource, librcd-devel, librcd0, librcd0-debuginfo, librcd0-debugsource, rcc-runtime, rcc-runtime-debuginfo, unzip, unzip-debuginfo, unzip-debugsource
Products:
openSUSE 11.2 (debug, i586, x86_64)

In , Cdengler-z (cdengler-z) wrote :

Update released after a long testing phase in the test update channel.

Closing.

Vladimir Mityukov (mityukov) wrote :

It seems to me (or, maybe, I've missed it) that nobody mentioned an option to manually select the charset for filenames in the archive. This option can be added e.g., to "Archive Manager" and so on (DE-specific tools).

I understand that auto-detection is highly desired, but it's quite problematic, as far as I can see in this ticket. On the other hand, I'm pretty sure that a manual selection option _itself_ would solve the problem for 60%+ of users. Others would find solutions (involving this new charset selector) on the Internet. Currently, there are no answers (except for using Windows/Wine for un[re]packing, or quite complex approaches like fuse-zip) even for advanced users, which I think is too bad.
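
To illustrate what such a selector could show the user, here is a small Python sketch that decodes one raw stored name under several candidate charsets so the readable one can be picked by eye. The function name preview_name and the candidate list are assumptions for illustration only; with Python's zipfile, the raw bytes of an unflagged name can be recovered as info.filename.encode("cp437").

```python
def preview_name(raw, candidates=("cp437", "cp866", "cp1251", "cp949", "shift_jis")):
    """Decode the raw name bytes under each candidate charset.

    Hypothetical helper: maps charset -> decoded text, or None when the
    bytes are not valid in that charset.
    """
    previews = {}
    for charset in candidates:
        try:
            previews[charset] = raw.decode(charset)
        except UnicodeDecodeError:
            previews[charset] = None
    return previews
```

A front end could list the non-None results next to their charset names and remember the user's choice for the rest of the archive.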

Yannis Tsop (ogiannhs) wrote :

There is no such option in either Ark or file-roller.

NickNeo (nickneo.one)
Changed in unzip (Ubuntu):
status: Triaged → Confirmed
Peter (pry) wrote :

Importance must be changed to critical!!! Do you realize how many potential clients we lost because of this bug?!

Rolf Leggewie (r0lf) wrote :

Pryguy, if you lost clients due to this bug, how about you pledge a bounty for this to be fixed? You're obviously making money off OSS-software, good thing! Maybe you can find others to join you and increase the bounty sum.

I'm working on fixing this, but I'm doing so in my free time. What have you done so far?

Rolf Leggewie (r0lf) wrote :

let me seize the opportunity to give a status update on what I've done and know about. Others are always welcome to help, but unless you can code or package, there currently is nothing you can do. Any comment not related to packaging or coding is STRONGLY discouraged (yes, I'm looking at you pryguy, have you even bothered to read this ticket?).

Next step in my opinion is to package libnatspec. Andrew and I are making good progress, albeit slowly: http://git.debian.org/?p=collab-maint/libnatspec.git I think the package is almost ready to be released to Debian (and possibly maverick). Please refer to the git repository to see where we are. Suggestions on that package welcome.

Peter (pry) wrote :

>Maybe you can find others to join you and increase the bounty sum.
Calm down please, already did this yesterday :)

Sergey Polushin (serbly)
Changed in unzip (Ubuntu):
assignee: nobody → Sergey Polushin (serbly)
Rolf Leggewie (r0lf) wrote :

Sergey, feel free to work on a solution. But don't assign to yourself. This sends the wrong signal of "somebody is working on this and it will be fixed soon" in your case.

Changed in unzip (Ubuntu):
assignee: Sergey Polushin (serbly) → nobody
Adrien Cunin (adri2000) wrote :

Hi,

From the tests that I've done, there are two different bugs about non-ASCII characters in filenames:
 * One reproducible with files created with zip on Linux: unzip shows broken encoding filenames but correctly extracts the files
 * Another reproducible with files created on Windows, probably due to the use of an ISO encoding: unzip shows broken encoding filenames and extracted files have a broken encoding as well

I've managed to fix the first one by compiling unzip with -DNATIVE.

Rolf Leggewie (r0lf) wrote :

Andrew and I have finished the work on a natspec package. We're now pushing that package into Debian and Ubuntu.

Requesting sponsorship for

https://launchpad.net/~r0lf/+archive/ppa/+sourcepub/1262565/+listing-archive-extra
http://revu.ubuntuwire.com/p/libnatspec

@Martin Pitt, I still think you have an obligation to undo the damage you've done, whether you work on the platform team or not. I'd be very pleased to see you picking up sponsorship of this package to see if we can fix the problem for good. Of course I wouldn't be angry at any sponsor if they beat you to it this time ;-)

Benjamin Drung (bdrung) wrote :

I am going to unsubscribe ubuntu-sponsors. ubuntu-sponsors is for sponsoring debdiff and there's not yet a debdiff for unzip. Rolf, please open a new bug report for getting libnatspec into Ubuntu (this bug is already long enough) and provide all information required for a FFe [1], because we passed feature freeze and an exception is required.

[1] https://wiki.ubuntu.com/FreezeExceptionProcess

Seung Soo, Ha (sungsuha-deactivatedaccount) wrote :

The infozip team have recently spoken regarding the current state of unicode and unzip(most specifically the lack of the -O option)

They are preparing another release(6.1) and there is a thread discussing a potential fix for this matter[1]
I'm not sure if the libnatspec based patch could be a candidate for their release, but I lack the proper knowledge to contribute to the discussion there.
@Rolf: If you have not already, could you take a look at the thread?

[1] http://www.info-zip.org/board/board.pl?m-1248086794/s-45/

victorfuts (victorfuts) wrote :

        Peter Ryzhenkov wrote on 2010-07-31: #20
        Importance must be changed to critical!!! Do you realize how many potential clients we lost because of this bug?!

thats exactly what i wanna say.

Rolf Leggewie (r0lf) wrote :

Peter, so where is the bounty you pledged?

Peter (pry) wrote :

I've collected 20+ votes here. I can do a separate page for that on my web site also. Please contact me and let's discuss what is needed (my Jabber ID is in my contacts).

Murz (murznn) wrote :

Where can I add my vote to votes list?

Peter (pry) wrote :

Right on the page, click on the "This bug affects you and XXX other people".
:)

2010/9/9 Murz <email address hidden>

> Where can I add my vote to votes list?

Rolf Leggewie (r0lf) wrote :

@Murz: not in the comments, click on "this bug affects me". Other than that, there is nothing you can do unless you can code or package.

@Peter:
 you on Aug 1st (quoting me):
>Maybe you can find others to join you and increase the bounty sum.
Calm down please, already did this yesterday :)

me (today):
Peter, so where is the bounty you pledged?

You (today):
fluff

IOW, you're all just hot air. No code, no money. Just strongly demanding freebies from others.

I urge you *once more* to abstain from substanceless comments so people like me who actually work on fixing this (first package is in the queue for Debian) can focus on the work. If you can't code but want to pledge money or pay someone to do this, you're more than welcome. Complaints, "this needs to have a higher priority" comments and indications you did something when that is not true are VERY unwelcome.

Peter (pry) wrote :

@Rolf I've added your contact. Have things to discuss. There are people from Russian Fedora that fixed the bug.

Peter (pry) wrote :

> Complaints, "this needs to have a higher priority" comments and indications you did something when that is not true are VERY unwelcome.
I'm sorry I've completely misunderstood you. I thought you were saying about votes, not the coding. I understand how it feels to get messages like mine, that are 'hot air' nothing more. Sorry again...

Murz (murznn) wrote :

http://sisyphus.ru/en/srpm/Sisyphus/unzip/patches - here you can get the patches, that solve problem in alt linux unzip version. Maybe we can apply it on ubuntu unzip?
http://dside.dyndns.org/darklin/portage/app-arch/unzip/files/unzip-ds-lazyrcc.patch - here is another patch.

Changed in unzip:
importance: Unknown → Medium
status: Invalid → Unknown
Sergei Ianovich (ynvich-gmail) wrote :

@Rolf
> first package is in the queue for Debian

Could you post a link to the patch(es) or even better debdiff?

Seung Soo, Ha (sungsuha-deactivatedaccount) wrote :

@Yanovich
Is this what you're looking for?
http://git.debian.org/?p=collab-maint/libnatspec.git

In , Anixx (anixx) wrote :

Still does not work in File Roller under OpenSUSE 11.3.

In , Anixx (anixx) wrote :

Created attachment 391041
file with problem

In , Anixx (anixx) wrote :

Created attachment 391042
screenshot of file roller

In , Anixx (anixx) wrote :

The same file (bug.zip) opens well with Ark from KDE3.

In , Anixx (anixx) wrote :

Created attachment 391043
the same file opened in Ark/KDE3

In , Kyrill Detinov (lazy-kent) wrote :

Works OK.

% unzip -l bug-540598_bug.zip
Archive: bug-540598_bug.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
    72704  09-20-10 23:11   Коммерческое предложение..doc
   388608  09-20-10 23:11   прайс на палатки и снаряжение14.09.2010.xls
 --------                   -------
   461312                   2 files

Open a bug against File Roller. No problem with unzip.

In , Anixx (anixx) wrote :

Does File Roller use unzip in this case?

In , Kyrill Detinov (lazy-kent) wrote :

It should use unzip. But I found an interesting bugreport:
https://bugzilla.gnome.org/show_bug.cgi?id=611257

In , Anixx (anixx) wrote :

I removed p7zip. File Roller now works fine, but the built-in archive viewer in KDE3 still shows garbage (Ark is fine).

In , Anixx (anixx) wrote :

Removed p7zip. Now all is OK in File Roller, but the embedded viewer in KDE3 still shows garbage (in Ark all is OK).

In , Anixx (anixx) wrote :

Created attachment 391237
what I see in embedded viewer

In , Kyrill Detinov (lazy-kent) wrote :

Same here. Krusader 1.90.0 shows file names correctly.
As you know, nobody is interested in fixing KDE3 bugs.

In , Anixx (anixx) wrote :

Maybe this bug is fixed in Trinity. If not, it is possible to make a bugreport.

nandayo (casier) wrote :

Same problem here with a "é" in the directory name in the archive:

"checkdir error: cannot create PATHS_WITH_ACCENT
Invalid or incomplete multibyte or wide character"

:-/

Revision history for this message
lstefek (libor-stefek) wrote :

I would vote for increasing priority of this bug. Unzip in this version is really useless and using it is dangerous.
Consider a large zip archive with accented filenames: unzip corrupts the filenames on extraction, and there is no way to repair them afterwards!
For smaller zip archives there are some workarounds (e.g. using jar from the fastjar package).

Revision history for this message
Rolf Leggewie (r0lf) wrote :

On 06.10.2010 16:36, lstefek wrote:
> I would vote for increasing priority of this bug.

And I would vote for you actually reading the bug report before making
useless comments. I mean, is it really too much to ask that you at the
very minimum read the description of the problem?

For Chris' sake people, provide code or packages or shut up (sorry to be
harsh, as the sole person in Ubuntu currently actually working on fixing
this I get annoyed to no end by people without respect like lstefek).

Now, please don't make things worse by adding yet another comment. If
you have to get something off your mind, send me a private message.

Revision history for this message
lstefek (libor-stefek) wrote :

Ok, next time I will read bug report more carefully.

Actually, I did some compiling and testing with unzip60-alt-iconv-utf8.patch on the latest unzip_6.0-4 in Ubuntu 10.10, and unzip works reasonably well (at least the -O option does its job as it did in 5.52). Someone from the Info-ZIP team also mentioned this patch in the Info-ZIP forum as probably the right one, which will be implemented upstream.

Patch is here: https://bugs.archlinux.org/task/15256?getfile=3685
Info-Zip forum thread: http://www.info-zip.org/board/board.pl?m-1248086794/s-45/
I did usual patch -p1 <unzip60-alt-iconv-utf8.patch
and then configure and make, then some testing with accented filenames, looks good.
Sorry, for my comment before, but this bug is really very long lasting and there is too much discussion around.

Revision history for this message
Rafael Aminov (custard-py) wrote :

The bug still persists. How many decades do we need to wait for it to be fixed?

Revision history for this message
lstefek (libor-stefek) wrote :

Do-it-yourself instructions for the impatient (for Ubuntu 10.04.1 LTS and Ubuntu 10.10):

$ sudo apt-get install dpkg-dev
$ apt-get source unzip
$ wget -O unzip60-alt-iconv-utf8.patch 'https://bugs.archlinux.org/task/15256?getfile=3685'
$ cd unzip-6.0/
$ patch -p1 <../unzip60-alt-iconv-utf8.patch
$ cd ..
$ apt-get source bzip2
$ cd bzip2-1.0.5/
$ make
$ mv * ../unzip-6.0/bzip2/
$ cd ../unzip-6.0/
$ make -f unix/Makefile generic
$ sudo make prefix=/usr install
$ unzip -h

and check that -O CHARSET is there
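
That last check can be scripted. A minimal sketch, assuming the patched build prints the literal text "-O CHARSET" in its help output as described above; the function name is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: reads help text on stdin and succeeds only if it
# mentions the "-O CHARSET" option that the patch is said to add.
help_has_charset_option() {
    grep -q -e '-O CHARSET'
}

# Usage: feed it the help output of whichever unzip is first in PATH.
if command -v unzip >/dev/null 2>&1; then
    if unzip -h 2>&1 | help_has_charset_option; then
        echo "unzip supports -O CHARSET"
    else
        echo "unzip lacks -O CHARSET (unpatched build?)"
    fi
fi
```

If the second message appears, the freshly built binary is probably not the one being found in PATH.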

Revision history for this message
Murz (murznn) wrote :

lstefek, many thanks for the instructions on how to apply the patch easily! It solved my problem with Cyrillic letters in zip files!
Can you create a PPA with this patch included in the unzip package?
Adding a PPA will be much easier for other users than applying the patch!

Revision history for this message
sterios prosiniklis (steriosprosiniklis) wrote :

@lstefek
Thanks a lot!

Patch solves this issue for Greek too. LANG=el_GR.utf8
Installing patched unzip and adding in ~/.profile the following lines
export UNZIP="-O CP737"
export ZIPINFO="-O CP737"
does the trick.

This bug is a show stopper for languages that do not use the Latin alphabet.
The sooner a fix that anyone can easily adopt comes up, the better...

Revision history for this message
Skybinder (lionet) wrote :

While someone clever works on it
why not use an old version of unzip to solve the encoding problem,
just download and replace according to your architecture:
http://packages.ubuntu.com/ru/jaunty/unzip .
Use
unzip -O cp866
for Windows created zip-archives.
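
Why a charset has to be named at all: the archive stores the file name as raw bytes, and the same bytes read differently under each code page. A small sketch using iconv rather than unzip itself; the helper name is hypothetical, and the sample bytes are the word "Привет" encoded in CP866:

```shell
#!/bin/sh
# Hypothetical helper: interpret raw file-name bytes from stdin as the given
# code page and print them as UTF-8.
decode_name() {
    iconv -f "$1" -t UTF-8
}

# "Привет" as CP866 bytes, written with octal escapes for portability.
raw_name='\217\340\250\242\245\342'

printf "$raw_name" | decode_name CP866; echo    # the intended Cyrillic name
printf "$raw_name" | decode_name CP437; echo    # same bytes, wrong code page
```

Only the first call prints the Cyrillic word; the second shows the kind of garbage that appears when the wrong code page is assumed.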

Revision history for this message
Rolf Leggewie (r0lf) wrote :

I'm unsubscribing from this bug. I find it hard to understand why some people have such a difficult time respecting the explicit request not to turn this ticket again into a support forum. There's the answer tracker and the forum for these kind of things. We're now again one step further away from properly fixing the underlying issue. Congratulations.

Changed in ubuntu-jp-improvement:
assignee: Rolf Leggewie (r0lf) → nobody
Revision history for this message
Stamatis Papadakis (stpapadakis) wrote : Re: [Linux.sch.gr] [Bug 580961] Re: unzip fails to deal correctly with filename encodings

Please unsubscribe me!!

Revision history for this message
AsstZD (eskaer-spamsink) wrote :

Now when that fucking drama queen quit with his lobbying of shitty library, can someone finally commit the patch from AltLinux and be done with it?

Revision history for this message
cbrmichi (cbrmichi) wrote :

Some of your comments aren't very helpful, so please stop moaning, stop the defamation and simply stop writing unhelpful things.
You have managed to drive away the only person who had been working on this issue.
Can YOU do this? No? So why are you doing this?

There is no company working on this problem; one of us has to do it. If you want someone to help you, you have to be friendly, or else use commercial software.

Changed in gentoo:
status: Fix Released → Won't Fix
Revision history for this message
Calmarius (david15b) wrote :

You can unzip anything if you use the command line and unzip the entire archive, just type 'unzip <anyfile>.zip'.

Click (clicky-mail)
Changed in unzip (Ubuntu):
assignee: nobody → Click (clicky-mail)
dr-ser (dr-ser)
summary: - unzip fails to deal correctly with filename encodings
+ распаковать не учитывает правильно с именами файлов
Aron Xu (happyaron)
summary: - распаковать не учитывает правильно с именами файлов
+ unzip fails to deal correctly with filename encodings
Alex Smirnov (coder1993)
Changed in unzip (Ubuntu):
assignee: Click (clicky-mail) → Alex Smirnov (coder1993)
Revision history for this message
Sergey Sedov (serg-sedov) wrote :

"This bug affects you and 409 other people" "Heat 2002"
Dear developers, please tell me, what can ordinary people do to help you? For example, test the patch?

Revision history for this message
ap0stol (x777ozon) wrote :

Assigned to Alex Smirnov?

Is it a joke? Or is Alex a developer?

Revision history for this message
Sergey Sedov (serg-sedov) wrote :

"This bug affects you and 500 other people" 0_o

For developers, maybe it helps, here is a patch for opensuse:

https://bugzilla.novell.com/show_bug.cgi?id=540598

Revision history for this message
Sergey Sedov (serg-sedov) wrote :

Dear developers, here is a patched version of unzip from http://mintlinux.ru/

http://mintlinux.ru/index.php?topic=409.msg5642#msg5642

After installing the problem was solved

tags: added: patch
Changed in ubuntu-jp-improvement:
status: Fix Committed → Invalid
Revision history for this message
Rolf Leggewie (r0lf) wrote :

Bartosz, please refrain from setting status for the Ubuntu Japanese Kaizen if you are actually not involved with the project. You obviously don't understand it. Since you have bug-control privs, you really should know better.

Changed in ubuntu-jp-improvement:
importance: Undecided → High
status: Invalid → Fix Committed
Changed in unzip (Ubuntu Natty):
assignee: Alex Smirnov (coder1993) → Pinkertonik (pinkertonik)
Revision history for this message
Compinfer (nvkinf) wrote :

very bad bug!!

Revision history for this message
J.R. (jr-weimprovise) wrote :

Thank you for the patched version info, Sergey.

Changed in file-roller:
importance: Unknown → Medium
status: Unknown → Confirmed
Revision history for this message
Sergey Nagaytsev (sergey-nagaytsev) wrote :

The bug is FIXED by the Russian AltLinux community and a humble developer who improved their patch so that it applies to the then-newest (2010-11) InfoZIP sources, but who published it only on a local Russian-language website.

Here's the Google-translated publication with all links and instructions: http://goo.gl/paR5i

I tested the patches, they work for me on Ubuntu 10.11 amd64. I wrote to topic starter, waiting for reply, if none - will try to contact ZIP maintainer in Debian.

Shame to the Russian-speaking persons who were too lazy to search the Web, instead barking at actual developers - it took me a few hours to find the ready solution and test multiple ways of applying it, 'alien' (failed, worked only for reading) and source build.

Revision history for this message
Brian Thomason (brian-thomason) wrote :

I'm uploading a build with the archlinux patch for natty for those that might find it helpful.

It looks like this was fixed upstream with a modified version of this patch as of 6.10b, so the best effort to get this fixed now would be to:

A.) Ensure 6.10b works as expected
B.) Ensure 6.10b is in natty by feature freeze

Then the process of doing an SRU for Maverick and Lucid could begin.

Revision history for this message
Aron Xu (happyaron) wrote :

Hello, the fix has been available for some time; the main obstacle is that the Debian package maintainer does not want to ship unzip with a patch that upstream does not plan to include.

As Brian says in #62, if upstream has accepted it, we can ask for an FFe for natty and an SRU for older releases.

Revision history for this message
Dmitry Agafonov (dmitry-agafonov) wrote :

Aron: Can Ubuntu/Debian switch to different default zip handler to bypass this situation, e.g. use AltLinux's zip and unzip as upstream?

Revision history for this message
Aron Xu (happyaron) wrote :

You have to ask Debian's unzip package maintainer to check whether he likes to do that.

Revision history for this message
Sergey Nagaytsev (sergey-nagaytsev) wrote :

Here is, indeed, the ready working solution right at this site:
1) Add two PPAs, https://launchpad.net/~frol/+archive/zip-i18n/ (the patches I mentioned in #61, with the same publication reference) and https://launchpad.net/~r0lf/+archive/ppa (libnatspec, its dependency)
2) apt-get update; apt-get install zip unzip

SOLVED ! I just tested this - works two-way, read and write.

So after fixing the bug technically, the only step remaining is to fix it socially: come to an agreement about including libnatspec and the patched InfoZIP in Ubuntu/Debian, or reach a competent, voted decision to implement codepage conversion inside InfoZIP in agreement with its developers, so the patches can ultimately be upstreamed.

P.S. Sorry for overlooking the ready solution in the first place, I discovered it doing last-minute duplicate check before creating my own PPA.

Revision history for this message
Aron Xu (happyaron) wrote :

Please also send your proposals to Debian bug tracking system, at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=483290

There we can have the patch included, and at the same time Ubuntu developers won't be annoyed by merging changes for every new version.

Revision history for this message
Brian Thomason (brian-thomason) wrote :

https://edge.launchpad.net/~brian-thomason/+archive/ppa

This patched solution doesn't require natspec or librcc to run. This is the solution upstream has used (with modifications) in 6.10b to fix the problem.

Changed in hundredpapercuts:
status: New → Confirmed
Revision history for this message
Sergey Nagaytsev (sergey-nagaytsev) wrote :

@Brian: unzip is only half of the solution (not tested personally); zip will remain broken, writing archives that are not readable on other systems.

@Aron: Done, but I have heard that the Debian maintainer for ZIP doesn't want to accept a solution that won't ultimately be upstreamed, and the InfoZIP team does not accept Linux-specific patches since their goal is ultimate portability.

Revision history for this message
Murz (murznn) wrote :

Sergey Nagaytsev, thanks for the PPA and patched versions! I tested zip and unzip on Ubuntu 10.10 amd64 with Windows zip archives encoded in cp1251 and cp866 (Cyrillic), and everything works normally!

Revision history for this message
Shimi Chen (shimi-chen) wrote :

I updated unzip through the PPA's posted by Sergey Nagaytsev in #66 and the fix is only partial.

Some files that weren't displayed correctly are now displayed correctly, but there is at least one zip file still showing strange file names such as:
"€—Œ‰ €—
Œ
‚‰„
Ž†
 - ‰
‡€‰ ‹˜ŽŒ" instead of the original Hebrew title.

The same exact zip file is displayed correctly in windows xp in virtualbox, using both the built-in zip viewer in windows and 7-zip.

Revision history for this message
Ignat Loskutov (iloskutov) wrote :

@Brian Thomason
Thank you for the PPA, but why doesn't it include the unzip package for Lucid?

Changed in unzip (Ubuntu Natty):
assignee: Pinkertonik (pinkertonik) → Canonical Desktop Team (canonical-desktop-team)
Revision history for this message
Sergey Nagaytsev (sergey-nagaytsev) wrote :

@Shimi Chen, thank you for testing, but this way of reporting is counter-constructive, since it's not reproducible.

Please supply minimal live examples of:
* good and bad Hebrew archives
* descriptions of how you constructed them. Scripts for automated testing are especially welcome.
* files attempted to compress, under surely good archiver (I see 7z is good)
* files extracted after bad compression, also under surely good archiver

Here is the test data I prepared for many languages along with test automation script, if something is incomplete for Hebrew (additional letter elements like Latin diacritics ?), please add.

With reproducible tests at hand, I will try to contact the patch authors, or if this fails, we should give this problem to some highly international team (at a university ?).

Revision history for this message
Shimi Chen (shimi-chen) wrote :

I apologize for being counter-constructive; it's just that I lack the technical knowledge to make such scripts.

I attached a .zip file I created in windows using 7zip 9.20.
Here is the screenshot of the archive creation dialog(did not change any setting):
http://img543.imageshack.us/img543/8361/7zipi.png

And here is a screenshot difference between file-roller and windows zip viewer:
http://img406.imageshack.us/img406/4492/comparisonu.png

And lastly a screenshot of the extracted files in nautilus:
http://img607.imageshack.us/img607/7253/extracted.png

I did try your uploaded .7z file and I can see the file name fine in Ubuntu.

I also tried compressing the attached archive in .7z instead of .zip and it works. The problem seems to be with .zip archives.

Revision history for this message
Dmitry Frolov (frol) wrote :

Shimi Chen: I'm not sure, but 7zip may save file names in archives (including zip) in the UTF-16 charset. unzip from ppa:frol/zip-i18n translates file names from the OEM charset that Windows applies by default for your language (probably not UTF-16; the language is determined from your current locale) to the charset of your current locale.

You may need to process such files with p7zip or p7zip-full in ubuntu.
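
The locale-to-code-page step described above can be pictured as a lookup table. This is only an illustrative sketch, not the code from ppa:frol/zip-i18n or libnatspec; the function name is made up, and the table holds only the three pairs that come up in this report (CP866 for Russian, CP737 for Greek, CP862 for Hebrew), with CP437 as an assumed fallback:

```shell
#!/bin/sh
# Hypothetical lookup: map a locale name such as "el_GR.utf8" to the DOS/OEM
# code page that Windows tools typically use for file names in that language.
oem_codepage_for_locale() {
    case "${1%%_*}" in      # keep only the language part before the first "_"
        ru) echo CP866 ;;
        el) echo CP737 ;;
        he) echo CP862 ;;
        *)  echo CP437 ;;   # assumption: fall back to the original IBM PC page
    esac
}

# Print the guess for the current session.
oem_codepage_for_locale "${LANG:-C}"
```

A guess like this is only as good as the table behind it; an archive created on a machine with a different language setting will still need an explicit charset.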

Revision history for this message
Ignat Loskutov (iloskutov) wrote :

I don't know why, but version 5.52 works perfectly by default! Maybe the latest version should detect the charset the same way 5.52 does?

Revision history for this message
Piggy Blotch (wise-1) wrote :
Revision history for this message
Sergey Nagaytsev (sergey-nagaytsev) wrote :

I contacted the author of last patches (compiled into PPA by @frol) about reported flaw with Hebrew and possibly other languages than Russian.

He said he just solved problem for himself, published the solution on his favorite local Linux/FOSS site "so it won't be lost", and doesn't want to work on it any further.

Also he said he wrote an email to the original author of 'libnatspec' and zip/unzip patches using it, with an offer to include his contribution. The email was never answered.

So, this path of solution looks like a blind alley - at least if we want flexible worldwide solution.
--------
From my perspective, the problem lies in clinging to an old piece of legacy code that tries to be extremely portable at the cost of architectural layering and of using libraries and system-specific facilities.

I see the solution in abandoning InfoZIP altogether and writing a gcc/*nix only zip/unzip in either of two ways:
1) A library for the zipfile format (say 'libzipfile'), well designed to modern coding standards, with all reasonable hooks and callbacks for stream and piece-wise processing, NOT doing any data transformations except for ZIP record structures, and calling zlib for the (de-)compression itself. And on top of this library, modern, well-designed command line tools that rely on Linux/*NIX-specific libraries for things like charset conversion, command line options etc., written for clarity and flexibility.
2) Stopgap scripts in Python, wrappers around its zipfile and iconv modules, minimal to the requirements of GUIs like FileRoller and its KDE counterpart.

Revision history for this message
amdlin (amdlintuxos) wrote :

Looking forward to have this bug fixed.
http://oi52.tinypic.com/25i2737.jpg
Currently I try to avoid using Cyrillic letters, but this is not something I am going to do forever.

Revision history for this message
Martin Pitt (pitti) wrote :

I'll upload the patched package prepared by Brian. That adds the -I / -O options to specify character sets, but of course that won't help automatically determining the charset when e. g. file-roller calls unzip?

Changed in unzip (Ubuntu Natty):
assignee: Canonical Desktop Team (canonical-desktop-team) → Brian Thomason (brian-thomason)
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package unzip - 6.0-4ubuntu1

---------------
unzip (6.0-4ubuntu1) natty; urgency=low

  * Added patch from archlinux which adds the -O option allowing a charset
    to be specified for the proper unzipping of non-latin and non-unicode
    filenames. (LP: #580961)
 -- Brian Thomason <email address hidden> Wed, 12 Jan 2011 20:08:14 -0500

Changed in unzip (Ubuntu Natty):
status: Confirmed → Fix Released
Revision history for this message
Shimi Chen (shimi-chen) wrote :

I installed unzip 6.0-4ubuntu1 and it does not fix the bug for me using the .zip file uploaded in #74.

I installed from:
https://launchpad.net/ubuntu/+source/unzip/6.0-4ubuntu1/+buildjob/2228755/+files/unzip_6.0-4ubuntu1_amd64.deb

Revision history for this message
Martin Pitt (pitti) wrote :

Shimi, did you specify -I or -O, or did you just try with file-roller? As pointed out above, above patch only adds those options, but doesn't seem to set any encoding by itself.

Revision history for this message
Shimi Chen (shimi-chen) wrote :

I can't find any documentation for the -O/I option. I tried some charset codes that made sense which I found online, results attached.

Revision history for this message
Paul Sladen (sladen) wrote :

Perhaps this patch could be improved to sanity check whether the passed encoding does actually *decode* into something sane before attempting to do so. Shimi-Chen: Looking at the comments further up, you possibly want to try a string like "-O cp737" (for Greek). If you can attach an example .zip, somebody can probably help to work out what encoding it is and thus what command you probably need to use.
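
One rough way to do such a sanity check outside unzip is to ask iconv whether the bytes are even valid in the candidate encoding. A sketch under that assumption; the function name is hypothetical, and the check is weak for DOS code pages such as CP437 or CP866, where every byte value is assigned and decoding therefore never fails:

```shell
#!/bin/sh
# Hypothetical check: succeed if the bytes on stdin are valid in encoding $1.
decodes_cleanly() {
    iconv -f "$1" -t UTF-8 >/dev/null 2>&1
}

# Bytes of a CP866 file name ("Привет"); they are not valid UTF-8.
raw_name='\217\340\250\242\245\342'

for enc in UTF-8 CP866; do
    if printf "$raw_name" | decodes_cleanly "$enc"; then
        echo "$enc: decodes"
    else
        echo "$enc: rejected"
    fi
done
```

This can rule out UTF-8 for a legacy archive, but it cannot tell one single-byte code page from another; that still needs a human who can read the result.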

Revision history for this message
Shimi Chen (shimi-chen) wrote :

Okay, using CP862 worked for the archive in #74. I guess this is a bug with file-roller now?

Revision history for this message
Sergey Nagaytsev (sergey-nagaytsev) wrote :

@Launchpad Janitor in #81, the proposed fix really BREAKS the fix by @frol 's PPA.

$wget http://ru.archive.ubuntu.com/ubuntu/pool/main/u/unzip/unzip_6.0-4ubuntu1_amd64.deb
$sudo dpkg -i unzip_6.0-4ubuntu1_amd64.deb
$unzip -l russian.zip
008 - Russian/?????????? ?????? ???????? ???????????? ?????????????????????? ??????????, ???? ?????????? ???? ??????.txt

Ooops, it's broken !

$wget http://ppa.launchpad.net/frol/zip-i18n/ubuntu/pool/main/u/unzip/unzip_6.0-4ppa3_amd64.deb
$sudo dpkg -i unzip_6.0-4ppa3_amd64.deb
$unzip -l russian.zip
008 - Russian/Съешь ещё этих мягких французских булок, да выпей же чаю.txt

Fixed, we got it back !

The archive is attached, made by GUI 7z from Windows via Wine.

P.S. 'libnatspec' was installed previously and never removed.
P.S.2 In File Roller, it's broken both ways

Revision history for this message
Shimi Chen (shimi-chen) wrote :

At least for me the difference between the packages is only in terminal output.
With Brian's package I get the question marks like Sergey and with Sergey's package I get the actual filenames in the terminal(albeit mirrored, but that's a bug with bash).
However, in *both* cases when I navigate with nautilus to the folder I extracted to the filenames are listed correctly in my language(hebrew).
(note that I use -O CP862)

Changed in gentoo:
importance: Unknown → Medium
Changed in unzip (Mandriva):
importance: Unknown → High
Revision history for this message
Alkis Georgopoulos (alkisg) wrote :

I confirm that 6.0-4ubuntu1 is still half-broken, compared to previous Ubuntu versions.
 * Extracting works: UNZIP="-O cp737" unzip archive.zip
 * Listing doesn't: UNZIP="-O cp737" unzip -l archive.zip
Since listing doesn't work, file-roller also doesn't work.
I tested with http://www.ops.gr/Ergorama/fileUploads/ypiresiaops/prokirikseis/biografiko.zip

Btw, here's an easy way to provide the unzip environment variables for all users.

Create the following script:
$ sudo gedit /usr/local/bin/unzip

Paste these 4 lines, but replace cp737 with the appropriate charset for your locale:
#!/bin/sh

export UNZIP="-O cp737"
export ZIPINFO="-O cp737"
exec /usr/bin/unzip "$@"

Save, exit, and make the script executable:
$ sudo chmod +x /usr/local/bin/unzip

Revision history for this message
Alkis Georgopoulos (alkisg) wrote :

I just tried frol's PPA with the libnatspec dependency.
It didn't correctly autodetect the encoding so I didn't see any benefit with that implementation.
It did, however, work as well as unzip did in older Ubuntu versions, i.e. both extracting *and listing* worked if either the UNZIP/ZIPINFO environment variables were defined or the -O parameter was used:

# autodetection doesn't work:
$ echo $LANG
el_GR.utf8
$ unzip -l biografiko.zip
    43008 2010-05-14 12:20 biografiko/Ä¢₧Ü圪 ⌐¼úºóπ¿α⌐₧¬ ÿσ½₧⌐₧¬_Ö.doc

# works with the UNZIP environment variable:
$ UNZIP='-O cp737' unzip -l biografiko.zip
    43008 2010-05-14 12:20 biografiko/Οδηγίες συμπλήρωσης αίτησης_β.doc
# (alternatively, the same output is also produced with "-O cp737" in the command line)

# and in zipinfo mode, as called by file-roller:
$ ZIPINFO='-O cp737' unzip -ZTs biografiko.zip
-rw-a-- 2.0 fat 43008 b- defN 20100514.122016 biografiko/Οδηγίες συμπλήρωσης αίτησης_β.doc
# (alternatively, the same output is also produced with "-O cp737" in the command line)

So, it looks like only half of the required patch was included in unzip 6.0-4ubuntu1.
Please make listing (unzip -l and unzip -ZTs) work, as it did in previous Ubuntu versions.

Revision history for this message
Mackenzie Morgan (maco.m) wrote :

Setting back to triaged since the patch seems to be incomplete

Changed in unzip (Ubuntu Natty):
status: Fix Released → Triaged
Revision history for this message
Martin Pitt (pitti) wrote :

Removing the natty task, this is not a new regression and thus not a release blocker.

Changed in unzip (Ubuntu Natty):
status: Triaged → Won't Fix
Revision history for this message
Mechanical snail (replicator-snail) wrote :

Is Bug #177929 a duplicate of this one?

Revision history for this message
Alkis Georgopoulos (alkisg) wrote :

If I'm not mistaken, a fix for this bug has been committed upstream since 2010/12/05:
ftp://ftp.info-zip.org/pub/infozip/beta/unzip610b.zip

A related post in their forums:
http://www.info-zip.org/phpBB3/viewtopic.php?f=7&t=223&start=45#p2113

Maybe it'd be simpler to use the 6.1 package rather than applying the altlinux patch?

Revision history for this message
Martin Pitt (pitti) wrote : Re: [Bug 580961] Re: unzip fails to deal correctly with filename encodings

Alkis Georgopoulos [2011-03-10 12:27 -0000]:
> If I'm not mistaken, a fix for this bug has been committed upstream since 2010/12/05:
> ftp://ftp.info-zip.org/pub/infozip/beta/unzip610b.zip

If that makes it any better, sure.

> Maybe it'd be simpler to use the 6.1 package rather than applying the
> altlinux patch?

But AFAIK the new -I and -O options already work. I thought the
remainder of the problem was that the value isn't autodetected, but
needs to be specified manually?

Revision history for this message
Alkis Georgopoulos (alkisg) wrote :

On Thu, 10-03-2011, at 15:04 +0000, Martin Pitt wrote:
> But AFAIK the new -I and -O options already work. I thought the
> remainder of the problem was that the value isn't autodetected, but
> needs to be specified manually?

No, the -I and -O options do not work properly.
It looks like something, maybe just a couple of lines, were missing from the Natty patch.

I described the current situation in comment #89:
 * Extracting works: UNZIP="-O cp737" unzip archive.zip
 * Listing doesn't: UNZIP="-O cp737" unzip -l archive.zip
Since listing doesn't work, file-roller also doesn't work (it uses `unzip -ZTs`, i.e. listing in zipinfo mode).

And no, I'm not talking about autodetection, libnatspec wasn't used in the upstream patch and it fails to autodetect Greek anyway (haven't tested other locales).
So personally for now I'd be happy if unzip was at least working with the -I and -O options and with the UNZIP environment variable.

Revision history for this message
Martin Pitt (pitti) wrote :

Alkis Georgopoulos [2011-03-10 17:20 -0000]:
> No, the -I and -O options do not work properly.

Ah, ok. In this case, updating to the full new upstream version seems
fine.

Revision history for this message
Dmitry Frolov (frol) wrote :

Alkis Georgopoulos wrote on 2011-02-15:
> I just tried frol's PPA with the libnatspec dependency.
> It didn't correctly autodetect the encoding so I didn't see any benefit with that implementation.

Alkis, can you please confirm that libnatspec is working correctly on your machine, by installing natspec-bin from ppa:r0lf/ppa and running "natspec -i"? Does it correctly detect the charsets for your locale?

Also can You post output of "zipinfo -v biografiko.zip"?

Revision history for this message
Shimi Chen (shimi-chen) wrote :

For me the latest libnatspec does not recognize the charset correctly.
Attached are both outputs.

Changed in hundredpapercuts:
status: Confirmed → Invalid
Revision history for this message
Alex. K. (alex-kovtonuk) wrote :

yeah, you're right - it's not a paper cut.
it's really a knife stab

btw just compile the version from here > ftp://ftp.info-zip.org/pub/infozip/beta/unzip610b.zip

it works quite well, except it still doesn't work with, er... let's say, WinRAR-made zip archives - interoperability with other OSes is not achieved

oh, and it broke file-roller too. :(

For now frol's ppa is the best solution (for me, at least)
(thanks frol !)

Revision history for this message
Alkis Georgopoulos (alkisg) wrote :

frol wrote on 2011-04-01:
> Alkis, can You please confirm that libnatspec is working correctly
> on your machine, by installing natspec-bin from ppa:r0lf/ppa and
> running "natspec -i"?
> Does it correctly detect charsets for your locale?
>
> Also can You post output of "zipinfo -v biografiko.zip"?

It correctly detects the WIN encoding, which is CP1253, but it doesn't correctly detect the DOS encoding.
It claims it to be 437, while the correct one is 737:
http://en.wikipedia.org/wiki/Code_page_737
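
The effect of that misdetection can be reproduced with iconv alone. A sketch, assuming the archive stores its names in CP737: encoding a Greek word to CP737 and then reading the bytes back as CP437 yields the same kind of garbage seen in the listings earlier in this report. The function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: show how UTF-8 text from stdin looks when it is stored
# in code page $1 but later read back as code page $2.
misread_as() {
    iconv -f UTF-8 -t "$1" | iconv -f "$2" -t UTF-8
}

printf 'Οδηγίες' | misread_as CP737 CP437; echo   # garbled: wrong code page
printf 'Οδηγίες' | misread_as CP737 CP737; echo   # round-trips unchanged
```

So a wrong answer of 437 from the detection step is enough to explain the unreadable names, even when the conversion code itself is fine.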

Attaching the output of those commands.

Also attaching the correct output, which is produced if one manually specifies the encoding:
$ ZIPINFO='-O CP737' zipinfo -v biografiko.zip

Revision history for this message
Dmitry Frolov (frol) wrote :

Alkis, you need to contact the libnatspec maintainer about this bug: https://launchpad.net/~r0lf/+contactuser

Revision history for this message
Alkis Georgopoulos (alkisg) wrote :

frol wrote 51 minutes ago:
> Alkis, You need to contact libnatspec maintainer about this bug:
> https://launchpad.net/~r0lf/+contactuser

That person closed my bug report (LP #477755) where I was suggesting the altlinux solution that was finally accepted upstream, in order to open this bug. Then he told me that comments in this bug report were unwelcome (comment #16), so I became unable to provide feedback to the bug I reported.
Then he started asking people for money for his packaging work on libnatspec (developed by Vitaly Lipatov) and being rude when people weren't willing to pay.
Fortunately at a point he got "insulted" because people continued to comment on this bug report, and left.

What I mean is that I don't want to contact that package maintainer, and if possible, I wouldn't want to use his package either, more so when the upstream solution doesn't use it.
I've been coding open source programs for almost 20 years now, and been packaging some of them in the last few years that I've switched to Linux. I'd prefer to redo any needed coding or packaging work myself before having to cooperate with such a person.

For now I'll just wait for Debian to package the newer version of unzip, and then contact upstream if any problems persist.

Revision history for this message
Rolf Leggewie (r0lf) wrote :

Alkis, now you are really going too far with your character assassination and outright lying. You've signed the CoC, you should apologize.

I never asked for money to get this situation sorted, at least not for myself. I'm just fed up with people who make all kinds of grandish claims and demands but do zilch to get to a REAL solution. People who "NEED A FIX RIGHT NOW!1!" and can't code or package simply ought to pay someone to fix the issue or band together and pledge a bounty. People also still haven't understood that even if libnatspec were to work as intended and even if it were a step in the right direction getting it included wouldn't even be half-way to a proper fix. There'd still be many things left to be done. Insinuating that I was asking for money for myself for packaging libnatspec is so laughable it's not even funny ;-) Read through the ticket and you will see my many (ultimately failed) attempts to keep the ticket focused so a REAL developer will have a look at it.

Alkis, I've marked your ticket as a dupe to this one in the hope to focus discussion. That's a bit different than what you are hinting at with "he closed". You even thanked me for taking it up a few hours after I did that. You are so sneaky, it makes me sick.

I'm very happy I left this ticket. Unfortunately, I cannot completely unsubscribe myself from it, so I still have to bear with the many idiotic comments it gets. I've lost all interest in this issue, I have no intention of doing anything further to help get a proper fix into Debian or Ubuntu and thus don't really consider myself the libnatspec maintainer. People are of course free to grab my package but I won't be making any updates to it.

Revision history for this message
Paul Sladen (sladen) wrote :

Please could I remind everyone of the:

  Ubuntu Code of Conduct
  http://www.ubuntu.com/community/conduct

For many people Free software and Ubuntu are a hobby. If somebody else's hobby happens to make (your) life easier, then great; but it's still often their hobby …with the time being contributed freely from their own heart and for their own gratification.

1l1a (1l1a)
description: updated
Revision history for this message
Dmitry Frolov (frol) wrote :

OK, I copied libnatspec to ppa:frol/zip-i18n and restored support for CP737 and CP775.

Revision history for this message
Alkis Georgopoulos (alkisg) wrote :

@frol: ppa:frol/zip-i18n contains libnatspec 0.2.6-0ubuntu2,
while ppa:r0lf/ppa contains a "later" version, libnatspec 0.2.6-1.
$ dpkg --compare-versions 0.2.6-0ubuntu2 gt 0.2.6-1 || echo "Needs version bump"
Needs version bump

For people that have/had r0lf's ppa in their sources to be able to upgrade, you'd need to bump the version.
I've tested with "sudo apt-get install libnatspec0=0.2.6-0ubuntu2" and indeed now it correctly autodetects, lists and uncompresses cp737 (Greek) zip files. Thanks!

Justin Krehel (jkrehel)
Changed in linuxmint:
status: New → Triaged
Revision history for this message
sergzxc (boxdom) wrote :

zip/unzip and the specification fall short when dealing with non-ASCII filenames not encoded in UTF-8

Revision history for this message
Ru_Grey (linux-cool) wrote :

there is a bug

Vova (vosha)
Changed in unzip (Ubuntu Natty):
assignee: Brian Thomason (brian-thomason) → Vova (vosha)
f-firefox (f-firefox)
Changed in unzip (Ubuntu):
status: Triaged → Fix Released
Changed in unzip (Ubuntu):
status: Fix Released → Triaged
f-firefox (f-firefox)
Changed in unzip (Ubuntu):
status: Triaged → New
Revision history for this message
C de-Avillez (hggdh2) wrote :

@f-firefox: please stop changing the status.

Changed in unzip (Ubuntu):
status: New → Triaged
Changed in unzip (openSUSE):
importance: Unknown → Medium
status: Unknown → Fix Released
Revision history for this message
Kostiantyn Rybnikov (k-bx) wrote :

Glad to know that! Have a nice vacation!

tags: added: rls-mgr-o-tracking
tags: removed: rls-mgr-o-tracking
Mad_Loki (madloki1)
Changed in unzip (Ubuntu):
assignee: Brian Thomason (brian-thomason) → Mad_Loki (madloki1)
Changed in unzip (Ubuntu):
status: Triaged → In Progress
Revision history for this message
Alkis Georgopoulos (alkisg) wrote :

"assignee: madlokil" means that madlokil is the developer responsible for fixing the bug.
"status: In progress" by thefusionite means that someone is working to fix the bug, so the other developers don't need to bother with it.
So by setting the status to assigned+in progress, you actually discourage developers from working on this bug.
Allow me to assume that it was done by mistake, and to reset the status to "unassigned" and "triaged".

WRT comments #96 and 97, since the newer unzip version is not yet packaged for Debian, and doesn't look like it will be in time for Precise, could we do something so that we don't have a broken unzip in the next LTS version too?
 * Either package the newer upstream version,
 * Or check what's missing from the existing patch that prevents -I and -O from working?

Changed in unzip (Ubuntu):
assignee: Mad_Loki (madloki1) → nobody
status: In Progress → Triaged
wert (wert-dmitrii)
Changed in unzip (Ubuntu):
assignee: nobody → wert (wert-dmitrii)
Changed in unzip (Ubuntu):
assignee: wert (wert-dmitrii) → nobody
Revision history for this message
Ma Hsiao-chun (mahsiaochun) wrote :

Hi, all.

I wonder whether we could have a test case database. I think we could use the bug tracking systems that come with code hosting services. People who use different natural languages and have different compression preferences could all post sample archives. Maintainers would mark new and essential test cases as confirmed, repeated test cases as duplicates, and so on.

This database, more specifically the confirmed part, can then be used as the verification tool of proposed solutions.
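
A rough Python sketch of how such a confirmed set could be used to check a proposed solution. Everything here is hypothetical: the table layout, the function names, and the simplification of storing raw filename bytes instead of real sample archives.

```python
# Hypothetical test-case table: (raw filename bytes, archive encoding, expected name).
TEST_CASES = [
    ("Οδηγίες.doc".encode("cp737"), "cp737", "Οδηγίες.doc"),
    ("segédlet.doc".encode("cp850"), "cp850", "segédlet.doc"),
]


def run_cases(decoder, cases=TEST_CASES):
    """Return the cases for which decoder(raw, encoding) gives the wrong name."""
    return [case for case in cases if decoder(case[0], case[1]) != case[2]]


def honours_encoding(raw, encoding):
    """Candidate solution that uses the encoding recorded for the sample."""
    return raw.decode(encoding)


def always_cp866(raw, encoding):
    """Candidate solution that ignores the hint and always assumes CP866."""
    return raw.decode("cp866")


print(len(run_cases(honours_encoding)))  # number of failing cases
print(len(run_cases(always_cp866)))
```

A real database would hold the archives themselves, so that each archiver could be run against them unchanged.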

This bug points to several unzip packages in different distributions. It seems that the Info-ZIP people refused the local-encoding-related patch and that Info-ZIP's board is down. We would have to maintain the patch ourselves, which is not ideal. (I'm not sure whether Info-ZIP's license permits forking.)

Is it possible to use p7zip, bsdtar or other promising projects, if any, as our foundation? If we had a CLI archiver with officially supported encoding conversion, the remaining work for file-roller and ark would be much easier. Changing the main archiver may not be that dangerous, I believe, if we have enough test cases.

Revision history for this message
Laurent Dinclaux (dreadlox) wrote :

I can't unzip a file created in windows without having weird filenames! This bug seems 7 years old ....

Revision history for this message
ryou ezoe (boostcpp) wrote :

I found a bug in this iconv patch: 04-unzip60-alt-iconv-utf8.
The problem is that the patch allocates the buffer (which stores the converted string) at twice the size of the source string plus one byte,
as seen in lines 81-84 of the patch.

+ slen = strlen(string);
+ s = string;
+ dlen = buflen = 2*slen;
+ d = buf = malloc(buflen + 1);

This causes the conversion to fail in some cases,
because some characters need more than twice as much storage in the target encoding (especially UTF-8, Ubuntu's default encoding) as in the source encoding.

The halfwidth katakana letters are an example.
In Shift_JIS and CP932, a halfwidth katakana letter is represented in one octet,
but in UTF-8 it requires three octets.

For example,
'ｱ' (U+FF71: HALFWIDTH KATAKANA LETTER A)
is encoded as 0xB1 in Shift_JIS and CP932.
This is one octet.
But in UTF-8, it is encoded as 0xEF, 0xBD, 0xB1.
This is three octets.

So, because the current unzip allocates only twice the size of the source string for the buffer, it fails to handle a zip file containing a file name that consists entirely or largely of halfwidth katakana letters.

I suggest changing the buffer size to four times the size of the source string plus one byte,
because Ubuntu's default encoding is UTF-8 and the longest valid UTF-8 sequence for one character is 4 octets.

Replace line 83 of 04-unzip60-alt-iconv-utf8 with the following:
+ dlen = buflen = 4*slen;
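
The size arithmetic can be checked with a few lines of Python. This is only a sketch of the byte counts, not of the patch itself; the variable names mirror slen from the excerpt, and the ten-letter sample name is made up.

```python
# Ten HALFWIDTH KATAKANA LETTER A characters as a made-up file name.
name = "\uff71" * 10

slen = len(name.encode("cp932"))    # bytes in the archive (source) encoding
needed = len(name.encode("utf-8"))  # bytes required after conversion to UTF-8

print(slen, needed)        # 10 30
print(2 * slen >= needed)  # False: the 2*slen buffer is too small
print(4 * slen >= needed)  # True: the 4*slen buffer is large enough
```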

Revision history for this message
Eremin M.A. (eremikhail) wrote :

The bug is connected with Cyrillic encoding in archives

Revision history for this message
Lasse Kärkkäinen (tronic+mb48) wrote :

For whatever the reason, unzip-6.0-4ubuntu1 still doesn't support choosing the codepage and it uses CP866 (cyrillic) by default. This is a very bad choice. The default should be CP437 as is the default used by IBM BIOS, MS-DOS, VGA, etc. It should *also* be configurable via commandline option.

Meanwhile, incorrectly extracted filenames can be fixed by
convmv -f UTF-8 -t CP866 -r --notest . # Undo the incorrect conversion done by unzip
convmv -f CP437 -t UTF-8 -r --notest . # Convert CP437 (MS-DOS charset) into UTF-8 (what Linux systems should use nowadays)

If your zip actually uses something else, replace CP437 with the applicable DOS codepage (and remove --notest to do dry runs without actually renaming anything while testing different options).
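
What the two convmv steps do to a single name can be sketched in Python. This is an illustration of the string transformation only, it does not rename anything, and the function name undo_wrong_conversion is made up.

```python
def undo_wrong_conversion(name: str, assumed: str = "cp866", actual: str = "cp437") -> str:
    """Recover a name whose stored bytes were decoded with the wrong code page.

    Encoding with the wrongly assumed code page restores the original bytes;
    decoding those bytes with the actual code page gives the intended name.
    """
    return name.encode(assumed).decode(actual)


# "seg\u0412dlet" contains CYRILLIC CAPITAL LETTER VE where an e-acute was stored.
print(undo_wrong_conversion("seg\u0412dlet"))
```

If a name contains characters that do not exist in the assumed code page, encode() raises UnicodeEncodeError, which is a hint that the name was not produced by that particular wrong conversion.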

Revision history for this message
Dmitry Frolov (frol) wrote :

@Lasse: It's the fault of the debian/patches/04-unzip60-alt-iconv-utf8 patch, which contains the following:

[...]
+/* A mapping of local <-> archive charsets used by default to convert filenames
+ * of DOS/Windows Zip archives. Currently very basic. */
+static CHARSET_MAP dos_charset_map[] = {
+ { "ANSI_X3.4-1968", "CP850" },
+ { "ISO-8859-1", "CP850" },
+ { "CP1252", "CP850" },
+ { "UTF-8", "CP866" },
+ { "KOI8-R", "CP866" },
+ { "KOI8-U", "CP866" },
+ { "ISO-8859-5", "CP866" }
+};
[...]

Although it's simple to add missing mappings to that table, the patch is broken because it doesn't take the language into account, which may cause problems if the user's charset is UTF-8 and the language is not Russian.
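
A minimal Python sketch of what a language-aware lookup could look like. This is not how the patch or libnatspec works; the table, its entries, the CP437 fallback and the function name are all assumptions made for illustration.

```python
# Hypothetical table keyed by locale language; values are DOS code pages.
DOS_CODEPAGE_BY_LANGUAGE = {
    "ru": "CP866",
    "el": "CP737",
}

# Assumed fallback for languages missing from the table.
DEFAULT_DOS_CODEPAGE = "CP437"


def dos_codepage_for_locale(locale_name: str) -> str:
    """Pick a DOS code page from a locale name such as 'el_GR.UTF-8'."""
    language = locale_name.split(".")[0].split("_")[0].lower()
    return DOS_CODEPAGE_BY_LANGUAGE.get(language, DEFAULT_DOS_CODEPAGE)


for loc in ("ru_RU.UTF-8", "el_GR.UTF-8", "en_US.UTF-8"):
    print(loc, dos_codepage_for_locale(loc))
```

With a charset-only table, all three of these locales look the same ("UTF-8") and get the same code page.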

Revision history for this message
Fagoth (fagoth) wrote :

affects me too

Changed in unzip (Mandriva):
status: Confirmed → Unknown
Revision history for this message
CSRedRat (csredrat) wrote :

Fixed?

Revision history for this message
Leo (llenchikk) wrote :

The sample archive attached here still shows this bug when opened in Ubuntu 12.04.
File Roller 3.4.1

Revision history for this message
Vladimir Skvortsov (vskvortsoff) wrote :

Ubuntu 12.10 (UI with US English-UTF-8 codepage)

It seems that if you KNOW which software platform the zip file comes from, and its codepage, you can successfully unzip the archive without losing non-ASCII filenames that are not encoded in UTF-8.

I did an experiment: unpacking a zip file that was created on Korean Windows 7 and contains Korean characters in both the archive name and the names of the compressed files.

First let's get the locale-specific info:

$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Let's check the version of unzip utility:

$ unzip --help
UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.
...
Usage: unzip [-Z] [-opts[modifiers]] file[.zip] [list] [-x xlist] [-d exdir]
Default action is to extract files in list, except those in xlist, to exdir;
file[.zip] may be a wildcard. -Z => ZipInfo mode ("unzip -Z" for usage).
...
-O CHARSET specify a character encoding for DOS, Windows and OS/2 archives
-I CHARSET specify a character encoding for UNIX and other archives

Look at options with the following modifier:

-O CHARSET specify a character encoding for DOS, Windows and OS/2 archives

It is not -"zero", it is -O (capital O letter)!

In my case Korean Windows has EUC-KR codepage. The compressed zip-file has "2013년 설날" file name.

It means my command line will look like:

$ unzip -O EUC-KR "2013년 설날"

After checking unpacked files, it works! All files have right Korean encoding without strange characters.
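
The same idea, telling the tool which encoding the archive uses, can be sketched with Python's zipfile module. This is an illustration and not a replacement for unzip -O; the helper names recode and list_names are made up. It relies on zipfile decoding names without the UTF-8 flag as CP437 by default, which can be reversed without loss.

```python
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: the name is stored as UTF-8


def recode(name_as_cp437: str, encoding: str) -> str:
    """Re-interpret a name that zipfile decoded as CP437 in another encoding."""
    return name_as_cp437.encode("cp437").decode(encoding)


def list_names(path: str, encoding: str):
    """List member names, applying `encoding` to names not flagged as UTF-8."""
    names = []
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            if info.flag_bits & UTF8_FLAG:
                names.append(info.filename)
            else:
                names.append(recode(info.filename, encoding))
    return names
```

For the archive above, a call would look like list_names("archive.zip", "euc_kr"), where archive.zip is a placeholder path.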

Steve Langasek (vorlon)
tags: removed: regression-update
no longer affects: hundredpapercuts
Revision history for this message
Fagoth (fagoth) wrote :

here is a screenshot with archive from the top

Пётр (plmak)
Changed in unzip (Ubuntu):
assignee: nobody → Пётр (plmak)
Пётр (plmak)
Changed in unzip (Ubuntu):
assignee: Пётр (plmak) → nobody
assignee: nobody → Пётр (plmak)
Revision history for this message
Vitaly (md-xytop) wrote :

How I fixed it:

(/home/user) $: apt-get source unzip
(/home/user) $: cd unzip-6.0
(/home/user) $: gedit extract.c

go to line 2599..

there will be:

if (!isprint(*r)) {

Replace it with:

if(!iswprint(*r)) {

(we replaced function isprint by iswprint).

And now do:

(/home/user) $: cp unix/Makefile ./
(/home/user) $: make generic
(/home/user) $: sudo make install

After this operation unzip will work like a charm :)
!!! AND BONUS: Not only unzip, but Archive Manager will correctly show zipped file names from now!
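
Why a byte-wise printable check can turn UTF-8 names into question marks can be shown with a small Python simulation. This is not the code from extract.c; it assumes a filter that looks at one byte at a time and treats everything outside printable ASCII as unprintable, and both function names are made up.

```python
def bytewise_filter(name: str) -> str:
    """Simulate a filter that checks each UTF-8 byte on its own."""
    out = bytearray()
    for b in name.encode("utf-8"):
        # Bytes of multi-byte UTF-8 sequences are all >= 0x80, so they fail here.
        out.append(b if 0x20 <= b < 0x7F else ord("?"))
    return out.decode("ascii")


def characterwise_filter(name: str) -> str:
    """Simulate a filter that checks whole characters instead of bytes."""
    return "".join(ch if ch.isprintable() else "?" for ch in name)


print(bytewise_filter("Пётр.txt"))       # ????????.txt
print(characterwise_filter("Пётр.txt"))  # Пётр.txt
```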

Revision history for this message
Vitaly (md-xytop) wrote :

And here is ppa with fixed unzip:

https://launchpad.net/~md-xytop/+archive/unzip

Revision history for this message
Keagan Winterthieme (techningeer) wrote :

Okay, I've been following this bug for some time, why hasn't this gotten fixed after YEARS of the bug being open? Sorry to rant here, but why not just get one that works? I've been using the program in my distribution, and it does NOT have this problem at all - if you ask me, I believe it is a deeper issue than the program itself.

Revision history for this message
robert leleu (robert-jean-leleu) wrote :

didn't work for me

here is the end of the update cmd

Reading package lists... Done
W: An error occurred during the signature verification. The repository
is not updated and the previous index files will be used.
GPG error: http://ppa.launchpad.net quantal Release: The following
signatures were invalid: BADSIG B22A95F88110A93A
Launchpad PPA for Bumlebee Project

W: Failed to fetch
http://ppa.launchpad.net/bumblebee/stable/ubuntu/dists/quantal/Release

W: Failed to fetch
http://ppa.launchpad.net/md-xytop/unzip/ubuntu/dists/quantal/main/source/Sources
  404 Not Found

W: Failed to fetch
http://ppa.launchpad.net/md-xytop/unzip/ubuntu/dists/quantal/main/binary-amd64/Packages
  404 Not Found

W: Failed to fetch
http://ppa.launchpad.net/md-xytop/unzip/ubuntu/dists/quantal/main/binary-i386/Packages
  404 Not Found

W: Some index files failed to download. They have been ignored, or
old ones used instead.

On 07/06/2013 23:22, Vitaly wrote (Esperanto is the first international
language)
> And here is ppa with fixed unzip:
>
> https://launchpad.net/~md-xytop/+archive/unzip
>

Revision history for this message
Roman Bazalevsky (rvbglas) wrote :

ppa has only raring version of packages, not precise/quantal

Revision history for this message
robert leleu (robert-jean-leleu) wrote :

thanks

the raring ppa worked for my mint 14 (quantal?)

Changed in unzip (Ubuntu):
assignee: Пётр (plmak) → nobody
importance: High → Critical
no longer affects: unzip (Ubuntu Natty)
Revision history for this message
Ma Hsiao-chun (mahsiaochun) wrote :

Hi, Vitaly. I agree with your approach, see bug 1199239

Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Rolf, or anyone else affected,

Accepted unzip into raring-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/unzip/6.0-8ubuntu2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-needed
Changed in unzip (Ubuntu Raring):
importance: Undecided → High
status: New → Fix Committed
Changed in unzip (Ubuntu Quantal):
status: New → Fix Committed
Revision history for this message
Brian Murray (brian-murray) wrote :

Hello Rolf, or anyone else affected,

Accepted unzip into quantal-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/unzip/6.0-7ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Revision history for this message
Brian Murray (brian-murray) wrote :

Hello Rolf, or anyone else affected,

Accepted unzip into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/unzip/6.0-4ubuntu2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in unzip (Ubuntu Precise):
status: New → Fix Committed
Revision history for this message
Alkis Georgopoulos (alkisg) wrote :

Thanks, that partially solves the problem (Precise, 6.0-4ubuntu2).

The question marks are gone, so LP: #1199239 can be marked 'Fix released'.
This (LP: #580961) bug is NOT fully addressed though.
The problem that remains is that only a few codepages are supported.

For example, unzipping a file that contains Greek (cp737) filenames doesn't work out of the box:
$ unzip -l biografiko.zip
biografiko/ОЫЮЪхЬк ймгзвуирйЮк ШхлЮйЮк_Щ.doc (wrong codepage)

On the other hand, if someone manually specifies the codepage, it does work:
$ unzip -O cp737 -l biografiko.zip
biografiko/Οδηγίες συμπλήρωσης αίτησης_β.doc (right codepage)

I'm guessing it would work if we added this line:
+ { "CP1253", "CP737" },
...in the mapping table defined in debian/patches/06-unzip60-alt-iconv-utf8:

+/* A mapping of local <-> archive charsets used by default to convert filenames
+ * of DOS/Windows Zip archives. Currently very basic. */
+static CHARSET_MAP dos_charset_map[] = {
+ { "ANSI_X3.4-1968", "CP850" },
+ { "ISO-8859-1", "CP850" },
+ { "CP1252", "CP850" },
+ { "UTF-8", "CP866" },
+ { "KOI8-R", "CP866" },
+ { "KOI8-U", "CP866" },
+ { "ISO-8859-5", "CP866" }
+};

I'm not changing the verification-needed flag because it's only partially fixed,
which might mean "verification-failed, affected people do list your codepages here so that we add them to the mapping table before we upload this to -updates",
or it might mean "verification-done and open a new bug report for adding more codepages",
your call. Thanks!
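
A small Python sketch of the table lookup, to show what the proposed line would and would not change. The dictionary copies the mapping quoted above and adds the proposed entry; the function name decode_with_table is made up, and the real patch does this in C with iconv.

```python
# Local charset -> archive charset, copied from the patch excerpt,
# plus the entry proposed in this comment.
DOS_CHARSET_MAP = {
    "ANSI_X3.4-1968": "CP850",
    "ISO-8859-1": "CP850",
    "CP1252": "CP850",
    "UTF-8": "CP866",
    "KOI8-R": "CP866",
    "KOI8-U": "CP866",
    "ISO-8859-5": "CP866",
    "CP1253": "CP737",  # proposed addition, not in the shipped patch
}


def decode_with_table(raw: bytes, local_charset: str) -> str:
    """Decode stored name bytes with the archive charset the table picks."""
    return raw.decode(DOS_CHARSET_MAP[local_charset])


raw = "Οδηγίες συμπλήρωσης αίτησης_β.doc".encode("cp737")
print(decode_with_table(raw, "CP1253"))  # readable Greek
print(decode_with_table(raw, "UTF-8"))   # still read as CP866
```

The second line shows that a system whose local charset is UTF-8 would still get the CP866 reading, since the table is keyed by charset only.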

Revision history for this message
Leo (llenchikk) wrote :

@alkisg
Amazing!
It works with Russian characters.
I can't believe it happens now!

Revision history for this message
Murz (murznn) wrote :

Thanks, it works. I tested this patch via ppa:frol/zip-i18n on raring and saucy before, and everything works!
Now with precise-proposed it works out of the box too.
But why only for precise? Will it be updated for later Ubuntu releases too?

Revision history for this message
Pilot6 (hanipouspilot) wrote :

This is a good fix. But the problem is that file-roller uses p7zip-full, if it is installed.
And p7zip-full has same problem.
Is it possible to get file-roller use unzip by default?

Revision history for this message
Benjamin Drung (bdrung) wrote :

I think we should also fix p7zip-full and backport that fix to precise, too.

tags: added: verification-done-precise
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package unzip - 6.0-4ubuntu2

---------------
unzip (6.0-4ubuntu2) precise-proposed; urgency=low

  * Fix incorrectly displayed file names with UTF-8 characters.
    Add -DNO_WORKING_ISPRINT to build flags. (LP: #1199239, LP: #580961)
 -- Brian Murray <email address hidden> Wed, 06 Nov 2013 10:21:26 -0800

Changed in unzip (Ubuntu Precise):
status: Fix Committed → Fix Released
Revision history for this message
Brian Murray (brian-murray) wrote :

I think opening a new bug report for each separate code page that is missing / needed in the 06-unzip60-alt-iconv-utf8 patch, is the cleanest way forward.

Revision history for this message
Brian Murray (brian-murray) wrote :

I tested this using unzip version 6.0-8ubuntu2 from raring-proposed and confirm that the fix works.

tags: added: verification-done-raring
Revision history for this message
Brian Murray (brian-murray) wrote :

I tested this using unzip version 6.0-7ubuntu1.1 from quantal-proposed and confirm that I do not see ?? in the filenames anymore.

tags: added: verification-done-quantal
tags: removed: verification-needed
Revision history for this message
Pilot6 (hanipouspilot) wrote :

Brian,

Try to install p7zip-full and use file-roller to unzip a file. You'll get the same problem.

Revision history for this message
Brian Murray (brian-murray) wrote :

This bug was originally reported about unzip and has tasks for unzip, not file-roller or p7zip-full. So while that issue is worth fixing I think that too should be a separate bug report.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package unzip - 6.0-8ubuntu2

---------------
unzip (6.0-8ubuntu2) raring-proposed; urgency=low

  * Fix incorrectly displayed file names with UTF-8 characters.
    Add -DNO_WORKING_ISPRINT to build flags. (LP: #1199239, LP: #580961)
 -- Brian Murray <email address hidden> Wed, 06 Nov 2013 09:40:08 -0800

Changed in unzip (Ubuntu Raring):
status: Fix Committed → Fix Released
Revision history for this message
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for unzip has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package unzip - 6.0-7ubuntu1.1

---------------
unzip (6.0-7ubuntu1.1) quantal-proposed; urgency=low

  * Fix incorrectly displayed file names with UTF-8 characters.
    Add -DNO_WORKING_ISPRINT to build flags. (LP: #1199239, LP: #580961)
 -- Brian Murray <email address hidden> Wed, 06 Nov 2013 10:31:34 -0800

Changed in unzip (Ubuntu Quantal):
status: Fix Committed → Fix Released
Revision history for this message
Ľubomír Mlích (hater-zlin) wrote :

opened next bug as suggested in #140 and #146 https://bugs.launchpad.net/ubuntu/+source/unzip/+bug/1255640

Revision history for this message
Bib (bybeu) wrote :

Thank you very much. Now no more this issue in raring :)

Changed in unzip (Ubuntu):
status: Triaged → Fix Released
Revision history for this message
Pilot6 (hanipouspilot) wrote :

I created a new bug report regarding same problem with p7zip-full

https://bugs.launchpad.net/ubuntu/+source/p7zip/+bug/1382106

Revision history for this message
Sebastian Geiger (lanoxx) wrote :

>test case:
>do an "unzip -l" on the file http://tinyurl.com/2aofpxs and witness the question marks

Sorry for posting on this already quite long bug. I am using Ubuntu 14.04 (LTS) with all the latest updates.

This bug is still present for me. I need to explicitly pass `-O 850` to enable codepage 850 in order to resolve it. Interestingly passing CP-850 does not work. Took me quite some time to figure this out.

Note the subtle differences in the resulting filename for the two commands below:

ERROR:

  $unzip Gyakorlat\ hallgatói\ segédlet\ -\ 7.\ Webes\ alkalmazások\ fejlesztése.zip
Archive: Gyakorlat hallgatói segédlet - 7. Webes alkalmazások fejlesztése.zip
  inflating: Gyakorlat hallgatвi segВdlet - 7. Webes alkalmazаsok fejlesztВse.doc

WORKAROUND:

 $unzip -O 850 Gyakorlat\ hallgatói\ segédlet\ -\ 7.\ Webes\ alkalmazások\ fejlesztése.zip
Archive: Gyakorlat hallgatói segédlet - 7. Webes alkalmazások fejlesztése.zip
  inflating: Gyakorlat hallgatói segédlet - 7. Webes alkalmazások fejlesztése.doc

Question: Is this a regression or am I experiencing a different bug unrelated to this one? If I am not mistaken then this was fixed before Ubuntu 14.04 and so the fix should also be included in Ubuntu 14.04 or not?

Revision history for this message
Sebastian Geiger (lanoxx) wrote :

P.S.: I figured out correct code page for the `-O 850` option by opening the file in question on a windows machine which correctly unpacked the file, then I used the windows command line tool `chcp` to get the current code page on that windows machine:

>chcp
Active code page: 850
>

Revision history for this message
CSRedRat (csredrat) wrote :

Hendrik Knackstedt (hennekn) on 2014-02-02
Changed in unzip (Ubuntu):
status: Triaged → Fix Released

In 7z:
Pilot6 (hanipouspilot) wrote on 2014-10-18: #157
I created a new bug report regarding same problem with p7zip-full

https://bugs.launchpad.net/ubuntu/+source/p7zip/+bug/1382106

Revision history for this message
Yuan Chao (yuanchao) wrote :

Wouldn't it be better to patch file-roller to support encoding settings? Otherwise it still breaks if you get zipped file with different encoding from the default.

Revision history for this message
In , Bwiedemann (bwiedemann) wrote :

This is an autogenerated message for OBS integration:
This bug (540598) was mentioned in
https://build.opensuse.org/request/show/39794 Factory / unzip
https://build.opensuse.org/request/show/40783 11.2 / librcd0
https://build.opensuse.org/request/show/40784 11.2 / librcc0
https://build.opensuse.org/request/show/40785 11.2 / unzip
https://build.opensuse.org/request/show/40799 11.2:Test / unzip

Mathew Hodson (mhodson)
Changed in unzip (Debian):
status: Confirmed → Unknown
Mathew Hodson (mhodson)
affects: unzip → ubuntu-translations
Changed in ubuntu-translations:
importance: Medium → Undecided
status: Unknown → New
no longer affects: ubuntu-translations
affects: gentoo → ubuntu-translations
Changed in ubuntu-translations:
importance: Medium → Undecided
status: Won't Fix → New
no longer affects: ubuntu-translations
Mathew Hodson (mhodson)
affects: unzip (Mandriva) → ubuntu-translations
Changed in ubuntu-translations:
importance: High → Undecided
status: Unknown → New
no longer affects: ubuntu-translations
Mathew Hodson (mhodson)
affects: file-roller → ubuntu-translations
Changed in ubuntu-translations:
importance: Medium → Undecided
status: Confirmed → New
no longer affects: ubuntu-translations
affects: unzip (Debian) → ubuntu-translations
Changed in ubuntu-translations:
importance: Unknown → Undecided
status: Unknown → New
no longer affects: ubuntu-translations
affects: linuxmint → ubuntu-translations
no longer affects: ubuntu-translations
Changed in unzip (Ubuntu Precise):
importance: Undecided → High
Changed in unzip (Ubuntu Quantal):
importance: Undecided → High
Revision history for this message
Mathew Hodson (mhodson) wrote :

I've closed the remaining tasks. This particular bug was fixed in Precise and later. For remaining issues in p7zip and file-roller, see Bug #1382106 and Bug #495880

---------------
unzip (6.0-4ubuntu2) precise-proposed; urgency=low

  * Fix incorrectly displayed file names with UTF-8 characters.
    Add -DNO_WORKING_ISPRINT to build flags. (LP: #1199239, LP: #580961)
 -- Brian Murray <email address hidden> Wed, 06 Nov 2013 10:21:26 -0800

Changed in ubuntu-jp-improvement:
status: Fix Committed → Fix Released
tags: removed: needs-reassignment
Revision history for this message
Unxed (unxed) wrote :

Wrote a patch for unzip fixing this issue:
https://sourceforge.net/p/infozip/patches/29/

The same patch for p7zip:
https://sourceforge.net/p/p7zip/bugs/187/

f (andrewkuzbass)
Changed in unzip (Ubuntu):
assignee: nobody → f (andrewkuzbass)
assignee: f (andrewkuzbass) → nobody