Comment 20 for bug 495880

Misaki (myjunkmail311006) wrote :

Besides running the command-line version of unzip with the -O option, you can use the convmv command to fix the filenames of files that have already been extracted. This works on an ext3 filesystem, but NTFS may give an error because the filenames are invalid. ext3 says the encoding is invalid but still lets the files be renamed to those names.

So for anyone who encounters this bug report and is concerned specifically with extracting files from archives whose filenames are Shift-JIS encoded, these are the commands you can use:

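(navigate to the directory, and...)
unzip -O shift-jis <filename>
or, for files that were already extracted (the first convmv line is a dry run that only prints what would be renamed; --notest performs the rename):
convmv * -f utf8 -t iso8859-1 -r
convmv * -f utf8 -t iso8859-1 --notest -r ; convmv -f shift-jis -t utf8 * --notest -r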
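
To show why the two convmv passes are needed, here is a rough Python sketch of what I believe happens to the bytes. It is only an illustration, not what unzip or convmv do internally. The example filename is made up, and it assumes the extractor treated each Shift-JIS byte as an ISO-8859-1 character and wrote the result out as UTF-8.

```python
# Hypothetical example name; any name that Shift-JIS can encode would do.
original = "日本語.txt"

# The name as stored inside the archive: raw Shift-JIS bytes.
raw = original.encode("shift_jis")

# Assumption: the extractor read each byte as one ISO-8859-1 character
# and wrote the resulting (garbled) text to disk as UTF-8.
on_disk = raw.decode("latin-1").encode("utf-8")

# Pass 1, like "convmv -f utf8 -t iso8859-1": undo the wrong conversion,
# which gives back the raw Shift-JIS bytes.
step1 = on_disk.decode("utf-8").encode("latin-1")

# Pass 2, like "convmv -f shift-jis -t utf8": decode those bytes properly
# and store the name as UTF-8.
step2 = step1.decode("shift_jis").encode("utf-8")

print(step2.decode("utf-8"))
```

Under that assumption, the names between the two passes are raw Shift-JIS bytes rather than valid UTF-8, which would fit the invalid-encoding behaviour described above.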

One of the other bug reports links to this Debian bug, which lists solutions and problems:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=483290

One consideration is what happens if filenames in the same archive use different encodings. It seems reasonable to assume the names appeared that way to the archive's creator, and thus that they shouldn't be interpreted separately, but mixed encodings could lead to the wrong conversion method being selected.

The patch linked in that bug report describes the options -O and -I. These aren't documented in unzip's manual pages, yet -O works here, so it's possible that the patch is already applied. But it still didn't detect the 'proper' encodings for me when I tested file-roller, using unzip, on several archives after uninstalling p7zip.

The patch also talks about the current locale charset. Assuming a particular encoding for filenames could be correct for many people, most of the time, but will be incorrect for other people, and so is at best only a partial solution. I don't know what the patch does after that, though.
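
As a small illustration of why guessing is only a partial solution, this Python sketch (not taken from the patch) tries several candidate encodings on the same bytes. The helper name decodings and the candidate list are my own choices for the example.

```python
def decodings(raw, candidates):
    """Return {encoding: text} for each candidate that decodes raw without error."""
    results = {}
    for enc in candidates:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; skip it.
            continue
    return results

# Hypothetical filename, stored as Shift-JIS bytes as in the archives above.
raw_name = "日本語.txt".encode("shift_jis")

# Candidates: the real encoding, two DOS code pages, and UTF-8.
plausible = decodings(raw_name, ["shift_jis", "cp437", "cp866", "utf-8"])

for enc, text in plausible.items():
    print(enc, repr(text))
```

More than one candidate decodes these bytes without any error, each giving a different name, so the bytes alone don't say which one is right. A default taken from the locale will pick the right one for some users and the wrong one for others.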

Just for reference, these are other Debian bugs mentioned in that report:

> Bug#197427: unzip: chinese filenames unwrapped on unix wrongly
> Bug#197428: unzip: zipinfo (and unzip) can't deal with chinese filenames like miniunzip can
> Bug#339021: unzip: incorrectly converts cyrillic file names from Windows-created ZIPs