Ubuntu
unzip package

unzip uses Russian Cyrillic CP866 as the OEM encoding, even if the Russian locale is not selected in the system

Bug #2066389 reported by Unxed on 2024-05-22

This bug affects 2 people

Affects         Status        Importance  Assigned to  Milestone
unzip (Ubuntu)  Fix Released  Undecided   Unassigned

Bug Description

The built-in .zip archiver in Windows encodes file names in new archives using the DOS (OEM) code page that corresponds to the current regional settings. Many such archives exist.

The problem is that Ubuntu's unzip is stuck with CP866 for such archives. See
20-unzip60-alt-iconv-utf8.patch,
in particular the mapping from the system charset to the charset that unzip expects the archive to use:

+/* A mapping of local <-> archive charsets used by default to convert filenames
+ * of DOS/Windows Zip archives. Currently very basic. */
+static CHARSET_MAP dos_charset_map[] = {
+ { "ANSI_X3.4-1968", "CP850" },
+ { "ISO-8859-1", "CP850" },
+ { "CP1252", "CP850" },
+ { "UTF-8", "CP866" },
+ { "KOI8-R", "CP866" },
+ { "KOI8-U", "CP866" },
+ { "ISO-8859-5", "CP866" }
+};

As the table shows, CP866 is selected on every system whose charset is UTF-8, which covers almost every modern system regardless of language. This is clearly incorrect.
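The effect of the table can be sketched with a small lookup. The table entries are copied from the quoted patch; the layout of `CHARSET_MAP` and the helper `default_archive_charset` are assumptions made for illustration and may not match the patch's actual code.

```c
#include <stddef.h>
#include <string.h>

/* Assumed shape of the patch's CHARSET_MAP type: a pair of charset names. */
typedef struct {
    const char *local_charset;
    const char *archive_charset;
} CHARSET_MAP;

/* Entries as quoted in the bug description. */
static const CHARSET_MAP dos_charset_map[] = {
    { "ANSI_X3.4-1968", "CP850" },
    { "ISO-8859-1",     "CP850" },
    { "CP1252",         "CP850" },
    { "UTF-8",          "CP866" },
    { "KOI8-R",         "CP866" },
    { "KOI8-U",         "CP866" },
    { "ISO-8859-5",     "CP866" }
};

/* Hypothetical helper: return the archive charset assumed for a given
 * local charset, or NULL if the table has no entry for it. */
static const char *default_archive_charset(const char *local_charset)
{
    size_t i;
    size_t n = sizeof dos_charset_map / sizeof dos_charset_map[0];

    for (i = 0; i < n; i++) {
        if (strcmp(dos_charset_map[i].local_charset, local_charset) == 0)
            return dos_charset_map[i].archive_charset;
    }
    return NULL;
}
```

Because the key is the charset name alone, a German `de_DE.UTF-8` system and a Russian `ru_RU.UTF-8` system both look up "UTF-8" and both get "CP866"; the language part of the locale never enters the decision.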

The correct behavior is to determine the relevant OEM or ANSI code page from the system locale and use that. This PR can serve as a reference implementation:
https://github.com/p7zip-project/p7zip/pull/232
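A rough sketch of what locale-based detection could look like follows. It is not the code from the linked PR: the type `OEM_MAP`, the helper `oem_charset_for_locale`, and the deliberately small table are illustrative assumptions, and a real implementation would need a much fuller mapping.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping from a locale prefix ("ll" or "ll_CC") to the
 * OEM code page that Windows uses by default for that language/region. */
typedef struct {
    const char *locale_prefix;
    const char *oem_charset;
} OEM_MAP;

/* Illustrative subset only; more specific prefixes come first. */
static const OEM_MAP oem_map[] = {
    { "en_US", "CP437" },
    { "en_GB", "CP850" },
    { "de",    "CP850" },
    { "fr",    "CP850" },
    { "pl",    "CP852" },
    { "cs",    "CP852" },
    { "el",    "CP737" },
    { "tr",    "CP857" },
    { "ru",    "CP866" },
    { "ja",    "CP932" }
};

/* Return the OEM charset for a locale string such as "de_DE.UTF-8",
 * or NULL if the locale is not covered by the table. */
static const char *oem_charset_for_locale(const char *locale)
{
    size_t i;
    size_t count = sizeof oem_map / sizeof oem_map[0];

    if (locale == NULL)
        return NULL;

    for (i = 0; i < count; i++) {
        size_t n = strlen(oem_map[i].locale_prefix);

        if (strncmp(locale, oem_map[i].locale_prefix, n) == 0) {
            /* The prefix must end at a locale component boundary. */
            char next = locale[n];
            if (next == '\0' || next == '_' || next == '.' || next == '@')
                return oem_map[i].oem_charset;
        }
    }
    return NULL;
}
```

With the `LANG=en_US.UTF-8` environment shown in this report, the sketch returns "CP437" rather than "CP866". A caller would still need a fallback for locales the table does not cover, such as "C".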

Upstream issue:
https://sourceforge.net/p/infozip/bugs/43/#951c

ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: unzip 6.0-28ubuntu4
ProcVersionSignature: Ubuntu 6.8.0-31.31-generic 6.8.1
Uname: Linux 6.8.0-31-generic x86_64
ApportVersion: 2.28.1-0ubuntu2
Architecture: amd64
CasperMD5CheckMismatches: ./boot/grub/grub.cfg
CasperMD5CheckResult: fail
CurrentDesktop: ubuntu:GNOME
Date: Wed May 22 11:05:59 2024
InstallationDate: Installed on 2024-04-29 (23 days ago)
InstallationMedia: Ubuntu 24.04 LTS "Noble Numbat" - Release amd64 (20240424)
ProcEnviron:
 LANG=en_US.UTF-8
 PATH=(custom, no user)
 SHELL=/bin/bash
 TERM=xterm-256color
 XDG_RUNTIME_DIR=<set>
SourcePackage: unzip
UpgradeStatus: No upgrade log present (probably fresh install)

Related branches

~mitya57/ubuntu/+source/unzip:fix-code-pages

Merged into ubuntu/+source/unzip:ubuntu/devel at revision 8d0362fcc3761dc75fe42de312eb5a067533f68d

Dominik Viererbe (community): Approve on 2024-06-11

Unxed (unxed) wrote on 2024-05-22:

Dependencies.txt (156 bytes, text/plain; charset="utf-8")
ProcCpuinfoMinimal.txt (978 bytes, text/plain; charset="utf-8")

Launchpad Janitor (janitor) wrote on 2024-05-22:

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in unzip (Ubuntu):
status:	New → Confirmed

Unxed (unxed) wrote on 2024-05-22 (last edit on 2024-05-22):

Debian's 7zip just merged a fix for the same problem:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=779207#51

Unxed (unxed) wrote on 2024-05-24 (last edit on 2024-06-08):

Here is a version of unzip with this problem fixed:
https://github.com/unxed/unzip/tree/ubuntu

Nuances:

1) Everything here is built on top of the full set of Ubuntu patches, as the commit history shows. In theory, you can generate a new patch from this repo and add it to the Ubuntu package without problems. I did it this way because I did not want to claim authorship of someone else's code (for example, by including support for the -I and -O options in my patch); I only wanted to supplement it with correct auto-detection of the encoding.

2) Archives with ANSI encoding are not supported yet (such archives exist, although they are not very common). They were not supported before either. I will probably add this later, similar to how it is done in 7zip.

UPD: Added support for ANSI archives.

Unxed (unxed) on 2024-06-11

description: updated

Launchpad Janitor (janitor) wrote on 2024-06-14:

This bug was fixed in the package unzip - 6.0-28ubuntu5

---------------
unzip (6.0-28ubuntu5) oracular; urgency=medium

  [ Ivan Sorokin ]
  * Add 30-fix-code-pages.patch with the following fixes (LP: #2066389):
    - Fixed bit 11 of General purpose flag support on systems with UTF-8
      system charset.
    - Fixed OEM code page being always assumed Russian/Cyrillic CP866 on
      any UTF-8 system.
    - Added proper OEM code page detection based on system locale setting.
    - Removed translation from ISO 8859-1 to local charset; assumption that
      any non-unicode archive uses it is for sure wrong as it can be any
      charset used on archive creator's local system; also do not treat
      PKZIP for UNIX 2.51 archives as having ISO 8859-1 charset for the
      same reasons.
    - Enabled UTF-8 output by default on Unix systems.

  [ Dmitry Shachnev ]
  * Add tests for unicode file names in different encodings.

 -- Dmitry Shachnev <email address hidden>  Tue, 11 Jun 2024 21:48:13 +0300

Changed in unzip (Ubuntu):
status:	Confirmed → Fix Released
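The first changelog item concerns bit 11 of the general purpose bit flag, which the ZIP format defines as the language encoding flag: when it is set, the entry's file name and comment are UTF-8. A minimal sketch of that check, with the hypothetical names `ZIP_GPF_UTF8` and `filename_charset` (these are not identifiers from the patch), could look like this:

```c
#include <stdint.h>

/* Bit 11 of the ZIP general purpose bit flag: file name and comment
 * are encoded in UTF-8. The macro name is an assumption. */
#define ZIP_GPF_UTF8 (1u << 11)

/* Hypothetical helper: pick the charset to decode an entry's file name.
 * If bit 11 is set the name is UTF-8; otherwise fall back to the OEM
 * code page the caller has chosen for this archive. */
static const char *filename_charset(uint16_t gp_flags, const char *oem_fallback)
{
    return (gp_flags & ZIP_GPF_UTF8) ? "UTF-8" : oem_fallback;
}
```

Checking the flag first means that the OEM code page guess only affects archives that do not declare their names as UTF-8.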

Remote bug watches

debbugs #779207
[open wishlist l10n patch]
