The built-in .zip archiver in older versions of Windows encoded file names in new archives using the DOS (OEM) or Windows (ANSI) code page that matched the current regional settings. Lots of such archives still exist.
The problem is that Ubuntu's unzip is stuck with CP866 for such archives. Have a look at 20-unzip60-alt-iconv-utf8.patch, especially at how it maps the system charset to the charset unzip expects to find in the archive:
+/* A mapping of local <-> archive charsets used by default to convert filenames
+ * of DOS/Windows Zip archives. Currently very basic. */
+static CHARSET_MAP dos_charset_map[] = {
+ { "ANSI_X3.4-1968", "CP850" },
+ { "ISO-8859-1", "CP850" },
+ { "CP1252", "CP850" },
+ { "UTF-8", "CP866" },
+ { "KOI8-R", "CP866" },
+ { "KOI8-U", "CP866" },
+ { "ISO-8859-5", "CP866" }
+};
As you can see, CP866 is selected on every system whose charset is UTF-8, which is almost every modern system. This is definitely not the correct behavior.
The correct behavior is to determine the relevant OEM or ANSI code page from the system locale and use it. See this PR for a reference implementation: https://github.com/p7zip-project/p7zip/pull/232
Upstream issue: https://sourceforge.net/p/infozip/bugs/43/#951c
ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: unzip 6.0-28ubuntu4
ProcVersionSignature: User Name 6.8.0-31.31-generic 6.8.1
Uname: Linux 6.8.0-31-generic x86_64
ApportVersion: 2.28.1-0ubuntu2
Architecture: amd64
CasperMD5CheckMismatches: ./boot/grub/grub.cfg
CasperMD5CheckResult: fail
CurrentDesktop: ubuntu:GNOME
Date: Wed May 22 11:05:59 2024
InstallationDate: Installed on 2024-04-29 (23 days ago)
InstallationMedia: Ubuntu 24.04 LTS "Noble Numbat" - Release amd64 (20240424)
ProcEnviron:
LANG=en_US.UTF-8
PATH=(custom, no user)
SHELL=/bin/bash
TERM=xterm-256color
XDG_RUNTIME_DIR=<set>
SourcePackage: unzip
UpgradeStatus: No upgrade log present (probably fresh install)