Broken encoding in Russian-translated HTML templates

Bug #1777349 reported by WGH
This bug affects 1 person
Affects            Status        Importance  Assigned to  Milestone
mailman (Debian)   Fix Released  Unknown
mailman (Ubuntu)   Fix Released  Medium      Unassigned
  Bionic           Triaged       Low         Unassigned

Bug Description

Description: Ubuntu 18.04 LTS
Release: 18.04

mailman:
  Installed: 1:2.1.26-1
  Candidate: 1:2.1.26-1
  Version table:
 *** 1:2.1.26-1 500
        500 http://archive.ubuntu.com/ubuntu bionic/main amd64 Packages
        100 /var/lib/dpkg/status

The Ubuntu package ships the Russian-translated HTML templates in a really curious text encoding: UTF-8 that was interpreted as KOI8-R and then converted to UTF-8 again. As a result, text that should read "Укажите пароль:" ("Enter password:") looks like this: пёп╨п╟п╤п╦я┌п╣ п©п╟я─п╬п╩я▄:
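The garbling can be reproduced, and reversed, with Python's built-in `koi8_r` codec. The function names below are illustrative and not part of mailman. The repair works only because KOI8-R assigns a character to every byte value, so the misreading loses no information.

```python
# Simulate and undo the double encoding: UTF-8 bytes misread as KOI8-R,
# then saved again as UTF-8. Function names are illustrative only.

def double_encode(text: str) -> str:
    """Reproduce the bug: decode the UTF-8 bytes as if they were KOI8-R."""
    return text.encode("utf-8").decode("koi8_r")


def repair(garbled: str) -> str:
    """Undo it: recover the original bytes via KOI8-R, then decode them as UTF-8."""
    return garbled.encode("koi8_r").decode("utf-8")


original = "Укажите пароль:"
garbled = double_encode(original)
print(garbled)          # пёп╨п╟п╤п╦я┌п╣ п©п╟я─п╬п╩я▄:
print(repair(garbled))  # Укажите пароль:
```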

Upstream mailman has been using UTF-8 for the Russian HTML templates since 2015 (https://bugs.launchpad.net/mailman/+bug/1418448). My guess is that the Ubuntu package scripts (?) had been converting KOI8-R to UTF-8 before upstream switched to UTF-8, and continued to do so after the switch, resulting in broken files.

WGH (wgh) wrote :
Andreas Hasenack (ahasenack) wrote :

Thanks for filing this bug in Ubuntu, and also in Debian. The mailman package in Ubuntu is a straight sync from the Debian one, so it's best to be fixed there first.

This might have something to do with debian/patches/91_utf8.patch. I see it doing some charset manipulation in the "install:" target, and this line looks suspicious:
+ CHARSET_ru=koi8-r; \
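A minimal Python sketch of why such a step would break the templates, assuming (hypothetically) that the install target decodes each Russian template as `CHARSET_ru` and re-encodes it as UTF-8. This is not the actual content of 91_utf8.patch; `naive_convert` and `guarded_convert` are illustrative names. The guarded variant skips files that already decode as UTF-8, which in practice separates the two cases, because every KOI8-R Cyrillic letter is a byte in 0xC0-0xFF that cannot be followed by another such byte in valid UTF-8.

```python
# Hypothetical install-time conversion step (NOT the real 91_utf8.patch).

def naive_convert(data: bytes, source_charset: str) -> bytes:
    """Assume `data` is in `source_charset` and re-encode it as UTF-8."""
    return data.decode(source_charset).encode("utf-8")


def guarded_convert(data: bytes, source_charset: str) -> bytes:
    """Convert only if `data` is not already valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return naive_convert(data, source_charset)
    return data  # already UTF-8: leave it untouched


legacy = "Укажите пароль:".encode("koi8_r")   # old-style KOI8-R template
current = "Укажите пароль:".encode("utf-8")   # UTF-8 template, as upstream ships now

# The naive step is right for legacy files but garbles current ones.
assert naive_convert(legacy, "koi8_r") == current
assert naive_convert(current, "koi8_r") != current

# The guarded step handles both.
assert guarded_convert(legacy, "koi8_r") == current
assert guarded_convert(current, "koi8_r") == current
```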

Changed in mailman (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Andreas Hasenack (ahasenack) wrote :

Doesn't it look correct when displayed in a web browser? I can confirm that the actual HTML file looks like you said, but the browser gives me a page like the attached screenshot (never mind that it says "xenial" there; it's a bionic container, I just made a mistake with the hostname).

WGH (wgh) wrote :

It does indeed render correctly there.

In my case, though, it didn't render correctly. I think that was because the page also included parts in the proper encoding (e.g., the list description or localized messages from gettext), and a browser can't recover from a mix of correctly and incorrectly encoded text.
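The mixed-encoding problem can be illustrated with a short Python sketch: reversing the KOI8-R misreading fixes the garbled template text on its own, but fails once the same page also contains correctly encoded text, so no single decoding renders the whole page. The `repair` helper and the list description "Список рассылки" ("Mailing list") are made up for illustration.

```python
from typing import Optional


def repair(text: str) -> Optional[str]:
    """Undo UTF-8-read-as-KOI8-R garbling; return None if that is impossible."""
    try:
        return text.encode("koi8_r").decode("utf-8")
    except UnicodeError:  # covers UnicodeEncodeError and UnicodeDecodeError
        return None


template_part = "Укажите пароль:".encode("utf-8").decode("koi8_r")  # garbled template text
description = "Список рассылки"  # correctly encoded list description

print(repair(template_part))                      # Укажите пароль:
print(repair(template_part + " " + description))  # None
```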

WGH (wgh) wrote :

I'll need to spin up my own test container to properly test why it ever renders correctly. I don't think browsers can work out that the text has to be converted from UTF-8 back to KOI8-R and then interpreted as UTF-8. My Firefox wasn't able to do that when I opened the template source file directly.

Maybe the default Debian mailman configuration somehow converts the encoding back into something recoverable; I don't know.

Though, frankly, I don't really want to investigate why an obviously broken configuration works in certain cases when the fix is (more or less) apparent :)

Changed in mailman (Debian):
status: Unknown → New
Changed in mailman (Debian):
status: New → Fix Released
Andreas Hasenack (ahasenack) wrote :
Changed in mailman (Ubuntu Bionic):
status: New → Triaged
Changed in mailman (Ubuntu):
status: Triaged → Fix Released
tags: added: server-todo
tags: removed: server-todo
Changed in mailman (Ubuntu Bionic):
importance: Undecided → Low