Hyphenation of Russian text doesn't work in Firefox due to unrecognized encoding

Bug #1107859 reported by lohmatii netizen on 2013-01-28
This bug affects 2 people
Affects              Status         Importance   Assigned to
firefox (Ubuntu)     Incomplete     Low          Unassigned
hyphen-ru (Ubuntu)   Fix Released   Medium       Gunnar Hjalmarsson

Bug Description

To the sponsor: Please upload the source in this PPA:
https://launchpad.net/~gunnarhj/+archive/ubuntu/hyphen-ru

[Original description]
There is a hyph_ru_RU.dic in the /usr/share/hyphen folder, but Russian hyphenation doesn't work.

debian/patches/dont-include-hyphenation-patterns.patch excludes the hyphenation files from the Ubuntu Firefox build.

Everything works when I extract the hyphenation folder from the omni.ja file (from the original Firefox tarball) and move its contents to /usr/share/hyphen/.

Check your current hyphenation status at http://hyphenator.googlecode.com/svn/dictChecker.html
---
ApportVersion: 2.8-0ubuntu2
Architecture: amd64
DistroRelease: Ubuntu 13.04
InstallationDate: Installed on 2012-11-09 (81 days ago)
InstallationMedia: Ubuntu 12.10 "Quantal Quetzal" - Release amd64 (20121017.5)
MarkForUpload: True
Package: openoffice.org-hyphenation 0.6
PackageArchitecture: all
ProcVersionSignature: Ubuntu 3.8.0-2.6-generic 3.8.0-rc4
Tags: raring running-unity
Uname: Linux 3.8.0-2-generic x86_64
UpgradeStatus: Upgraded to raring on 2012-12-06 (53 days ago)
UserGroups: adm lpadmin sambashare sudo

Chris Coulson (chrisccoulson) wrote :

If it works after extracting the Firefox patterns to /usr/share/hyphen, then it's a bug in the system patterns

affects: firefox (Ubuntu) → openoffice.org-hyphenation (Ubuntu)

lohmatii netizen, thank you for taking the time to report this bug and helping to make Ubuntu better. Please execute the following command, as it will automatically gather debugging information, in a terminal:
apport-collect 1107859
When reporting bugs in the future please use apport by using 'ubuntu-bug' and the name of the package affected. You can learn more about this functionality at https://wiki.ubuntu.com/ReportingBugs.

Changed in openoffice.org-hyphenation (Ubuntu):
status: New → Incomplete

apport information

tags: added: apport-collected raring running-unity
description: updated

apport information

Changed in openoffice.org-hyphenation (Ubuntu):
status: Incomplete → New

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in openoffice.org-hyphenation (Ubuntu):
status: New → Confirmed
Dmitri Minaev (minaev) wrote :

A bit more information on this bug.

1. A workaround is to force the execution of the following JS code on every web page:

var defaultLang = 'ru';

// Set the lang attribute on <html> if it is missing, preferring xml:lang when present.
var html = document.documentElement;
var lang = html.getAttribute('xml:lang');
if (lang) {
    html.lang = lang;
} else if (html.lang === '') {
    html.lang = defaultLang;
}
// Keep only the two-letter language code (e.g. 'ru-RU' becomes 'ru').
html.lang = html.lang.slice(0, 2);

(borrowed from http://txt.arboreus.com/2012/03/20/force-css-hyphenation-in-firefox-greasemonkey-script.html)

To do so, you can use Firefox extension 'dotjs': https://addons.mozilla.org/en-US/firefox/addon/dotjs/

2. Even then, Russian hyphenation works with errors. In many cases the hyphen is misplaced by one or even two letters from its correct location: проб-ежал instead of про-бе-жал; трен-ировок instead of тре-ни-ро-вок; лаге-рь instead of ла-герь; обслуж-ивающего instead of об-слу-жи-ва-ю-ще-го; палатк-ами instead of па-лат-ка-ми; ск-оростью instead of ско-ростью, etc.

My impression is that in Ubuntu 12.04 the errors occurred almost always. In Ubuntu 14.10 they make up about 20% of all cases.

However, these errors are only found in Ubuntu Firefox. If you download the vanilla Firefox from mozilla.org and run it using the same profile, hyphenation is correct.

Gunnar Hjalmarsson (gunnarhj) wrote :

Hyphenation for Russian has for a while been provided by the hunspell-ru source package. Does this issue appear in Ubuntu 14.04 or later?

affects: openoffice.org-hyphenation (Ubuntu) → hunspell-ru (Ubuntu)
Changed in hunspell-ru (Ubuntu):
status: Confirmed → Incomplete
Gunnar Hjalmarsson (gunnarhj) wrote :

Sorry, I meant hyphen-ru.

affects: hunspell-ru (Ubuntu) → hyphen-ru (Ubuntu)
Dmitri Minaev (minaev) wrote :

@gunnarhj, I only have 12.04 upgraded to 15.04 and the problem persists there.

Gunnar Hjalmarsson (gunnarhj) wrote :

Thanks, Dmitri.

I attached an alternative dictionary (from the libreoffice-dictionaries source package). The currently installed file is:

/usr/share/hyphen/hyph_ru_RU.dic

As a test, can you please replace it with the attached file (after having unzipped it with gunzip), and let us know if it makes a difference.

Dmitri Minaev (minaev) wrote :

No, Gunnar, your file produces the same erroneous hyphenation as the original one.

But I think I now know what is going wrong. The dictionary provided in the hyphen-ru package (and yours, too) is in the old single-byte encoding KOI8-R. Firefox does check the encoding (given in the first line of the dictionary file), but calculates the offset of the hyphen position in bytes instead of Unicode characters (or the other way round, I'm not yet sure).
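
To make the byte-versus-character distinction concrete, here is a minimal Python sketch. It is not Firefox's code and does not claim to reproduce the bug; it only shows that the same hyphenation point sits at a different byte offset in UTF-8 than in KOI8-R. The helper names `declared_encoding` and `byte_offset` are made up for this illustration. The only thing taken from the thread is that the first line of the dictionary file names the encoding.

```python
def declared_encoding(path):
    """Return the encoding named on the first line of a hyphenation dictionary."""
    with open(path, "rb") as f:
        # The header line is assumed to be plain ASCII, whatever the body uses.
        return f.readline().decode("ascii").strip()


def byte_offset(word, char_offset, encoding):
    """Byte position of a hyphenation point given as a character offset."""
    return len(word[:char_offset].encode(encoding))


word = "пробежал"  # correct hyphenation: про-бе-жал
for enc in ("koi8_r", "utf-8"):
    # Character offsets 3 and 5 mark the two correct break points.
    print(enc, [byte_offset(word, i, enc) for i in (3, 5)])
```

In KOI8-R every Cyrillic letter is one byte, so byte and character offsets coincide; in UTF-8 each of these letters takes two bytes, so the two kinds of offset diverge as soon as the text is re-encoded.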

So, if we take the stock hyph_ru_RU.dic from the Ubuntu package hyphen-ru, convert it to UTF-8 and edit the first line from 'KOI8-R' to 'UTF-8', Firefox hyphenates the text correctly.
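
The conversion described above can be sketched in Python as follows. This is an illustration only, not the patch that was eventually uploaded; the function name `convert_dic_to_utf8` is hypothetical, and the sketch assumes the first line of the input holds nothing but the encoding name.

```python
def convert_dic_to_utf8(src_path, dst_path, src_encoding="koi8_r"):
    """Re-encode a hyphenation dictionary as UTF-8 and update its header line.

    Returns the encoding name the source file declared on its first line.
    """
    with open(src_path, "rb") as f:
        raw = f.read()
    # Split off the header line, which names the encoding (e.g. b"KOI8-R").
    header, _, body = raw.partition(b"\n")
    # Decode the patterns from the old single-byte encoding.
    text = body.decode(src_encoding)
    with open(dst_path, "wb") as f:
        f.write(b"UTF-8\n")
        f.write(text.encode("utf-8"))
    return header.decode("ascii").strip()
```

Calling it on a copy of /usr/share/hyphen/hyph_ru_RU.dic and writing to a separate output path keeps the original file untouched, so the result can be inspected before anything is replaced.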

Gunnar Hjalmarsson (gunnarhj) wrote :

That's very interesting, Dmitri. Can you then install hyphen-ru from this PPA:

https://launchpad.net/~gunnarhj/+archive/ubuntu/hyphen-ru

and let us know if it works. If it works on Firefox, can you also test with e.g. LibreOffice.

Gunnar Hjalmarsson (gunnarhj) wrote :

ping

Anybody who can test if the PPA package fixes this issue?

Dmitri Minaev (minaev) wrote :

Hello. Sorry for the late reply. Your first package, hyphen-ru_20030310-1ubuntu2~ppa_all.deb, contains a dictionary that is not UTF-8 encoded. The second one, hyphen-ru_20030310-1ubuntu2~ppa2_all.deb, works as expected.

As for LibreOffice, it hyphenates Russian text correctly with both your dictionary and the one installed from Ubuntu repository (KOI8-R-encoded). So, the bug is probably in Firefox, not in the dictionary.

Gunnar Hjalmarsson (gunnarhj) wrote :

Thanks for letting us know!

Yeah, after my "ping" I realized that I had made a mistake with the first PPA package. I had let gedit guess the encoding, and gedit guessed wrong (it thought the file was ISO-8859-15). So maybe there is some ambiguity which misleads both Firefox and gedit.
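
The ambiguity is easy to reproduce. ISO-8859-15 assigns a character to every byte that KOI8-R uses for Cyrillic letters, so KOI8-R bytes decode as ISO-8859-15 without any error, just into the wrong letters. A short Python sketch follows; the function name `looks_cyrillic` is invented here, and how gedit actually guesses encodings is not covered in this thread.

```python
def looks_cyrillic(text):
    """True if at least half of the alphabetic characters are in the Cyrillic block."""
    letters = [c for c in text if c.isalpha()]
    cyrillic = [c for c in letters if "\u0400" <= c <= "\u04ff"]
    return bool(letters) and len(cyrillic) * 2 >= len(letters)


koi8_bytes = "пробежал".encode("koi8_r")

# The wrong guess succeeds silently and yields Latin letters instead of Cyrillic.
wrong = koi8_bytes.decode("iso8859_15")
right = koi8_bytes.decode("koi8_r")

print(repr(wrong), looks_cyrillic(wrong))
print(repr(right), looks_cyrillic(right))
```

A sanity check along these lines, run on the converted file before packaging, would probably have flagged a conversion made from the wrong source encoding, since the result is valid UTF-8 but contains no Cyrillic.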

Anyway, since the second PPA package works, let's ask for help to get it into the archive, whether or not it is only a workaround.

@Mattia: Do you think this should be forwarded to hyphen-ru in Debian, or should we take this opportunity to fix it in lo-dicts instead and start building hyphen-ru in lo-dicts both in Debian and Ubuntu?

Changed in hyphen-ru (Ubuntu):
assignee: nobody → Gunnar Hjalmarsson (gunnarhj)
importance: Undecided → Medium
status: Incomplete → In Progress
description: updated
Mattia Rizzolo (mapreri) on 2016-03-14
summary: - Hyphenation doesn't work in Russian text
+ Hyphenation of Russian text doesn't work in Firefox due to unrecognized
+ encoding
Mattia Rizzolo (mapreri) wrote :

I agree it's more of a Firefox issue than a hyphen-ru thing; whatever we do to hyphen-ru would be a workaround. Still, I have a feeling Firefox will take forever to be fixed…

I wouldn't like adding such a patch to lo-dicts; if such a thing were needed there I'd try to do it at build time instead, and I'd do so very sadly.
Then, taking over a package in Debian is a kind of social problem :(, it needs to be done slowly and with great care, otherwise bad things may happen. If I were you, in the meantime I'd
1) upload that hyphen-ru of yours to close this bug
2) open a bug against upstream lo-dicts to ask to convert the file to UTF-8

Gunnar Hjalmarsson (gunnarhj) wrote :

On 2016-03-14 12:23, Mattia Rizzolo wrote:
> I wouldn't like adding such a patch to lo-dicts, if such a thing
> would be needed there I'd try to do it at build time instead, and I'd
> do so very sadly.

Ack. With "fix it in lo-dicts" I actually meant upstream.

> Then, taking over a package in Debian is a kind of social problem :(, it
> needs to be done slowly and with great care, otherwise bad things may
> happen.

I happily hand over to you to handle that. ;)

> If I were you in the meantime I'd
> 1) upload that hyphen-ru of yours to close this bug

It's already in the sponsor queue.
http://reqorts.qa.ubuntu.com/reports/sponsoring

> 2) open a bug against upstream lo-dicts to ask to convert the file to
> UTF-8

If I understand it correctly, they don't want bug reports against extensions, but suggest that you contact the author instead... Possibly I'll do it, but it will have to wait.

Sebastien Bacher (seb128) wrote :

Thanks Gunnar. It seems like Firefox should deal with other encodings, but if the encoding change doesn't confuse other users of the dict (unsure which other programs use it), then I guess it's fine as a workaround. Uploading.

Changed in hyphen-ru (Ubuntu):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package hyphen-ru - 20030310-1ubuntu2

---------------
hyphen-ru (20030310-1ubuntu2) xenial; urgency=low

  * debian/patches/convert-to-utf8.diff:
    Convert pattern to UTF-8 (LP: #1107859).

 -- Gunnar Hjalmarsson <email address hidden> Sun, 13 Mar 2016 04:34:00 +0100

Changed in hyphen-ru (Ubuntu):
status: Fix Committed → Fix Released
Adolfo Jayme (fitojb) on 2016-03-15
Changed in firefox (Ubuntu):
importance: Undecided → Wishlist
Gunnar Hjalmarsson (gunnarhj) wrote :

I don't agree this is a "wishlist" thing. Firefox should be able to use KOI8-R encoded hyphenation dicts as other applications are, so this is indeed a bug.

Changed in firefox (Ubuntu):
importance: Wishlist → Low
Paul White (paulw2u) wrote :

Further to comment #20, is this now resolved in Firefox?

Changed in firefox (Ubuntu):
status: New → Incomplete