Hunspell 1.2.8 Groups Thai TIS-620 Chars in Lower/Upper Case Pairs

Bug #910452 reported by Richard Wordingham on 2011-12-31
This bug affects 2 people
Affects: hunspell (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

Ubuntu release:
Description: Ubuntu 10.04.3 LTS
Release: 10.04
Package: 1.2.8-6ubuntu1

Hunspell 1.2.8 applies the casing information for ISO-8859-1 to dictionaries encoded in TIS-620, so Thai characters are treated as lower/upper case pairs. This was fixed in Hunspell Release 1.2.14 by adding elements such as {"tis620", tis620_tbl} to the array encds[] in file csutil.cxx. A minimal change to Release 1.2.8 that corrects the problem would be to add entries such as {"TIS620", iscii_devanagari_tbl} and possibly {"TIS620-2533", iscii_devanagari_tbl}.
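To illustrate the mechanism, here is a minimal, self-contained sketch of a name-to-table lookup. It is not the Hunspell source: the names cs_table, pair_up, lookup, to_lower, encds_before and encds_after are hypothetical, the tables are built in code rather than written out as 256-entry literals, and the fall-back to the first (ISO-8859-1) entry for an unknown name is an assumption inferred from the behaviour reported here.

```cpp
#include <cstring>

// One entry per byte value: case flag plus lower/upper mappings.
struct cs_info {
  unsigned char ccase;   // 1 if the byte counts as an upper-case letter
  unsigned char clower;  // lower-case equivalent
  unsigned char cupper;  // upper-case equivalent
};

struct cs_table { cs_info e[256]; };

// A table in which every byte maps to itself (suitable for a caseless script).
cs_table make_caseless() {
  cs_table t;
  for (int i = 0; i < 256; ++i) {
    unsigned char c = static_cast<unsigned char>(i);
    t.e[i].ccase = 0;
    t.e[i].clower = c;
    t.e[i].cupper = c;
  }
  return t;
}

// Record 'upper' and 'lower' as a case pair.
void pair_up(cs_table& t, int upper, int lower) {
  t.e[upper].ccase = 1;
  t.e[upper].clower = static_cast<unsigned char>(lower);
  t.e[lower].cupper = static_cast<unsigned char>(upper);
}

// ISO-8859-1: A-Z and 0xC0-0xDE (except 0xD7) pair with the byte 0x20 higher.
cs_table make_latin1() {
  cs_table t = make_caseless();
  for (int c = 'A'; c <= 'Z'; ++c) pair_up(t, c, c + 0x20);
  for (int c = 0xC0; c <= 0xDE; ++c)
    if (c != 0xD7) pair_up(t, c, c + 0x20);
  return t;
}

static const cs_table latin1_tbl = make_latin1();
static const cs_table caseless_tbl = make_caseless();

struct enc_entry {
  const char* enc_name;
  const cs_table* table;
};

// Without a Thai entry, and with an entry of the kind the report proposes.
static const enc_entry encds_before[] = {{"ISO8859-1", &latin1_tbl}};
static const enc_entry encds_after[] = {{"ISO8859-1", &latin1_tbl},
                                        {"TIS620", &caseless_tbl}};

// Assumed behaviour: an unknown name silently falls back to the first entry.
const cs_table* lookup(const enc_entry* encds, int n, const char* name) {
  const cs_table* found = encds[0].table;
  for (int i = 0; i < n; ++i) {
    if (std::strcmp(name, encds[i].enc_name) == 0) {
      found = encds[i].table;
      break;
    }
  }
  return found;
}

unsigned char to_lower(const cs_table* t, unsigned char c) {
  return t->e[c].clower;
}
```

In TIS-620, byte 0xCA is ส and byte 0xEA is the tone mark ๊. With encds_before, a lookup of "TIS620" falls back to the Latin-1 table and to_lower maps 0xCA to 0xEA; with encds_after, the byte is left unchanged.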

I forgot to report the effects of the bug.

The effect of the bug can be demonstrated using spelling dictionary th_TH.dic from myspell-th Version 1:3.2.0-3ubuntu3.1 but with th_TH.aff modified by correcting 'SET TIS620-2533' to 'SET TIS620' (http://bugs.launchpad.net/ubuntu/+source/openoffice.org-dictionaries/+bug/910447 refers). Without this change, the corrections of สะกัด to สกัด and หณา to หมา are not offered. Running with the locale set by LANG=en_GB.utf8, the suggestion lines for the input are

& สะกัด 4 0: สะกิด, สะกดทัพ, สะกด, สะบัด
& อไร 4 16: อุไร, อะไร, ขอบไร, ฤร้
& หณา 4 26: อาณา, อุณา, สกุณา, ยฆษณา

when the program is run using (echo '\!'; echo '-'; echo สะกัด อไร หณา)| hunspell -d th_TH

When encds[] is corrected as suggested above, the suggestion lines become

& สะกัด 6 0: สะกด, สกัด, สะกิด, สะบัด, สังกัด, สะดวก
& อไร 4 16: อุไร, อะไร, อมร, อรไท
& หณา 5 26: หา, ห่า, หรา, หมา, หนา

Note that the corrections of สะกัด to สกัด and หณา to หมา are then offered. Additionally, the non-existent words ฤร้ and ยฆษณา are no longer offered.

FWIW, a suitably modified csutil.cxx can also be found in http://homepage.ntlworld.com/richard.wordingham/thai/hunspell-1.2.8-jrw1.1.zip , along with corrections for the other issues I have had in getting Hunspell to spell check word-separated Thai.

Formally, this is fixed for Lucid Lynx by upgrading to 1.3.2-2~lucid1. (The Lucid upgrade appears to have arrived today, apparently in connection with the change of OpenOffice from 3.3.2 to 3.4.5.) Unfortunately, the 'n-gram' selection criteria of Hunspell 1.3.2 reduce the suggestions to:

& สะกัด 2 0: สะกด, สกัด
& อไร 1 16: อุไร
& หณา 2 26: หา, อาณา

(No change to th_TH.aff is needed for it to be used by Version 1.3.2-2~lucid1.)

Implementing my suggestions for th_TH.aff (https://bugs.launchpad.net/ubuntu/+source/openoffice.org-dictionaries/+bug/910447) yields the suggestion list:

& สะกัด 4 0: สะกด, สกัด, สะกิด, สะบัด
& อไร 3 16: อะไร, อุไร, อมร
& หณา 6 26: หา, ห่า, หนา, หมา, หรา, อาณา

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in hunspell (Ubuntu):
status: New → Confirmed