th_TH Affix File Inadequate for Hunspell
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
openoffice.org-dictionaries (Ubuntu) | Won't Fix | Undecided | Unassigned |
Bug Description
The problems reported here all relate to the file th_TH.aff, installed from version 1:3.2.0-3ubuntu3.1 of package myspell-th on the Ubuntu system described as:
Description: Ubuntu 10.04.3 LTS
Release: 10.04
1. The line 'SET TIS620-2533' needs to read 'SET TIS620-2533' for Hunspell to work with iconv. At present Hunspell issues the error message 'error - iconv_open: UTF-8 -> TIS620-2533' (or 'error - iconv_open: tis620 -> TIS620-2533' if the input encoding is specified as -i tis620). When the correction is made, the command
$ (echo '\!'; echo '-'; echo สะกัด อไร หณา) | /usr/bin/hunspell -d ~/spell/th_TH | more
generates sensible output:
Hunspell 1.2.8
& สะกัด 4 0: สะกิด, สะกดทัพ, สะกด, สะบัด
& อไร 4 6: อุไร, อะไร, ขอบไร, ฤร้
& หณา 4 10: อาณา, อุณา, สกุณา, ยฆษณา
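The lines in these runs follow the ispell-compatible pipe format: '&' marks a misspelled word, followed by the number of suggestions, the word's offset in the input line and then the suggestions; '#' marks a misspelled word for which no suggestions were found. A minimal Python sketch for reading such output, assuming only these two line types matter (the banner and anything else are skipped); the names Misspelling and parse_pipe_output are hypothetical:

```python
import re
from dataclasses import dataclass, field

# ispell-compatible pipe output lines, as printed by Hunspell in the runs above:
#   & <word> <count> <offset>: <sugg1>, <sugg2>, ...   misspelled, with suggestions
#   # <word> <offset>                                  misspelled, no suggestions
MISS_RE = re.compile(r"^& (\S+) (\d+) (\d+): (.*)$")
NONE_RE = re.compile(r"^# (\S+) (\d+)$")


@dataclass
class Misspelling:
    word: str
    offset: int
    suggestions: list[str] = field(default_factory=list)


def parse_pipe_output(lines):
    """Collect misspellings; the banner and all other lines are ignored."""
    result = []
    for line in lines:
        line = line.rstrip("\n")
        if m := MISS_RE.match(line):
            word, _count, offset, rest = m.groups()
            result.append(Misspelling(word, int(offset), rest.split(", ")))
        elif m := NONE_RE.match(line):
            result.append(Misspelling(m.group(1), int(m.group(2))))
    return result


# Sample lines taken from the two runs in this report.
sample = [
    "Hunspell 1.2.8",
    "& สะกัด 4 0: สะกิด, สะกดทัพ, สะกด, สะบัด",
    "# อไร 16",
]
for m in parse_pipe_output(sample):
    print(m.word, m.offset, m.suggestions)
```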
Without the change, it generates the following output (error messages removed):
Hunspell 1.2.8
# สะกัด 0
# อไร 16
# หณา 26
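The offsets also differ between the two runs. In the first, 0, 6 and 10 are character positions in the line สะกัด อไร หณา; in the second, 0, 16 and 26 match UTF-8 byte positions, because each Thai character takes three bytes in UTF-8. That is consistent with Hunspell working on the raw bytes when the iconv conversion cannot be opened. The sketch below reproduces both sets of numbers and, using Python's codec registry as a stand-in for iconv (an assumption: the two do not accept identical lists of names), checks whether a charset name is recognised. It does not establish which name the SET line should use with Hunspell.

```python
import codecs

LINE = "สะกัด อไร หณา"  # the test line fed to hunspell above


def word_offsets(line, unit="chars"):
    """Start offset of each space-separated word, in characters or UTF-8 bytes."""
    offsets, pos = [], 0
    for word in line.split(" "):
        offsets.append(pos)
        size = len(word) if unit == "chars" else len(word.encode("utf-8"))
        pos += size + 1  # +1 for the separating space (one char, one byte)
    return offsets


def codec_known(name):
    """True if Python's codec registry (stand-in for iconv) knows the name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


print(word_offsets(LINE, "chars"))  # [0, 6, 10]
print(word_offsets(LINE, "bytes"))  # [0, 16, 26]
print(codec_known("TIS-620"), codec_known("TIS620-2533"))  # True False
```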
2. The affix file lacks a TRY line listing characters that might have been omitted or completely mistyped; all the suggestions above are generated by the n-gram algorithm. Version 1.3.2 of Hunspell tightens the acceptance criteria for n-gram suggestions, so mistyping อไร for อะไร will not be corrected. A suitable addition would be:
# The try list excludes the 5 letters and marks ฅ๎ฃฺฦ, which are not used in
# the orthography of Central Thai.
# The consonants are ordered lower case native, upper case native,
# common Indic, other extras
TRY ก่ข้ค๊ง๋
(Note that the file th_TH.aff is in the TIS-620 encoding.)
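The effect of a TRY line can be illustrated with a simplified sketch in which each TRY character is tried as a forgotten (inserted) or mistyped (substituted) character, and only candidates found in the word list are kept. This is not Hunspell's code; the word list WORDS and the short TRY string are hypothetical stand-ins for th_TH.dic and the full TRY line. Because th_TH.aff is TIS-620-encoded, the sketch also checks that a candidate TRY string can be written in that encoding.

```python
# Hypothetical stand-ins: a tiny word list and a short TRY string.
# The real th_TH.dic and the full proposed TRY line are not reproduced here.
WORDS = {"อะไร", "อุไร", "อาณา"}
TRY = "ะาุ"  # hypothetical subset of Thai vowel signs


def try_edits(word, try_chars):
    """All strings one TRY-character insertion or substitution away from word."""
    out = set()
    for i in range(len(word) + 1):
        for ch in try_chars:
            out.add(word[:i] + ch + word[i:])          # forgotten character
            if i < len(word):
                out.add(word[:i] + ch + word[i + 1:])  # mistyped character
    return out


def suggest(word, words=WORDS, try_chars=TRY):
    """Keep only the candidates that are real words, in sorted order."""
    return sorted(try_edits(word, try_chars) & words)


def tis620_ok(text):
    """True if every character can be written to a TIS-620-encoded file."""
    try:
        text.encode("tis_620")
        return True
    except UnicodeEncodeError:
        return False


print(suggest("อไร"))  # ['อะไร', 'อุไร']
print(tis620_ok("ก่ข้ค๊ง๋"))  # True
```

With ะ among the TRY characters, inserting it after อ turns อไร into อะไร, the correction the n-gram step in 1.3.2 does not offer.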
3. There are further faults, all internal to Hunspell; some of them have already been fixed in Hunspell version 1.3.2. Unfixed faults are recorded against Hunspell at:
http://
https:/
Code to correct them (which may need to be applied more widely) is recorded in
https:/
ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: myspell-th 1:3.2.0-3ubuntu3.1 [modified: usr/share/
ProcVersionSign
Uname: Linux 2.6.32-37-generic i686
Architecture: i386
Date: Sat Dec 31 17:58:55 2011
InstallationMedia: Ubuntu 10.04.1 LTS "Lucid Lynx" - Release i386 (20100816.1)
PackageArchitec
ProcEnviron:
LANGUAGE=en_GB:en
PATH=(custom, user)
LANG=en_GB.utf8
SHELL=/bin/bash
SourcePackage: openoffice.
Status changed to 'Confirmed' because the bug affects multiple users.