th_TH Affix File Inadequate for Hunspell

Bug #910447 reported by Richard Wordingham
This bug affects 2 people
Affects: openoffice.org-dictionaries (Ubuntu)
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

The problems reported here all relate to the file th_TH.aff, installed from version 1:3.2.0-3ubuntu3.1 of the package myspell-th onto the Ubuntu system described as:

Description: Ubuntu 10.04.3 LTS
Release: 10.04

1. The line 'SET TIS620-2533' needs to be changed for Hunspell to work with iconv, which does not accept that encoding name. At present Hunspell issues the error message 'error - iconv_open: UTF-8 -> TIS620-2533' (or 'error - iconv_open: tis620 -> TIS620-2533' if the interface encoding is specified with -i tis620). When the correction is made, the command

$ (echo '\!'; echo '-'; echo สะกัด อไร หณา) | /usr/bin/hunspell -d ~/spell/th_TH | more

reasonably generates the output

Hunspell 1.2.8

& สะกัด 4 0: สะกิด, สะกดทัพ, สะกด, สะบัด
& อไร 4 6: อุไร, อะไร, ขอบไร, ฤร้
& หณา 4 10: อาณา, อุณา, สกุณา, ยฆษณา

Without the change, it generates the following output (shown with the error messages removed):

Hunspell 1.2.8

# สะกัด 0
# อไร 16
# หณา 26
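
Whether a given name on the SET line will be usable can be checked against the local iconv before editing the affix file. The following is a minimal sketch and not part of the original report: the function name iconv_accepts is made up for illustration, and it assumes an iconv command-line tool that exits with a non-zero status when it cannot open a conversion (as the glibc one does).

```shell
#!/bin/sh
# iconv_accepts NAME
# Hypothetical helper: succeeds if the local iconv can open a conversion
# from UTF-8 to the encoding NAME, mirroring the "UTF-8 -> <SET value>"
# conversion named in Hunspell's iconv_open error message.
iconv_accepts() {
    # Empty input is enough: an unknown name already fails when the
    # conversion is opened, before any text is read.
    printf '' | iconv -f UTF-8 -t "$1" >/dev/null 2>&1
}

# Report on the name currently used in th_TH.aff.
if iconv_accepts 'TIS620-2533'; then
    echo 'TIS620-2533: accepted by iconv'
else
    echo 'TIS620-2533: rejected by iconv'
fi
```

Candidate replacement names can be looked up by running iconv -l and searching its output for the Thai entries.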

2. The affix file lacks a TRY line listing characters that might have been omitted or utterly mistyped; all the suggestions above are generated by the n-gram algorithm. Version 1.3.2 of Hunspell tightens the acceptance criteria for n-gram suggestions, so mistyping อไร for อะไร will no longer be corrected. A suitable addition would be:

# The try list excludes the 5 letters and marks ฅ๎ฃฺฦ, which are not used in
# the orthography of Central Thai.
# The consonants are ordered lower case native, upper case native,
# common Indic, other extras
TRY ก่ข้ค๊ง๋จ็ช์ซะดัตาถิทีนึบืปุผูฝเพแฟโมใยไรลวสหอฉญฮธณภฆฌฎฏฐฑฒฤๅษศฬ

(Note that the file th_TH.aff is in the TIS-620 encoding.)
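
Because the affix file is in TIS-620, a TRY line prepared in a UTF-8 editor has to be converted before it is appended. The following is a rough sketch of one way to do that, not the procedure used in this report: the function name append_try and both file arguments are hypothetical, and it assumes the local iconv accepts the name TIS-620.

```shell
#!/bin/sh
# append_try AFF_FILE TRY_UTF8_FILE
# Hypothetical helper: appends the contents of TRY_UTF8_FILE (the TRY line
# and its comments, saved as UTF-8) to AFF_FILE after converting them to
# TIS-620. Refuses to act if AFF_FILE already contains a TRY line.
append_try() {
    aff=$1
    try_utf8=$2

    # The keyword itself is plain ASCII, so a byte-wise search in the
    # C locale is safe whatever the encoding of the rest of the file.
    if LC_ALL=C grep -q '^TRY ' "$aff"; then
        echo "$aff already has a TRY line" >&2
        return 1
    fi

    # Convert to the affix file's encoding and append.
    iconv -f UTF-8 -t TIS-620 "$try_utf8" >> "$aff"
}
```

It might be called as append_try ~/spell/th_TH.aff try_utf8.txt, where try_utf8.txt is an assumed file holding the lines suggested above. Working on a copy of the affix file is advisable, since the function modifies it in place.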

3. There are further faults, some of which have already been fixed in Hunspell Version 1.3.2, but they are all internal to Hunspell. Unfixed faults are recorded against Hunspell at:

http://sourceforge.net/tracker/?func=detail&aid=3178449&group_id=143754&atid=756395
https://sourceforge.net/tracker/?func=detail&aid=3468022&group_id=143754&atid=756395

Code to correct them (which may need to be applied more widely) is recorded at:
https://sourceforge.net/tracker/?func=detail&aid=3468039&group_id=143754&atid=756395

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: myspell-th 1:3.2.0-3ubuntu3.1 [modified: usr/share/hunspell/th_TH.aff]
ProcVersionSignature: Ubuntu 2.6.32-37.81-generic 2.6.32.49+drm33.21
Uname: Linux 2.6.32-37-generic i686
Architecture: i386
Date: Sat Dec 31 17:58:55 2011
InstallationMedia: Ubuntu 10.04.1 LTS "Lucid Lynx" - Release i386 (20100816.1)
PackageArchitecture: all
ProcEnviron:
 LANGUAGE=en_GB:en
 PATH=(custom, user)
 LANG=en_GB.utf8
 SHELL=/bin/bash
SourcePackage: openoffice.org-dictionaries

Richard Wordingham (richard-wordingham) wrote :
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in openoffice.org-dictionaries (Ubuntu):
status: New → Confirmed
Björn Michaelsen (bjoern-michaelsen) wrote :

wontfix in openoffice. If this is still an issue with the new package, please mark this as a bug in the libreoffice-dictionaries source package after confirming.

Changed in openoffice.org-dictionaries (Ubuntu):
status: Confirmed → Won't Fix