en_US dictionary misses n't contractions

Bug #1807103 reported by Luis Marsano
This bug affects 1 person.

| Affects | Status | Importance | Assigned to | Milestone |
| --- | --- | --- | --- | --- |
| scowl (Ubuntu) | New | Undecided | Unassigned | |

Bug Description

Although some contractions (those ending in n't) are present in the `.dic` file, `hunspell` splits them at ' (ASCII apostrophe) or ’ (Unicode apostrophe) and rejects the resulting fragments, such as `aren` and `couldn`, as misspellings.
```ShellSession
luism@lmm-notebook:~$ lsb_release -rd
Description: Ubuntu 18.10
Release: 18.10
luism@lmm-notebook:~$ apt list --installed hunspell hunspell-en-us
Listing... Done
hunspell-en-us/cosmic,now 1:2018.04.16-1 all [installed]
hunspell/cosmic,now 1.6.2-1build1 amd64 [installed]
luism@lmm-notebook:~$ hunspell -D
SEARCH PATH:
.::/usr/share/hunspell:/usr/share/myspell:/usr/share/myspell/dicts:/Library/Spelling:/home/luism/.openoffice.org/3/user/wordbook:/home/luism/.openoffice.org2/user/wordbook:/home/luism/.openoffice.org2.0/user/wordbook:/home/luism/Library/Spelling:/opt/openoffice.org/basis3.0/share/dict/ooo:/usr/lib/openoffice.org/basis3.0/share/dict/ooo:/opt/openoffice.org2.4/share/dict/ooo:/usr/lib/openoffice.org2.4/share/dict/ooo:/opt/openoffice.org2.3/share/dict/ooo:/usr/lib/openoffice.org2.3/share/dict/ooo:/opt/openoffice.org2.2/share/dict/ooo:/usr/lib/openoffice.org2.2/share/dict/ooo:/opt/openoffice.org2.1/share/dict/ooo:/usr/lib/openoffice.org2.1/share/dict/ooo:/opt/openoffice.org2.0/share/dict/ooo:/usr/lib/openoffice.org2.0/share/dict/ooo
AVAILABLE DICTIONARIES (path is not mandatory for -d option):
/usr/share/hunspell/en_US
LOADED DICTIONARY:
/usr/share/hunspell/en_US.aff
/usr/share/hunspell/en_US.dic
Hunspell 1.6.2
luism@lmm-notebook:~$ for i in are could did is must should was were would
> do sed -ne /${i}n\'t/'{p;q}' /usr/share/hunspell/en_US.dic
> done
aren't
couldn't
didn't
isn't
mustn't
shouldn't
wasn't
weren't
wouldn't
luism@lmm-notebook:~$ for i in are could did is must should was were would
> do hunspell <<EOF
> ${i}n't
> EOF
> done
Hunspell 1.6.2
& aren 12 0: earn, are, arena, Daren, Yaren, Karen, ares, area, amen, wren, Wren, are n
*

Hunspell 1.6.2
& couldn 2 0: could, could n
*

Hunspell 1.6.2
& didn 4 0: did, din, dido, did n
*

Hunspell 1.6.2
& isn 9 0: sin, ins, ism, is, in, inn, ion, isl, is n
*

Hunspell 1.6.2
& mustn 6 0: must, musts, musty, mus tn, mus-tn, must n
*

Hunspell 1.6.2
& shouldn 2 0: should, should n
*

Hunspell 1.6.2
& wasn 10 0: awns, was, wan, swan, wain, warn, wast, wasp, wash, was n
*

Hunspell 1.6.2
& weren 5 0: were, ween, wren, were n, wen
*

Hunspell 1.6.2
& wouldn 3 0: would, woulds, would n
*

luism@lmm-notebook:~$ for i in are could did is must should was were would
> do hunspell <<EOF
> ${i}n’t
> EOF
> done
Hunspell 1.6.2
& aren 12 0: earn, are, arena, Daren, Yaren, Karen, ares, area, amen, wren, Wren, are n
*

Hunspell 1.6.2
& couldn 2 0: could, could n
*

Hunspell 1.6.2
& didn 4 0: did, din, dido, did n
*

Hunspell 1.6.2
& isn 9 0: sin, ins, ism, is, in, inn, ion, isl, is n
*

Hunspell 1.6.2
& mustn 6 0: must, musts, musty, mus tn, mus-tn, must n
*

Hunspell 1.6.2
& shouldn 2 0: should, should n
*

Hunspell 1.6.2
& wasn 10 0: awns, was, wan, swan, wain, warn, wast, wasp, wash, was n
*

Hunspell 1.6.2
& weren 5 0: were, ween, wren, were n, wen
*

Hunspell 1.6.2
& wouldn 3 0: would, woulds, would n
*

```

According to the [`hunspell` changelog](https://github.com/hunspell/hunspell/blob/master/ChangeLog), with appropriate dictionaries, hunspell should accept ' inside words.
> 2014-05-28 Németh László <nemeth at numbertext dot org>:
>
> * better apostrophe usage:
> - WORDCHARS only with one of the Unicode or ASCII apostrophe
> results extended word tokenization: both of them will be part of
> the words (if they are inside: eg. word's, but not words').
> - convert Unicode apostrophes to ASCII ones for 8-bit dictionaries
> (eg. English dictionaries), or for UTF-8 dictionaries only
> with ASCII apostrophe supports (eg. French dictionaries).
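
The changelog entry ties the extended tokenization to the affix file's `WORDCHARS` setting, so a quick first check is whether a given `.aff` file lists an apostrophe there. The following is a minimal sketch of such a check; the function name `aff_has_apostrophe` is made up for illustration, and the sketch assumes the affix file is ASCII or UTF-8 encoded (a Unicode apostrophe stored in another encoding would not be matched).

```shell
# Hypothetical helper (not part of hunspell): succeeds if the affix file
# passed as $1 has a WORDCHARS line that contains an ASCII (') or
# Unicode (’) apostrophe.
aff_has_apostrophe() {
    grep -q -E "^WORDCHARS[[:space:]].*('|’)" "$1"
}

# Example call, using the path reported by `hunspell -D` in this report:
#   aff_has_apostrophe /usr/share/hunspell/en_US.aff && echo "apostrophe listed"
```

If the function fails for `en_US.aff`, that would be consistent with the tokenization seen in the session above, though it does not by itself prove that `WORDCHARS` is the only cause.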

Therefore, I raise the issue here, since the dictionary's affix rules don't appear to support this hunspell feature.
The en_US dictionary (and others) should allow hunspell to process words containing ' without breaking them.
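
One way to test this hypothesis without touching the system files is to spell-check against a private copy of the dictionary whose affix file lists the ASCII apostrophe in `WORDCHARS`. The sketch below is an untested experiment, not a confirmed fix: the helper name `with_apostrophe_wordchars` is hypothetical, and whether the change makes `hunspell` accept the contractions is exactly what the experiment is meant to find out.

```shell
# Hypothetical helper: write to $2 a copy of the affix file $1 whose
# WORDCHARS setting includes the ASCII apostrophe.
with_apostrophe_wordchars() {
    src=$1
    dst=$2
    if grep -q '^WORDCHARS' "$src"; then
        # Append ' to the end of the existing WORDCHARS line.
        sed "/^WORDCHARS/ s/\$/'/" "$src" > "$dst"
    else
        # No WORDCHARS line yet: add one that lists only the apostrophe.
        { cat "$src"; printf "\nWORDCHARS '\n"; } > "$dst"
    fi
}

# Possible use, with the paths reported by `hunspell -D` in this report:
#   tmp=$(mktemp -d)
#   with_apostrophe_wordchars /usr/share/hunspell/en_US.aff "$tmp/en_US.aff"
#   cp /usr/share/hunspell/en_US.dic "$tmp/"
#   echo "aren't" | hunspell -d "$tmp/en_US"
```

Only the ASCII apostrophe is added here, on the assumption that the conversion of Unicode apostrophes described in the changelog applies to this dictionary.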
