Problems with hyphen in spell checking

Bug #1656319 reported by leastcommonancestor on 2017-01-13
This bug affects 1 person
Affects: calibre
Importance: Undecided
Assigned to: Unassigned

Bug Description

Some types of hyphen are treated differently in spell checking:

hyphen-minus (U+002D) works as expected (e.g. "one-half" is ok).
hyphen (U+2010) is shown as spelling error.
non-breaking-hyphen (U+2011) again is ok.
soft-hyphen (U+00AD) entity input is automatically converted, i.e. it is no longer visible; e.g. "ar&shy;tistic" becomes "artistic", which appears correct but is shown as a spelling error.

The combination hyphen-minus + soft-hyphen (does not make much sense, but appeared in text pasted from a PDF) is also shown as an error. However, on right-click, the context menu items for spell checking are not shown.

hyphen-minus, hyphen and non-breaking hyphen should be treated consistently. The inconsistent handling of hyphen is a problem, since for good typography hyphen should be used instead of hyphen-minus.
soft-hyphen should be ignored by the spell-checker.
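
The requested behaviour could be sketched roughly as follows. This is only an illustration of the request, not calibre's actual code or fix: the function name normalize_for_spellcheck and the idea of mapping every hyphen variant to hyphen-minus before the dictionary lookup are assumptions.

```python
# Hypothetical pre-lookup normalization; names and approach are assumptions,
# not taken from calibre.
HYPHEN_MINUS = "\u002d"

# str.translate accepts a dict keyed by code point; a value of None deletes
# the character.
_HYPHEN_MAP = {
    0x2010: HYPHEN_MINUS,  # HYPHEN -> hyphen-minus
    0x2011: HYPHEN_MINUS,  # NON-BREAKING HYPHEN -> hyphen-minus
    0x00AD: None,          # SOFT HYPHEN is dropped entirely
}


def normalize_for_spellcheck(word: str) -> str:
    """Return the word as it would be handed to the dictionary lookup."""
    return word.translate(_HYPHEN_MAP)


if __name__ == "__main__":
    for sample in ("one-half", "one\u2010half", "one\u2011half", "ar\u00adtistic"):
        print(repr(sample), "->", repr(normalize_for_spellcheck(sample)))
```

With a mapping like this, all three visible hyphens would reach the dictionary in the same form, and the hyphen-minus + soft-hyphen combination pasted from a PDF would collapse to a plain hyphen-minus.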

A way to make soft-hyphens visible would be useful. If there is one, I did not find it.

Environment: Calibre 2.77 with Ubuntu 12.04 on 64-bit system.

The treatment of hyphens comes from the ICU library. If you disagree
with the rules ICU uses you should ask them to change it. As for making
soft-hyphens visible, simply use a search and replace to replace them
with some visible character, perform whatever operations you want and
then search and replace the character back with a soft-hyphen. Trying to
make the text editing widget display invisible characters is waaay too
much work, as it requires changes to Qt code.
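
The same round trip can be sketched outside the editor, for example on the text of an extracted HTML file. The marker character (U+00A6, broken bar) and both function names are assumptions chosen for illustration; any character that does not otherwise occur in the text would do.

```python
SOFT_HYPHEN = "\u00ad"
MARKER = "\u00a6"  # broken bar; an arbitrary, assumed placeholder


def show_soft_hyphens(text: str, marker: str = MARKER) -> str:
    """Replace every soft hyphen with a visible marker."""
    if marker in text:
        # The round trip is only lossless if the marker is unused.
        raise ValueError("marker already occurs in the text; pick another")
    return text.replace(SOFT_HYPHEN, marker)


def hide_soft_hyphens(text: str, marker: str = MARKER) -> str:
    """Turn the visible markers back into soft hyphens."""
    return text.replace(marker, SOFT_HYPHEN)


if __name__ == "__main__":
    original = "ar\u00adtistic li\u00adcence"
    visible = show_soft_hyphens(original)
    print(visible)
    print(hide_soft_hyphens(visible) == original)
```

Checking that the marker is unused before substituting is what keeps the second replace from converting characters that were never soft hyphens.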

 status wontfix

Changed in calibre:
status: New → Won't Fix
Kovid Goyal (kovid) wrote :

Actually, on second thoughts, I can work around the ICU behavior fairly easily.

Changed in calibre:
status: Won't Fix → New

Fixed in branch master. The fix will be in the next release. calibre is usually released every Friday.

 status fixreleased

Changed in calibre:
status: New → Fix Released

Hello Mr. Goyal!

Thank you for your quick response.
However, the problem is not clear to me.
I take it that Calibre is using the ICU C-Library to normalize words
prior to lookup.
But the normalization chart for Punctuation-Dash
<http://www.unicode.org/charts/normalization/> suggests that U+2011
under compatibility normalization would be changed to U+2010, so in both
cases the spell checker lookup should fail.
I'm quite willing to file a bug report at
http://bugs.icu-project.org/trac/, but for now, I do not think I have
enough information to specify what happens.
Of course, if the library changes U+2011 to U+002D (hyphen-minus) and
leaves U+2010 unchanged, this would be a bug.
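
The compatibility mappings mentioned above can be checked with a short script. It uses Python's unicodedata module, which is based on the Unicode Character Database and not on ICU, so it only shows what NFKC normalization does to these characters; whether calibre or ICU apply normalization at all before the lookup is an assumption of this report. The helper name nfkc_report is made up for the example.

```python
import unicodedata

CANDIDATES = {
    "hyphen-minus": "\u002d",
    "hyphen": "\u2010",
    "non-breaking hyphen": "\u2011",
    "soft hyphen": "\u00ad",
}


def nfkc_report(chars: dict) -> dict:
    """Map each label to (input code points, NFKC output code points)."""
    report = {}
    for label, ch in chars.items():
        out = unicodedata.normalize("NFKC", ch)
        report[label] = (
            [f"U+{ord(c):04X}" for c in ch],
            [f"U+{ord(c):04X}" for c in out],
        )
    return report


if __name__ == "__main__":
    for label, (before, after) in nfkc_report(CANDIDATES).items():
        print(f"{label}: {before} -> {after}")
```

Under NFKC, U+2011 maps to U+2010, while U+2010, U+002D and U+00AD are left unchanged, so normalization by itself would not turn either hyphen into hyphen-minus.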

Greetings + thanks again for your – minor problems aside – superb software

LCA

