Wrong ASCII conversion when writing bibtex file

Bug #1832726 reported by Benjamin Hennion on 2019-06-13
Affects: calibre
Importance: Undecided
Assigned to: Unassigned

Bug Description

Bug observed with calibre 3.44 running on Arch Linux. I see the same behaviour on Ubuntu LTS.

The letter ñ (LATIN SMALL LETTER N WITH TILDE, U+00F1) is converted into the string "{\\~n}" (two literal backslashes; nothing is being escaped here) by the utf8ToBibtex function in calibre/utils/bibtex.py. It should have produced the string "{\~n}".

A similar behaviour occurs with other letters carrying a tilde (a, o, i and so on), but other diacritics (acute, diaeresis, grave accent) seem to work just fine.

Note that the character ~ TILDE U+007E is converted to "\~". I don't know where and when this happens though.
I guess ñ is first converted to {\~n} by bibtex.py, and then the ~ is converted to \~, giving "{\\~n}".
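
The suspected sequence can be reproduced in isolation with a minimal sketch. The two functions and the two-entry table below are simplified, hypothetical stand-ins written for illustration; they are not calibre's actual code, and they only model the tilde handling described above.

```python
# Hypothetical stand-in for the tilde entries of the conversion table.
TILDE_TABLE = {
    "\u00f1": r"{\~n}",  # ñ
    "\u00e3": r"{\~a}",  # ã
}


def resolve_unicode_sketch(text):
    """Replace accented letters with their BibTeX macros (stand-in only)."""
    for char, macro in TILDE_TABLE.items():
        text = text.replace(char, macro)
    return text


def escape_tilde_sketch(text):
    """Mimic the reported behaviour: every "~" becomes "\\~"."""
    return text.replace("~", r"\~")


# Converting first and escaping second hits the "~" that the first
# step has just produced, so the macro gains a second backslash.
broken = escape_tilde_sketch(resolve_unicode_sketch("Espa\u00f1a"))
print(broken)  # Espa{\\~n}a
```

The second backslash appears only because the escaping step cannot tell a tilde that was in the original text from one that the conversion step introduced.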

Simply removing the two \\ from every line involving a tilde in the list in bibtex.py seems to work, but it is not fully satisfactory. We'd need to find a way that does not convert ~ to \~, for instance if one of the fields contains a URL with a ~ in it.

Actually, the second \ is added by escapeSpecialCharacters. I see two problems here:
1- the latex (hence bibtex) escape sequence for ~ is not "\~" but "\char`\~". "\~" adds a tilde on top of the next character.
2- it is too late to do the escaping at that point, as letters like ñ have already been transformed into "\~n".

The solution is then to
1- modify escapeSpecialCharacters so that it changes ~ into \char`\~
2- call escapeSpecialCharacters before calling resolveUnicode
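
The two steps above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the two-entry table and the example URL are hypothetical, only the tilde is handled, and the code is not the attached patch nor calibre's real escapeSpecialCharacters or resolveUnicode.

```python
# Hypothetical stand-in for the tilde entries of the conversion table.
TILDE_TABLE = {
    "\u00f1": r"{\~n}",  # ñ
    "\u00e3": r"{\~a}",  # ã
}


def escape_special_sketch(text):
    """Step 1 of the proposal: a literal "~" becomes \\char`\\~."""
    return text.replace("~", r"\char`\~")


def resolve_unicode_sketch(text):
    """Replace accented letters with their BibTeX macros (stand-in only)."""
    for char, macro in TILDE_TABLE.items():
        text = text.replace(char, macro)
    return text


def to_bibtex_sketch(text):
    """Step 2 of the proposal: escape first, then resolve Unicode."""
    return resolve_unicode_sketch(escape_special_sketch(text))


fixed_name = to_bibtex_sketch("Espa\u00f1a")
fixed_url = to_bibtex_sketch("http://example.org/~user")
print(fixed_name)  # Espa{\~n}a
print(fixed_url)   # http://example.org/\char`\~user
```

With this ordering, the escaping step only ever sees tildes that were in the original field, so the macros produced afterwards are left untouched.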

And here is a patch

Fixed in branch master. The fix will be in the next release. calibre is usually released every alternate Friday.

 status fixreleased

Changed in calibre:
status: New → Fix Released