calibre

Wrong ASCII conversion when writing bibtex file

Bug #1832726 reported by Benjamin Hennion on 2019-06-13

This bug affects 1 person

Affects   Status         Importance   Assigned to   Milestone
calibre   Fix Released   Undecided    Unassigned

Bug Description

Bug observed with Calibre 3.44 running on Arch Linux. I see the same behaviour on Ubuntu LTS.

The letter ñ (LATIN SMALL LETTER N WITH TILDE, U+00F1) is converted into the string "{\\~n}" by calibre/utils/bibtex.py's utf8ToBibtex function. That is two literal backslashes in the output, not an escaped representation of one. The expected output is the string "{\~n}".

The same thing happens with other letters carrying a tilde (a, o, i and so on), but other diacritics (acute accent, diaeresis, grave accent) seem to work just fine.

Note that the character ~ TILDE U+007E is converted to "\~". I don't know where and when this happens though.
I guess ñ is first converted to {\~n} by bibtex.py, and then the ~ is converted to \~, giving "{\\~n}".
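
The suspected two-step mechanism can be reproduced in isolation. The snippet below is only a sketch: the mapping table and both function names are toy stand-ins invented for illustration, not calibre's actual code.

```python
# Toy stand-ins for illustration only; not calibre's real table or functions.
UNICODE_TO_BIBTEX = {
    "\u00f1": r"{\~n}",   # n with tilde
    "\u00e9": r"{\'e}",   # e with acute accent
}

def resolve_unicode(text):
    """Replace each mapped non-ASCII letter with its BibTeX sequence."""
    for char, tex in UNICODE_TO_BIBTEX.items():
        text = text.replace(char, tex)
    return text

def escape_tilde(text):
    """Escape every tilde, including those just produced by resolve_unicode."""
    return text.replace("~", r"\~")

# Resolving first and escaping second doubles the backslash.
converted = escape_tilde(resolve_unicode("Espa\u00f1a"))
print(converted)  # Espa{\\~n}a

# The acute accent sequence contains no tilde, so it survives untouched.
print(escape_tilde(resolve_unicode("caf\u00e9")))  # caf{\'e}
```

This would also explain why only tilde-bearing letters are affected: the other diacritic sequences contain no character that the later escaping step rewrites.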

Simply removing the two \\ from every line involving a tilde in the list in bibtex.py seems to work, but it is not fully satisfactory. We'd need to find a way that does not convert ~ to \~, for instance if one of the fields contains a URL with a ~ in it.

Benjamin Hennion (benjamin-hennion) wrote on 2019-06-13:

Actually, the second \ is added by escapeSpecialCharacters. I see two problems here:
1- the LaTeX (hence BibTeX) escape sequence for ~ is not "\~" but "\char`\~". "\~" puts a tilde on top of the next character.
2- it is too late to make the change, as letters like ñ have already been transformed into "\~n".

The solution is then to
1- modify escapeSpecialCharacters so that it changes ~ into \char`\~
2- call escapeSpecialCharacters before calling resolveUnicode
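
The two steps above can be sketched as follows. This is a minimal illustration under assumptions: the mapping table and the snake_case function names are hypothetical stand-ins, not the actual contents of calibre/utils/bibtex.py, and only the tilde is handled.

```python
# Hypothetical stand-ins; the real table and functions live in calibre/utils/bibtex.py.
UNICODE_TO_BIBTEX = {"\u00f1": r"{\~n}"}

def escape_special_characters(text):
    """Step 1: turn a literal tilde into \\char`\\~ instead of \\~."""
    return text.replace("~", r"\char`\~")

def resolve_unicode(text):
    """Replace each mapped non-ASCII letter with its BibTeX sequence."""
    for char, tex in UNICODE_TO_BIBTEX.items():
        text = text.replace(char, tex)
    return text

def to_bibtex(text):
    """Step 2: escape first, then resolve, so generated tildes are never re-escaped."""
    return resolve_unicode(escape_special_characters(text))

print(to_bibtex("Espa\u00f1a"))  # Espa{\~n}a
print(to_bibtex("~user"))        # \char`\~user
```

Because the escaping pass runs before any "{\~n}" sequence exists, it only ever sees tildes that were present in the original field.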

Benjamin Hennion (benjamin-hennion) wrote on 2019-06-13: