bibtool does not convert accented characters to lowercase

Bug #209872 reported by Louis-Dominique Dubeau
Affects: bibtool (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Binary package hint: bibtool

Running on Gutsy Gibbon.

Package version: bibtool 2.48alpha.2-3.1

Steps to reproduce the problem:

1. Take the test.bib file I have attached to this report.

2. Run

$ bibtool -k ./test.bib

(bibtool uses default TeX paths if you do not specify an actual relative or absolute path for your bib file. So the "./" in the above is not optional: it should be adapted to your situation as needed.)

3. The command above should normalize all the keys according to the default "short" key format. Among other things, this means everything should be in lower case in the key. Now, listing only the first line of each entry after running the command above, here is what I would expect in the output:

@Book{ āryadeva.lang:āryadevas,
@Book{ aryadeva.lang:aryadevas,
@Book{ aryadeva.lang:aryadevas*1,
@Book{ āryadeva.lang:āryadevas*1,
@Book{ émile.pierre:émile,

4. Actual output, again only listing the first line of each entry:

@Book{ Āryadeva.lang:Āryadevas,
@Book{ aryadeva.lang:aryadevas,
@Book{ aryadeva.lang:aryadevas*1,
@Book{ Āryadeva.lang:Āryadevas*1,
@Book{ Émile.pierre:Émile,

As you can see, accented characters are not converted to lowercase. The first 4 entries are modifications of the entry for an actual book I'm using in a bibliography. I created the fifth entry to illustrate that even with fairly run-of-the-mill diacritics like a simple French e-acute, the problem happens.

Observations: Although I found the bug while generating keys, I think the bug might manifest itself whenever bibtool should convert accented characters to lowercase. That means that if bibtool is used to clean other fields, the problem is likely to occur there too.
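
For illustration, the output above is consistent with a lowercasing routine that only knows about ASCII letters. The Python sketch below is not BibTool's code (BibTool is written in C); `ascii_lower` is a hypothetical helper, and the mixed-case input keys are assumed for the sake of the example. It contrasts an ASCII-only conversion with a Unicode-aware one.

```python
def ascii_lower(text: str) -> str:
    """Lowercase only A-Z and leave every other character untouched."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in text
    )


# Hypothetical keys as they might look before case normalization.
keys = [
    "Āryadeva.Lang:Āryadevas",
    "Aryadeva.Lang:Aryadevas",
    "Émile.Pierre:Émile",
]

for key in keys:
    # Left: ASCII-only conversion. Right: Unicode-aware conversion.
    print(f"{ascii_lower(key)}  vs  {key.lower()}")
```

With the ASCII-only routine, Ā and É pass through unchanged, which matches the actual output in step 4; `str.lower()` produces the keys expected in step 3.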

Gerd Neugebauer (gene-gerd-neugebauer) wrote :

The documentation of BibTool states in the section on limitations:

        In several modules ASCII encoding is assumed.

This means that BibTool does not know anything about characters with a code point above 127.

I will consider whether this limitation can be lifted, which would mean adding support for (many?) encodings.

Gerd (author of BibTool)

Louis-Dominique Dubeau (ldd) wrote :

New development: on Ubuntu 9.04, bibtool will crash while trying to process the file I attached to illustrate the problem.

Right, I missed the part of the documentation that says bibtool won't work with code points greater than 127, but I'd expect a tool with this limitation to do something like:

if char > 127, then fail gracefully by printing an error message and exiting with a non-zero status

Unfortunately, the current behavior is undefined.
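
A minimal sketch of the kind of check described above, written in Python as a stand-alone pre-check rather than as a patch to bibtool itself. The function names `find_non_ascii` and `check_ascii` and the wording of the error message are assumptions made for this example.

```python
import sys


def find_non_ascii(data: bytes) -> list[tuple[int, int]]:
    """Return (line, column) of the first byte above 127 on each offending line.

    Both numbers are 1-based; the column counts bytes, not characters.
    """
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte > 127:
                hits.append((line_no, col))
                break  # one report per line is enough
    return hits


def check_ascii(path: str) -> int:
    """Print one diagnostic per offending line; return 0 if clean, 1 otherwise."""
    with open(path, "rb") as handle:
        hits = find_non_ascii(handle.read())
    for line_no, col in hits:
        print(f"{path}:{line_no}:{col}: byte above 127, input is not ASCII",
              file=sys.stderr)
    return 1 if hits else 0
```

Calling `sys.exit(check_ascii("./test.bib"))` before invoking bibtool would give the non-zero exit status asked for here, instead of leaving the outcome to chance.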

As for supporting other encodings, I would suggest supporting Unicode (in addition to what is currently supported) but not other encodings: one can always use tools like recode to convert to Unicode, then process the file with bibtool and then use recode again to convert to whatever encoding the user desires.
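
The round trip suggested above can be sketched as follows. Python's built-in codecs stand in for recode here, and Latin-1 is only an assumed example of a legacy encoding; `transcode` is a hypothetical helper, not part of recode or bibtool.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from the source encoding and re-encode them in the target."""
    return data.decode(source).encode(target)


# A field as it might be stored in a legacy Latin-1 .bib file (assumed example).
legacy = "author = {Émile, Pierre}".encode("latin-1")

# Step 1: convert to UTF-8 before handing the file to a Unicode-aware tool.
as_utf8 = transcode(legacy, "latin-1", "utf-8")

# Step 2 (not shown): the tool processes the UTF-8 data.

# Step 3: convert the result back to the encoding the user wants.
round_trip = transcode(as_utf8, "utf-8", "latin-1")
```

Under this scheme the tool itself only ever needs to understand one encoding; everything else is handled at the edges.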

(BTW, using recode to convert a Unicode BibTeX file to LaTeX encoding (e.g. recode utf8..latex biblio.bib) does not produce good results.)

Thibault D (thibdrev) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. We are sorry that we do not always have the capacity to look at all reported bugs in a timely manner. There have been many changes in Ubuntu since the time you reported the bug, and your problem may have been fixed by some of the updates. It would help us a lot if you could test it on a currently supported Ubuntu version. If you test it and it is still an issue, kindly upload the updated logs by running apport-collect <bug #>, along with any other logs that are relevant to this particular issue.

Changed in bibtool (Ubuntu):
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for bibtool (Ubuntu) because there has been no activity for 60 days.]

Changed in bibtool (Ubuntu):
status: Incomplete → Expired