bibtool does not convert accented characters to lowercase
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
bibtool (Ubuntu) | Expired | Undecided | Unassigned |
Bug Description
Binary package hint: bibtool
Running on Gutsy Gibbon.
Package version: bibtool 2.48alpha.2-3.1
Steps to reproduce the problem:
1. Take the test.bib file I have attached to this report.
2. Run
$ bibtool -k ./test.bib
(bibtool searches the default TeX paths unless the bib file is given with an explicit relative or absolute path, so the "./" above is not optional. Adapt the path to your situation as needed.)
3. The command above should normalize all the keys according to the default "short" key format. Among other things, this means the whole key should be in lower case. Listing only the first line of each entry after running the command above, here is what I would expect in the output:
@Book{ āryadeva.
@Book{ aryadeva.
@Book{ aryadeva.
@Book{ āryadeva.
@Book{ émile.pierre:émile,
4. Actual output, again only listing the first line of each entry:
@Book{ Āryadeva.
@Book{ aryadeva.
@Book{ aryadeva.
@Book{ Āryadeva.
@Book{ Émile.pierre:Émile,
As you can see, accented characters are not converted to lowercase. The first 4 entries are modifications of the entry for an actual book I'm using in a bibliography. I created the fifth entry to illustrate that even with fairly run-of-the-mill diacritics like a simple French e-acute, the problem happens.
Observations: Although I found the bug while generating keys, I think the bug might manifest itself whenever bibtool should convert accented characters to lowercase. That means that if bibtool is used to clean other fields, the problem is likely to occur there too.
The documentation of BibTool states in the section on limitations:
In several modules ASCII encoding is assumed.
This means that BibTool does not know anything about characters with a code point above 127.
I will think whether this limitation can be released -- which means adding support for (many?) encodings.
Gerd (author of BibTool)
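The limitation quoted above would be consistent with the output in step 4. The following is a minimal Python sketch of that behaviour, not BibTool's actual code: `ascii_lower` is a hypothetical helper that lowercases only the letters A-Z and passes every other character through unchanged, compared with Python's Unicode-aware `str.lower()`. The sample keys are assumed pre-normalization forms chosen for illustration, not taken from the attached test.bib.

```python
def ascii_lower(text: str) -> str:
    """Lowercase only the ASCII letters A-Z; leave everything else untouched.

    Hypothetical stand-in for an ASCII-only case conversion.
    """
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in text
    )


# Assumed sample keys, for illustration only.
keys = ["Āryadeva", "Aryadeva", "Émile.Pierre:Émile"]

for key in keys:
    # Left: ASCII-only conversion. Right: Unicode-aware conversion.
    print(ascii_lower(key), "|", key.lower())
```

With the ASCII-only conversion, "Ā" and "É" survive unchanged while plain "A" and "P" are lowercased, which is the pattern seen in the actual output. The Unicode-aware conversion produces the fully lowercase keys listed as expected output.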