nonlatin1 locale: garbage in imported bibtex data

Bug #223763 reported by Sergey B Kirpichev
Affects      Status        Importance  Assigned to  Milestone
Referencer   Fix Released  Undecided   Unassigned

Bug Description

$ echo $LANG
ru_RU.UTF-8

It's impossible to import BibTeX data in this locale. For example, the "Paste BibTeX" operation just throws the exception "Invalid byte sequence in conversion input" and leaves the imported data unreadable...

The attached patch fixes the problem for me, but I suspect there is a better solution, i.e. converting the clipboard's BibTeX data from the current locale...

Sergey B Kirpichev (skirpichev) wrote :
John S (jcspray) wrote :

I would like to understand why this is failing. According to the GTK documentation, Gtk::Clipboard::wait_for_text should return a UTF-8 encoded string, so the explicit conversion in RefWindow::onPasteBibtex should never fail.

Which exception is getting thrown, i.e. what follows "The operation underway was" in the exception dialog?

Sergey B Kirpichev (skirpichev) wrote :

Well, in RefWindow.C you convert from UTF-8 to Latin-1:
latintext = Glib::convert (clipboardtext, "iso-8859-1", "UTF8"); // it throws annoying exception
and then:
library_->doclist_->import (latintext, BibUtils::FORMAT_BIBTEX);

But the latter calls bibl_initparams(), which for read mode FORMAT_BIBTEX sets by default:
p->utf8in = 0;
p->charsetin = BIBL_CHARSET_DEFAULT; // it's latin1
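To illustrate why the Glib::convert call above throws: ISO-8859-1 can only represent code points up to U+00FF, while Cyrillic letters lie at U+0400 to U+04FF. The standalone sketch below mimics that conversion in plain C++; the helper name utf8ToLatin1 is hypothetical (not part of Referencer or glibmm), and it assumes the input is already valid UTF-8, as the clipboard text turned out to be.

```cpp
#include <string>

// Hypothetical helper, for illustration only: converts UTF-8 to ISO-8859-1.
// Returns false when a character has no Latin-1 equivalent, which is the
// case in which Glib::convert(..., "iso-8859-1", "UTF8") throws instead.
// Assumes `in` is well-formed UTF-8.
bool utf8ToLatin1(const std::string& in, std::string& out) {
    out.clear();
    for (std::size_t i = 0; i < in.size();) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            // 1-byte sequence: ASCII maps to itself.
            out += static_cast<char>(c);
            i += 1;
        } else if ((c & 0xE0) == 0xC0) {
            // 2-byte sequence: code points U+0080..U+07FF.
            if (i + 1 >= in.size()) return false;  // truncated input
            unsigned cp = ((c & 0x1Fu) << 6) |
                          (static_cast<unsigned char>(in[i + 1]) & 0x3Fu);
            if (cp > 0xFF) return false;  // e.g. Cyrillic: no Latin-1 byte
            out += static_cast<char>(cp);
            i += 2;
        } else {
            // 3- or 4-byte sequences encode code points above U+07FF
            // (and a stray byte here means invalid input): not Latin-1.
            return false;
        }
    }
    return true;
}
```

For example, utf8ToLatin1("caf\xC3\xA9", out) succeeds with out == "caf\xE9", while utf8ToLatin1("\xD0\xA3", out) (the Cyrillic "У") returns false, just as the real conversion fails on the Russian publisher name.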

John S (jcspray) wrote :

Can you please put in a "std::cerr << clipboardtext.validate()" in RefWindow::onPasteBibtex and then paste something that causes the exception to be thrown? If that prints 0 then wait_for_text isn't giving us valid utf-8 and it's a gtk bug which needs investigating.

Don't worry, I will get this sorted out, but I want to fully understand what's going on first.

Sergey B Kirpichev (skirpichev) wrote : Re: [Bug 223763] Re: nonlatin1 locale: garbage in imported bibtex data

2008/4/29 John Spray <email address hidden>:
> Can you please put in a "std::cerr << clipboardtext.validate()" in
> RefWindow::onPasteBibtex and then paste something that causes the
> exception to be thrown? If that prints 0 then wait_for_text isn't
> giving us valid utf-8 and it's a gtk bug which needs investigating.

 An example citation:
 http://ufn.ru/ru/articles/2008/3/d/citation/ru/bibtex.html#citation

 Documents->Add Empty Reference->Paste Bibtex

 The output:
 [skip]
 Plugin::load: successfully loaded 'pubmed'
 disabling plugin lyx
 Reading XML...
 Done, got 0 docs
 RefWindow::run: entering main loop
 clipboardtext.validate(): 1
 Publisher = Ð£Ñ Ð¿ÐµÑ…Ðž Ñ„ÐžÐ·ÐžÑ‡ÐµÑ ÐºÐžÑ… Маук(1)
 Eid = 301(0)
 Url = http://ufn.ru/ru/articles/2008/3/d/(0)
 Doi = 10.3367/UFNr.0178.200803d.0301(0)
 DocumentProperties::onPasteBibtex: Imported 1 references

The conversion from UTF-8 to Latin-1 fails, but you then import latintext
as Latin-1 anyway. Note the garbage in the "Publisher" field.
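The garbage is consistent with UTF-8 bytes being read as Latin-1 and then re-encoded as UTF-8 (double encoding). A minimal sketch of that step, assuming a byte-for-byte Latin-1 interpretation; the function name latin1ToUtf8 is hypothetical and not taken from Referencer or BibUtils:

```cpp
#include <string>

// Hypothetical helper, for illustration only: treats every byte as an
// ISO-8859-1 character (code point == byte value) and encodes it as UTF-8.
// Feeding it text that is already UTF-8 produces double-encoded garbage.
std::string latin1ToUtf8(const std::string& bytes) {
    std::string out;
    out.reserve(bytes.size() * 2);
    for (char ch : bytes) {
        unsigned char b = static_cast<unsigned char>(ch);
        if (b < 0x80) {
            out += static_cast<char>(b);  // ASCII is unchanged
        } else {
            // U+0080..U+00FF become two bytes: 110000xx 10xxxxxx
            out += static_cast<char>(0xC0 | (b >> 6));
            out += static_cast<char>(0x80 | (b & 0x3F));
        }
    }
    return out;
}
```

For example, the Cyrillic "У" is the UTF-8 byte pair D0 A3; read as Latin-1 those bytes are "Ð" and "£", so the result is "Ð£", which is how the garbled Publisher value above begins.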

John S (jcspray) wrote :

Okay, I get it now. What threw me off was the idea that it was impossible to import any BibTeX, when in fact only certain inputs were affected.

Your patch is fine for pasting, but importing from a file will need a bit more attention. Currently, imported files are assumed to be Latin-1; the encoding should instead be auto-detected at import time. I'm sure there is some nice library call for doing this, but I don't know off the top of my head what it is...

John S (jcspray) wrote :

I've applied your patch, and also modified DocumentList::importFromFile to check the encoding of bibtex files and convert appropriately. It only supports utf8 and latin1 at the moment.
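One common heuristic for "UTF-8 or Latin-1" detection is: if the file's bytes form well-formed UTF-8, use them as-is; otherwise assume Latin-1 and convert. The sketch below shows that heuristic in plain C++. It is an assumption about how such a check can work, not the code committed to DocumentList::importFromFile, and the names isValidUtf8 and toUtf8 are hypothetical.

```cpp
#include <string>

// Hypothetical: true if `s` is well-formed UTF-8 (rejects stray or
// truncated sequences, overlong forms, surrogates and values > U+10FFFF).
bool isValidUtf8(const std::string& s) {
    static const unsigned minCp[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0, n = s.size();
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned cp = 0;
        if (c < 0x80) { i += 1; continue; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                  // invalid lead byte
        if (i + len > n) return false;      // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < minCp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}

// Hypothetical: returns the file contents as UTF-8, considering only
// UTF-8 and Latin-1 as possible source encodings.
std::string toUtf8(const std::string& raw) {
    if (isValidUtf8(raw)) return raw;       // already UTF-8
    std::string out;                        // otherwise: Latin-1 -> UTF-8
    for (char ch : raw) {
        unsigned char b = static_cast<unsigned char>(ch);
        if (b < 0x80) {
            out += static_cast<char>(b);
        } else {
            out += static_cast<char>(0xC0 | (b >> 6));
            out += static_cast<char>(0x80 | (b & 0x3F));
        }
    }
    return out;
}
```

A Latin-1 file whose bytes happen to form valid UTF-8 would be misclassified, but that is rare for real text, because accented Latin-1 letters are seldom followed by the continuation bytes UTF-8 requires.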

Changed in referencer:
status: New → Fix Committed
Changed in referencer:
milestone: none → 1.2.0
Changed in referencer:
status: Fix Committed → Fix Released