Problem with unicode in oaipmh

Bug #617439 reported by Benno Luthiger
Affects     Status         Importance   Assigned to   Milestone
Silva OAI   Fix Released   Undecided    Unassigned

Bug Description

I stumbled over a unicode problem in the oaipmh.client module.

The OAI-PMH repository I'd like to harvest returns setNames containing non-ASCII characters when I request its ListSets (requestUrl?verb=ListSets). Unfortunately, the module returns an lxml.etree._ElementUnicodeResult object when it processes text with umlauts. Later, the program tries to store the sets by pickling the entries and crashes.

To work around this problem, the _ElementUnicodeResult objects have to be converted to proper unicode objects.
The following code achieves this in oaipmh.client.buildSets() [lines 260/261]:
==
            setSpec = e('string(oai:setSpec/text())')
            setName = u'%s' % e('string(oai:setName/text())')  # the XPath call may return _ElementUnicodeResult
==

Maybe there's a better solution.
Could you fix the problem and prepare a new release?
Benno

Benno Luthiger (benno-luthiger) wrote :

The clean solution, of course, is the following:
==
            # make sure we get back unicode strings instead
            # of lxml.etree._ElementUnicodeResult objects.
            setSpec = unicode(e('string(oai:setSpec/text())'))
            setName = unicode(e('string(oai:setName/text())'))
==
I have fixed that bug in client.py and checked it in (r44465).
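
A minimal sketch of why such a conversion helps, written for Python 3, where `str` plays the role of Python 2's `unicode`. It deliberately does not use lxml: `SmartResult` is a hypothetical stand-in for a string subclass that carries a reference pickle cannot serialize, and `to_plain_text` is an illustrative helper, not part of pyoai. Whether the real `_ElementUnicodeResult` fails for the same reason is an assumption here.

```python
import pickle
import threading


class SmartResult(str):
    """Hypothetical stand-in for an XPath string result with extra state."""

    def __new__(cls, value, parent):
        obj = super().__new__(cls, value)
        # Assumption for illustration: the result keeps a reference to
        # something pickle cannot serialize (a lock is used as a stand-in).
        obj._parent = parent
        return obj


def to_plain_text(value):
    """Return an exact built-in str, dropping any subclass state."""
    return str(value)


raw = SmartResult("Universit\u00e4t Z\u00fcrich", parent=threading.Lock())

# Pickling the subclass instance also tries to pickle its attributes.
try:
    pickle.dumps(raw)
    raw_is_picklable = True
except (TypeError, pickle.PicklingError):
    raw_is_picklable = False

# After conversion only the text remains, so pickling round-trips cleanly.
plain = to_plain_text(raw)
restored = pickle.loads(pickle.dumps(plain))
```

The text itself is unchanged by the conversion; only the concrete type differs, which is what matters to pickle.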

Benno

Jasper Op de Coul (jasper-infrae) wrote :

Hi Benno,

Thanks for the unicode fixes you committed.
I already added those protections in other parts of the library but apparently not in the client code.

I have sent a question to the lxml project about this. Especially in the Zope world, there is a good
chance that these values get pickled at some point, and then you end up with these problems.
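
When harvested values end up in nested records before being stored, one defensive option is to normalize every string in the structure first. The following is only a sketch in Python 3 under stated assumptions: `plain_strings` and `TaggedText` are hypothetical names, `TaggedText` merely stands in for an XPath string result, and the sample record layout is invented for illustration.

```python
import pickle


def plain_strings(value):
    """Recursively replace str subclasses with exact built-in str objects."""
    if isinstance(value, str):
        return str(value)
    if isinstance(value, dict):
        return {plain_strings(k): plain_strings(v) for k, v in value.items()}
    if isinstance(value, list):
        return [plain_strings(item) for item in value]
    if isinstance(value, tuple):
        return tuple(plain_strings(item) for item in value)
    # Anything else (numbers, None, ...) is passed through unchanged.
    return value


class TaggedText(str):
    """Hypothetical str subclass standing in for an XPath string result."""


# Invented sample: one (setSpec, setName, description) entry.
sets = [(TaggedText("dept:phys"), TaggedText("Institut f\u00fcr Physik"), None)]

cleaned = plain_strings(sets)
payload = pickle.dumps(cleaned)  # contains only built-in types
```

Converting at the boundary, right after the XPath call, keeps such a helper unnecessary; the recursive variant is mainly useful when the data has already spread through the code.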

Maybe they will fix it in the lxml codebase so we can remove these conversions.
I released pyoai 2.3.4 today, which contains your patch.

Kind regards,
Jasper Op de Coul

Changed in silva-oai:
status: New → Fix Released