importing lxml.etree changes what exceptions are thrown by drv_libxml2

Bug #1001301 reported by Kurt McKee on 2012-05-18
This bug affects 1 person

Bug Description

Python : sys.version_info(major=2, minor=7, micro=2, releaselevel='final', serial=0)
lxml.etree : (2, 3, 4, 0)
libxml used : (2, 7, 7)
libxml compiled : (2, 7, 8)
libxslt used : (1, 1, 26)
libxslt compiled : (1, 1, 26)

This bug is being tracked on two bug trackers:
FlexGet #1446
feedparser #352

While investigating the tickets I found that libxml2 normally throws a SAXParseException when it encounters an ill-formed character reference, but merely importing lxml.etree causes it to throw a SAXException instead. I've created and attached a simple script to demonstrate the issue. The output from the script on my machine is:

Run #1 correct: SAXParseException was thrown
Run #2 INCORRECT: SAXException was thrown
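
The attached script is not reproduced on this page. As a rough sketch of the kind of check it performs, the snippet below parses a document containing an ill-formed character reference and reports which SAX exception class was raised. It is an illustration only: the helper name classify_parse_error and the sample input are hypothetical, and by default it uses the standard library's expat-based parser rather than drv_libxml2. Passing ["drv_libxml2"] as parser_list is assumed to select the libxml2 driver when the libxml2 Python bindings are installed.

```python
import io
import xml.sax
from xml.sax import SAXException, SAXParseException

# Hypothetical sample input: "&#bad;" is an ill-formed character reference.
BAD_XML = b"<doc>&#bad;</doc>"


def classify_parse_error(data, parser_list=()):
    """Parse `data` and return the name of the SAX exception class raised.

    `parser_list` is handed to xml.sax.make_parser(). An empty list selects
    the default (expat) parser; ["drv_libxml2"] is assumed to select the
    libxml2 driver if it is importable.
    """
    parser = xml.sax.make_parser(list(parser_list))
    parser.setContentHandler(xml.sax.ContentHandler())
    try:
        parser.parse(io.BytesIO(data))
    except SAXParseException:
        # Checked first because SAXParseException subclasses SAXException.
        return "SAXParseException"
    except SAXException:
        return "SAXException"
    return "no exception"


if __name__ == "__main__":
    print("before import:", classify_parse_error(BAD_XML))
    try:
        import lxml.etree  # noqa: F401  (the import alone is the trigger)
    except ImportError:
        print("lxml is not installed; skipping the second run")
    else:
        print("after import: ", classify_parse_error(BAD_XML))
```

Because SAXParseException is a subclass of SAXException, the order of the except clauses matters. Code that only catches SAXParseException would let a plain SAXException propagate, which is consistent with the behaviour change described above.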

scoder (scoder) wrote :

lxml 2.4 (the latest github master) is going to fix at least some of the problems that occur when lxml is used together with libxml2's own Python bindings. I would appreciate some testing in that regard.

In general, it's safest to use a statically linked installation if both are going to be used together.

Changed in lxml:
importance: Undecided → Low
status: New → Triaged

Kurt McKee (kurtmckee) wrote :

I don't think I understand. Are you asking me to download and compile lxml from github? Were you able to reproduce the error using the sample script I uploaded?
