Reusing the default XML parser raises an XMLSyntaxError

Bug #1880251 reported by Oleg Hoefling
This bug affects 2 people
Affects: lxml
Status: Fix Released
Importance: Medium
Assigned to: scoder
Milestone: 4.5.2

Bug Description

### Version info

Python : sys.version_info(major=3, minor=8, micro=2, releaselevel='final', serial=0)
lxml.etree : (4, 5, 1, 0)
libxml used : (2, 9, 10)
libxml compiled : (2, 9, 10)
libxslt used : (1, 1, 34)
libxslt compiled : (1, 1, 34)

### Summary

If `libxmlsec1` is initialized, reusing the default `etree.XMLParser` raises an `lxml.etree.XMLSyntaxError` when parsing files. To reproduce, install `libxmlsec1` (e.g. `apt install libxmlsec1` or `yum install xmlsec1`). Running the script

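```python
from lxml import etree
import ctypes

# Load and initialize libxmlsec1 (adjust the path for your distribution).
xmlsec = ctypes.CDLL('/usr/lib/libxmlsec1.so')
xmlsec.xmlSecInit()

etree.parse('doc.xml')  # first parse with the default parser
etree.parse('doc.xml')  # reuses the default parser and raises XMLSyntaxError
```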

will yield

```
Traceback (most recent call last):
  File "reader.py", line 23, in <module>
    etree.parse('doc.xml')
  File "src/lxml/etree.pyx", line 3467, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 1839, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 1865, in lxml.etree._parseDocumentFromURL
  File "src/lxml/parser.pxi", line 1769, in lxml.etree._parseDocFromFile
  File "src/lxml/parser.pxi", line 1163, in lxml.etree._BaseParser._parseDocFromFile
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 651, in lxml.etree._raiseParseError
  File "b'doc.xml'", line 0
lxml.etree.XMLSyntaxError
```
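The hard-coded path `/usr/lib/libxmlsec1.so` in the script above may not exist on every distribution. The following is a minimal sketch of a more portable way to load and initialize the library before running the reproduction. The helper name `load_xmlsec` is hypothetical and not part of the original report; it relies only on `ctypes.util.find_library` from the standard library and on `xmlSecInit()` returning a negative value on failure.

```python
import ctypes
import ctypes.util


def load_xmlsec(name="xmlsec1"):
    """Locate and initialize libxmlsec1; return the handle, or None if unavailable.

    Hypothetical helper for illustration, not part of the original report.
    """
    # find_library searches the platform's usual library locations, so the
    # script does not depend on a distribution-specific path.
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # xmlSecInit() returns 0 on success and a negative value on failure.
    if lib.xmlSecInit() < 0:
        raise RuntimeError("xmlSecInit() failed")
    return lib
```

If `load_xmlsec()` returns `None`, the library was not found and the reproduction cannot be run on that machine; otherwise the two `etree.parse('doc.xml')` calls can follow as in the original script.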

The workarounds are:

 * resetting the default XML parser object after each `etree.parse` invocation:

   ```python
   etree.parse('doc.xml')
   etree.set_default_parser(parser=etree.XMLParser())
   ```

 * or passing a new `etree.XMLParser` instance explicitly:

   ```python
   etree.parse('doc.xml', parser=etree.XMLParser())
   ```

 * or passing an open file object instead of a file name, e.g.

   ```python
   with open('doc.xml', 'rb') as f:
       etree.parse(f)
   ```
   since it seems that only the file-name branch is affected: https://github.com/lxml/lxml/blob/0ce08858a824a0a4fae4102af849a8fbf7bcad6f/src/lxml/parser.pxi#L1837
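For code that calls `etree.parse` in many places, the second workaround can be wrapped in a small helper so that the shared default parser is never reused. This is only a sketch: the name `parse_fresh` is hypothetical, and `lxml` is imported inside the function on the assumption that it is installed wherever the helper is called.

```python
def parse_fresh(source):
    """Parse `source` with a new XMLParser rather than the shared default parser.

    Hypothetical helper for illustration, not part of the original report.
    """
    # Imported lazily; assumes lxml is installed when the function is called.
    from lxml import etree

    # A new parser instance per call means no parser state carries over
    # from one parse to the next.
    return etree.parse(source, parser=etree.XMLParser())
```

Creating a parser per call costs a little more than reusing one, which is usually acceptable as a temporary measure until a fixed lxml release can be installed.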

Oleg Hoefling (hoefling) wrote :

For greater readability: copy the text as-is and paste it in any markdown editor, e.g. https://dillinger.io

scoder (scoder) wrote :

Thanks for the detailed report. This is fixed here:

https://github.com/lxml/lxml/commit/fa1d856cad369d0ac64323ddec14b02281491706

Changed in lxml:
assignee: nobody → scoder (scoder)
importance: Undecided → Medium
status: New → Fix Committed
milestone: none → 4.5.2
Oleg Hoefling (hoefling) wrote :

Hi scoder, thank you for the fast response! The fix does resolve the original example, but if I change the import order so that `xmlSecInit()` runs before `lxml` is imported, the error persists:

```python
import ctypes

xmlsec = ctypes.CDLL('/usr/lib/libxmlsec1.so')
xmlsec.xmlSecInit()

from lxml import etree

etree.parse('doc.xml')
etree.parse('doc.xml')
```
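Since the fix is targeted at milestone 4.5.2, calling code may want to decide at runtime whether to apply one of the workarounds. The check below is a rough sketch with a hypothetical function name, `may_need_workaround`. Given the comment above, a version at or past 4.5.2 does not guarantee that the reordered-import case is resolved, so the result is a heuristic only.

```python
def may_need_workaround(lxml_version):
    """Return True if `lxml_version` predates the 4.5.2 milestone named in this report.

    Hypothetical helper for illustration; a False result does not prove the
    reordered-import case is fixed.
    """
    # Compare only (major, minor, micro); lxml reports a 4-tuple such as (4, 5, 1, 0).
    return tuple(lxml_version[:3]) < (4, 5, 2)
```

With lxml installed, the argument would typically be `etree.LXML_VERSION`, which for the version listed in this report is `(4, 5, 1, 0)`.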

scoder (scoder)
Changed in lxml:
status: Fix Committed → Fix Released