Reusing the default XML parser raises an XMLSyntaxError

Bug #1880251 reported by Oleg Hoefling on 2020-05-22
This bug affects 2 people
Affects: lxml
Status: Fix Released
Importance: Medium
Assigned to: scoder
Milestone: 4.5.2

Bug Description

### Version info

Python : sys.version_info(major=3, minor=8, micro=2, releaselevel='final', serial=0)
lxml.etree : (4, 5, 1, 0)
libxml used : (2, 9, 10)
libxml compiled : (2, 9, 10)
libxslt used : (1, 1, 34)
libxslt compiled : (1, 1, 34)

### Summary

If `libxmlsec1` is initialized, reusing the default `etree.XMLParser` raises an `lxml.etree.XMLSyntaxError` when parsing files. To reproduce, install `libxmlsec1` (e.g. `apt install libxmlsec1` or `yum install xmlsec1`). Running the script

```python
from lxml import etree
import ctypes

xmlsec = ctypes.CDLL('/usr/lib/libxmlsec1.so')
xmlsec.xmlSecInit()

etree.parse('doc.xml')
etree.parse('doc.xml')
```

will yield

```
Traceback (most recent call last):
  File "reader.py", line 23, in <module>
    etree.parse('doc.xml')
  File "src/lxml/etree.pyx", line 3467, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 1839, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 1865, in lxml.etree._parseDocumentFromURL
  File "src/lxml/parser.pxi", line 1769, in lxml.etree._parseDocFromFile
  File "src/lxml/parser.pxi", line 1163, in lxml.etree._BaseParser._parseDocFromFile
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 651, in lxml.etree._raiseParseError
  File "b'doc.xml'", line 0
lxml.etree.XMLSyntaxError
```
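
The traceback above carries no message, so it can help to check whether libxml2 logged anything for the failed parse. A minimal sketch, assuming malformed input as a stand-in trigger (the failure in this report needs `libxmlsec1` to be initialized); `describe_parse_error` is a hypothetical helper, not part of lxml:

```python
from lxml import etree


def describe_parse_error(source):
    """Return (message, logged messages) for a failed parse, or None on success.

    Hypothetical helper for illustration only.
    """
    try:
        etree.fromstring(source)
    except etree.XMLSyntaxError as exc:
        # error_log holds the libxml2 messages recorded for this failure.
        return str(exc), [entry.message for entry in exc.error_log]
    return None


# Malformed input is used here only to trigger an XMLSyntaxError.
result = describe_parse_error(b"<root><unclosed></root>")
print(result)
```

For the failure reported here the log may well be empty, which would itself suggest that the error does not come from the document content.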

The workarounds are:

 * resetting the default XML parser object after each `etree.parse` invocation:

   ```python
   etree.parse('doc.xml')
   etree.set_default_parser(parser=etree.XMLParser())
   ```

 * or passing a new `etree.XMLParser` instance explicitly:

   ```python
   etree.parse('doc.xml', parser=etree.XMLParser())
   ```

 * or passing an open file object instead of a file name, e.g.

   ```python
   with open('doc.xml', 'rb') as f:
       etree.parse(f)
   ```
   since only the file-name branch appears to be affected: https://github.com/lxml/lxml/blob/0ce08858a824a0a4fae4102af849a8fbf7bcad6f/src/lxml/parser.pxi#L1837
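
The second workaround can be wrapped in a small helper so that no call site reuses the default parser. This is a sketch only: `parse_with_fresh_parser` is a hypothetical name, and the demo writes its own temporary `doc.xml` rather than reproducing the `libxmlsec1` setup.

```python
import os
import tempfile

from lxml import etree


def parse_with_fresh_parser(path):
    """Parse `path` with a new XMLParser so no parser instance is reused.

    Hypothetical helper for illustration only.
    """
    return etree.parse(path, parser=etree.XMLParser())


# Demo: parse the same file twice, each time with its own parser.
with tempfile.TemporaryDirectory() as tmp:
    doc_path = os.path.join(tmp, "doc.xml")
    with open(doc_path, "wb") as f:
        f.write(b"<root><child/></root>")
    first = parse_with_fresh_parser(doc_path)
    second = parse_with_fresh_parser(doc_path)
    root_tags = (first.getroot().tag, second.getroot().tag)

print(root_tags)
```

Creating a parser per call costs a little more than reusing one, which seems an acceptable trade-off until a fixed lxml release is in use.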

Oleg Hoefling (hoefling) wrote :

For greater readability: copy the text as-is and paste it in any markdown editor, e.g. https://dillinger.io

scoder (scoder) wrote :

Thanks for the detailed report. This is fixed here:

https://github.com/lxml/lxml/commit/fa1d856cad369d0ac64323ddec14b02281491706

Changed in lxml:
assignee: nobody → scoder (scoder)
importance: Undecided → Medium
status: New → Fix Committed
milestone: none → 4.5.2
Oleg Hoefling (hoefling) wrote :

Hi scoder, thank you for the fast response! The fix does indeed resolve the original example, but if I change the import order, the error persists:

```python
import ctypes

xmlsec = ctypes.CDLL('/usr/lib/libxmlsec1.so')
xmlsec.xmlSecInit()

from lxml import etree

etree.parse('doc.xml')
etree.parse('doc.xml')
```

scoder (scoder) on 2020-08-05
Changed in lxml:
status: Fix Committed → Fix Released