The following XML file, article_example.xml:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "journalpublishing3.dtd" >
<article>
Stuff <sup>1</sup> stuff ⋯ stuff <sup>2</sup>
</article>
Was loaded with:
from lxml import etree
et = etree.parse("article_example.xml", parser=etree.XMLParser(recover=True))
elements = list(et.getroot())
With that doctype declaration, elements[1] is an lxml.etree._Entity object, not an Element. Therefore, elements[1].tag isn't a string (it's a Cython function that receives a string and returns another _Entity object like "&your_input_string;"). That breaks code that expects the tag to always be a string and expects iteration (over the Element object or with iterchildren) to yield only elements, not entities or text. On the other hand, that doesn't happen if we remove the DOCTYPE line from the input XML.
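To make the behaviour concrete without depending on the parser or on the DTD, here is a minimal sketch that builds a similar tree by hand. The entity name "hellip" is only an assumption for illustration, and the two filtering options at the end are possible ways to skip non-element children, not necessarily what the affected code should do.

```python
from lxml import etree

# Build a tree shaped like the parsed article: <sup>, an entity reference, <sup>.
# The entity name "hellip" is an assumption chosen for illustration.
root = etree.Element("article")
etree.SubElement(root, "sup").text = "1"
root.append(etree.Entity("hellip"))
etree.SubElement(root, "sup").text = "2"

children = list(root)

# The middle child is an entity node, and its .tag is not a string.
print(type(children[1]).__name__)
print(isinstance(children[1].tag, str))

# Option 1: ask lxml for element nodes only.
element_tags = [child.tag for child in root.iterchildren(tag=etree.Element)]

# Option 2: filter on the type of .tag, which is a str only for real elements.
filtered_tags = [child.tag for child in root if isinstance(child.tag, str)]

print(element_tags)
print(filtered_tags)
```

Both options leave the tree untouched and only change what the loop sees, so the entity reference is still there when the document is serialized.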
Is there a way to force a DOCTYPE for parsing, or even to disable it, instead of loading it from the XML?
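Since removing the DOCTYPE line by hand avoids the problem, one rough way to "disable" it is to strip the declaration from the bytes before they reach the parser. This is only a sketch: strip_doctype and parse_without_doctype are hypothetical helper names, the regular expression assumes a simple DOCTYPE with no internal subset in [...] and no '>' inside the quoted identifiers, and the sample deliberately contains no entity reference.

```python
import re
from lxml import etree

# Matches a simple DOCTYPE declaration (assumption: no internal subset
# in [...] and no '>' inside the quoted identifiers).
_DOCTYPE_RE = re.compile(rb"<!DOCTYPE[^>]*>\s*")


def strip_doctype(data: bytes) -> bytes:
    """Return the XML bytes with the first DOCTYPE declaration removed."""
    return _DOCTYPE_RE.sub(b"", data, count=1)


def parse_without_doctype(data: bytes):
    """Parse XML bytes after dropping the DOCTYPE (hypothetical helper)."""
    parser = etree.XMLParser(recover=True)
    return etree.fromstring(strip_doctype(data), parser=parser)


sample = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b'<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 '
    b'20080202//EN" "journalpublishing3.dtd" >\n'
    b"<article>Stuff <sup>1</sup> stuff <sup>2</sup></article>"
)

root = parse_without_doctype(sample)
print([child.tag for child in root])
```

For a file on disk, the same helper could be fed the result of reading the file in binary mode. Editing the bytes is a blunt tool, so it is worth checking whether it is acceptable for the documents involved.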
Versions:
Python : sys.version_info(major=3, minor=7, micro=3, releaselevel='final', serial=0)
lxml.etree : (4, 3, 3, 0)
libxml used : (2, 9, 9)
libxml compiled : (2, 9, 9)
libxslt used : (1, 1, 33)
libxslt compiled : (1, 1, 33)