Resolving entities without a DTD

Bug #267825 reported by Kovid Goyal
This bug affects 1 person
Affects: lxml
Status: Confirmed
Importance: Wishlist
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Hi,

My application needs to process XML files that contain entity references but have no DTD declaring those entities. I am aware that this is not well-formed XML, but I still need to be able to process the files. Can I inform XMLParser of the entities somehow? Setting resolve_entities to False doesn't work (it still raises an undeclared entity error). Setting recover=True causes the entities to be removed from the tree:

from lxml import etree
etree.tostring(etree.fromstring('<a>1&my;2</a>', etree.XMLParser(recover=True)))

gives

'<a>12</a>'

etree.LXML_VERSION
(2, 0, 5, 0)

etree.LIBXML_VERSION
(2, 6, 32)

scoder (scoder) wrote:

There isn't currently a way to work around such a broken document.
libxml2 follows the XML spec strictly in that it rejects references to
undeclared entities in the absence of a DTD.

ElementTree lacks DTD support and instead allows you to specify entities
through a parser-local "entity" dictionary. lxml could potentially support
a similar interface by intercepting entity reference resolution at the
SAX layer (the "getEntity()" callback function).
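
Because the rejection hinges on the entities being undeclared, one preprocessing option that sits outside any parser API is to prepend an internal DTD subset declaring them before the text reaches the parser. The following is only a sketch of that idea, not a feature of lxml. It uses the standard library's xml.etree.ElementTree so that it runs without lxml installed. The helper name add_entity_declarations, the root element name "a", and the replacement value "X" are all hypothetical. The sketch also assumes the input has no XML declaration or DOCTYPE of its own.

```python
import xml.etree.ElementTree as ET


def add_entity_declarations(xml_text, root_name, entities):
    """Prepend an internal DTD subset that declares the given entities.

    Hypothetical helper. Assumes xml_text carries no XML declaration or
    DOCTYPE already, and that replacement values contain no '"', '%',
    '&' or '<' characters (those would need escaping first).
    """
    declarations = "".join(
        '<!ENTITY %s "%s">' % (name, value)
        for name, value in entities.items()
    )
    return "<!DOCTYPE %s [%s]>%s" % (root_name, declarations, xml_text)


# The document from the report, with "my" mapped to an illustrative value.
patched = add_entity_declarations("<a>1&my;2</a>", "a", {"my": "X"})
root = ET.fromstring(patched)
print(root.text)  # prints 1X2
```

Whether this is acceptable depends on the application: it rewrites the input text rather than teaching the parser about the entities, so it does not replace the "entity" dictionary interface discussed above.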

Changed in lxml:
importance: Undecided → Wishlist
status: New → Confirmed