Resolving entities without a DTD

Bug #267825 reported by Kovid Goyal on 2008-09-08
This bug affects 1 person
Affects: lxml
Status: Confirmed
Importance: Wishlist
Assigned to: Unassigned

Bug Description

Hi,

My application needs to process XML files that contain entity references but have no DTD declaring them. I am aware that such files are not well-formed XML, but I need to be able to process them nonetheless. Is there some way to tell XMLParser about the entities? Setting resolve_entities to False doesn't help (the parser still raises an undeclared entity error). Setting recover=True causes the entity references to be dropped from the tree:

etree.tostring(etree.fromstring('<a>1&my;2</a>', etree.XMLParser(recover=True)))

gives

'<a>12</a>'

etree.LXML_VERSION
(2, 0, 5, 0)

etree.LIBXML_VERSION
(2, 6, 32)

scoder (scoder) wrote:

There isn't currently a way to work around such a broken document.
libxml2 follows the XML spec strictly in that it rejects references to
undeclared entities in the absence of a DTD.

ElementTree lacks DTD support and instead allows you to specify entities
through a parser-local "entity" dictionary. lxml could potentially support
a similar interface by intercepting entity reference resolution at the
SAX layer (the "getEntity()" callback function).
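
Since the parser only rejects the references because nothing declares them, one possible stopgap outside the parser is to add the missing declarations to the text before parsing. The following is a minimal sketch of that idea, not an lxml feature: the helper name declare_entities and the placeholder DOCTYPE name "doc" are made up for illustration, and it assumes the input has no DOCTYPE of its own. It is shown with the standard library's ElementTree; lxml's etree.fromstring should accept the same patched string, since the declarations live in the internal subset, but that has not been checked against the versions listed above.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def declare_entities(xml_text, entities):
    """Return xml_text with a DOCTYPE whose internal subset declares `entities`.

    Hypothetical helper: `entities` maps entity names to plain-text values.
    Assumes xml_text does not already contain a DOCTYPE declaration.
    """
    # Escape characters that are special inside an entity value literal:
    # '&', '<', '>' via escape(), plus the quote delimiter and '%'.
    decls = "".join(
        '<!ENTITY %s "%s">' % (name, escape(value, {'"': "&quot;", "%": "&#37;"}))
        for name, value in entities.items()
    )
    # "doc" is a placeholder name; non-validating parsers do not check it
    # against the root element.
    doctype = "<!DOCTYPE doc [%s]>" % decls

    # A DOCTYPE must come after the XML declaration, if there is one.
    match = re.match(r"<\?xml\b.*?\?>", xml_text, re.S)
    pos = match.end() if match else 0
    return xml_text[:pos] + doctype + xml_text[pos:]


patched = declare_entities("<a>1&my;2</a>", {"my": "X"})
root = ET.fromstring(patched)
print(root.text)  # 1X2
```

This keeps the parser strict and moves the fix into the input, at the cost of having to know every entity name up front; a reference to an entity missing from the mapping is still rejected.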

Changed in lxml:
importance: Undecided → Wishlist
status: New → Confirmed