When using docinfo.internalDTD.iterentities() content is empty

Bug #1839241 reported by KeithSloan
This bug affects 1 person
Affects  Status     Importance  Assigned to  Milestone
lxml     Confirmed  Low         Unassigned

Bug Description

I am trying to process a GDML file (an XML format) and access its !ENTITY definitions.

I can process the file with resolve_entities=True, but my application needs access to the definitions themselves, because I want to make this information editable and save it to a new GDML file.

The start of the GDML file looks like this:

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE gdml [
 <!ENTITY materials SYSTEM "materialsOptical.xml">
 <!ENTITY solids_Mainz_v2 SYSTEM "solids_Mainz_v2.xml">
 <!ENTITY matrices_Mainz_v2 SYSTEM "matrices_Mainz_v2.xml">
]>

My attempt to date looks like this:

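from lxml import etree

filename = "example.gdml"  # placeholder: path to the GDML file being read

# Keep entity references unexpanded so the declarations can be inspected.
myparser = etree.XMLParser(resolve_entities=False, attribute_defaults=False)
tree = etree.parse(filename, parser=myparser)

print(dir(tree.docinfo))
print(tree.docinfo.doctype)
print(dir(tree.docinfo.doctype))

# Walk the entity declarations of the internal DTD subset.
for e in tree.docinfo.internalDTD.iterentities():
    print(e.name)
    print(e.content)
    print(e.orig)
    print(dir(e))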

This prints the !ENTITY names (materials, solids_Mainz_v2 and matrices_Mainz_v2), but I cannot see how to access their definitions, i.e. SYSTEM "materialsOptical.xml" etc.

The output of the iterentities() loop for the first entity is:
materials
None
None
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'content', 'name', 'orig']

Should the definitions not be available as content?

Thanks

python reportLXML.py
Python : sys.version_info(major=3, minor=6, micro=6, releaselevel='final', serial=0)
lxml.etree : (4, 3, 3, 0)
libxml used : (2, 9, 9)
libxml compiled : (2, 9, 9)
libxslt used : (1, 1, 33)
libxslt compiled : (1, 1, 33)

KeithSloan (keilh) wrote :

Tried with lxml 4.4.0, but no change.

scoder (scoder) wrote :

The relevant code should be this:
https://github.com/lxml/lxml/blob/fd971a56dd5fe68dbafc8048ebaf9d712b2dfc21/src/lxml/dtd.pxi#L240

I think it's worth making the external/system IDs available in the interface here. PR welcome.
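
Until such an attribute is exposed, one possible stopgap is to read the system identifiers straight from the document text instead of from lxml. The following is only a minimal sketch using the standard-library re module. The names ENTITY_DECL and external_entities are made up for this illustration and are not part of lxml. The pattern assumes simple declarations like the ones in the report: it ignores parameter entities and does not skip declarations that sit inside comments.

```python
import re

# Hypothetical helper, not part of lxml: matches declarations of the form
#   <!ENTITY name SYSTEM "uri">   or   <!ENTITY name PUBLIC "pubid" "uri">
ENTITY_DECL = re.compile(
    r"""<!ENTITY\s+
        (?P<name>[^\s%]+)\s+                         # general entity name
        (?:SYSTEM|PUBLIC\s+(?:"[^"]*"|'[^']*'))\s+   # keyword, plus public ID if any
        (?P<quote>["'])(?P<system_id>.*?)(?P=quote)  # quoted system identifier
    """,
    re.VERBOSE,
)


def external_entities(xml_text):
    """Return {entity name: system identifier} for external general entities."""
    return {
        m.group("name"): m.group("system_id")
        for m in ENTITY_DECL.finditer(xml_text)
    }


# The document start quoted in the bug description, with an empty root
# element added so the sample is a complete document.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE gdml [
 <!ENTITY materials SYSTEM "materialsOptical.xml">
 <!ENTITY solids_Mainz_v2 SYSTEM "solids_Mainz_v2.xml">
 <!ENTITY matrices_Mainz_v2 SYSTEM "matrices_Mainz_v2.xml">
]>
<gdml/>
"""

for name, system_id in external_entities(SAMPLE).items():
    print(name, system_id)
```

Because this works on the raw text, it is independent of the lxml version in use. It is not a substitute for proper DTD parsing.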

Changed in lxml:
importance: Undecided → Low
status: New → Confirmed