Comment 2 for bug 1905558

Cardinal Kracker (launchpap-user) wrote :

After some more digging I found out that the DTD resolution
mechanism prefixes the DTD's system ID with the path of the parent directory,
whereas parameter and general entities do not get that treatment.

from lxml import etree

class DTDResolver(etree.Resolver):
  def resolve(self, system_id, public_id, context):
    # Log every system/public ID that lxml asks the resolver for.
    print(f"*** SYSTEM {system_id} PUBLIC {public_id}")
    return super().resolve(system_id, public_id, context)

with open("rama.xml", "rb") as f:
  doc = f.read()
parser = etree.XMLParser(dtd_validation=True, load_dtd=True)
parser.resolvers.add(DTDResolver())
tree = etree.fromstring(doc, parser)

/home/em/Workbench/beautifulsoup> ./dtdbug.py
*** SYSTEM parts.ent PUBLIC None
*** SYSTEM /data/home/em/Workbench/buch.dtd PUBLIC -//Testing//DTD Buch//DE
*** SYSTEM kapitel1.xml PUBLIC None
Traceback (most recent call last):
  File "./dtdbug.py", line 16, in <module>
    tree = etree.fromstring( doc, parser )
  File "src/lxml/etree.pyx", line 3235, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1876, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1764, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1127, in lxml.etree._BaseParser._parseDoc
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 640, in lxml.etree._raiseParseError
  File "<string>", line 5
lxml.etree.XMLSyntaxError: failed to load external entity "/data/home/em/Workbench/buch.dtd", line 5, column 3
/home/em/Workbench/beautifulsoup>

However, at least I can work around that using an explicit catalog.xml:

<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD XML Catalogs V1.0//EN"
  "file:///usr/share/xml/schema/xml-core/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//Testing//DTD Buch//DE" uri="buch.dtd"/>
  <system systemId="parts.ent" uri="parts.ent"/>
  <system systemId="kapitel1.xml" uri="kapitel1.xml"/>
  <system systemId="kapitel2.xml" uri="kapitel2.xml"/>
</catalog>

> XML_CATALOG_FILES=catalog.xml ./dtdbug.py
*** SYSTEM parts.ent PUBLIC None
*** SYSTEM /data/home/em/Workbench/buch.dtd PUBLIC -//Testing//DTD Buch//DE
*** SYSTEM kapitel1.xml PUBLIC None
>

The resolver still gets the wrong system ID for the DTD, but no exception is thrown.
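
A possible alternative to the catalog would be to do the remapping in the
resolver itself. The following is only a minimal sketch, assuming the DTD and
all entity files live in one directory: BASE_DIR, local_candidate and
BasenameResolver are hypothetical names, not part of lxml. It ignores the
directory part of the system ID and hands lxml the file of the same name from
that directory via Resolver.resolve_filename().

```python
import os
from lxml import etree

# Assumption: directory holding buch.dtd, parts.ent and the kapitel*.xml files.
BASE_DIR = os.path.abspath(".")


def local_candidate(system_id, base_dir=BASE_DIR):
    """Return the file in base_dir named like the system ID, or None."""
    if not system_id:
        return None
    # Drop whatever directory prefix was put in front of the system ID.
    name = os.path.basename(system_id)
    candidate = os.path.join(base_dir, name)
    return candidate if os.path.isfile(candidate) else None


class BasenameResolver(etree.Resolver):
    def resolve(self, system_id, public_id, context):
        path = local_candidate(system_id)
        if path is not None:
            # Hand lxml the local file instead of the prefixed system ID.
            return self.resolve_filename(path, context)
        # Returning None leaves the lookup to lxml's default resolution.
        return None


parser = etree.XMLParser(dtd_validation=True, load_dtd=True)
parser.resolvers.add(BasenameResolver())
```

This only papers over the symptom: two files with the same name in different
directories cannot be told apart, so the catalog remains the more precise
workaround.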