Comment 41 for bug 1100282

Revision history for this message
Christian Heimes (heimes) wrote : Re: DoS through XML entity expansion

I compared your patches with my patch for Python's standard library. Your approach seems to work well but I'm not sure if it intercepts all possible attacks. When you overwrite DefaultHandler you don't get any error messages when an entity occurs in the XML document.

I like to suggest a different approach. You can either reject all XML documents with an inline DTD (SOAP does it) or you can disallow entity declarations:

class ProtectedExpatParser(ExpatParser):
    def __init__(self, forbid_dtd=True, forbid_entities=True,
                 *args, **kwargs):
        # Python 2.x old style class
        ExpatParser.__init__(self, *args, **kwargs)
        self.forbid_dtd = forbid_dtd
        self.forbid_entities = forbid_entities

    def start_doctype_decl(self, name, sysid, pubid, has_internal_subset):
        raise ValueError("Inline DTD forbidden")

    def entity_decl(self, entityName, is_parameter_entity, value, base, systemId, publicId, notationName):
        raise ValueError("<!ENTITY> forbidden")

    def unparsed_entity_decl(self, name, base, sysid, pubid, notation_name):
        # expat 1.2
        raise ValueError("<!ENTITY> forbidden")

    def reset(self):
        ExpatParser.reset(self)
        if self.forbid_dtd:
            self._parser.StartDoctypeDeclHandler = self.start_doctype_decl
        if self.forbid_entities:
            self._parser.EntityDeclHandler = self.entity_decl
            self._parser.UnparsedEntityDeclHandler = self.unparsed_entity_decl

With this parser errors and attacks aren't ignored silently.

Your fix for LXML may contain a flaw, too. LXML used to have an issue when parser is shared by multiple threads. I don't know if your are using threads nor if LXML still has a problem with shared parser instances. At work we are using a thread local storage to have a parser context for each thread.