Comment 28 for bug 1100282

Joshua Harlow (harlowja) wrote : Re: DoS through XML entity expansion

So, a little about how I tracked this down.

  - Figured out what exactly minidom's parseString does when it creates a parser (when none is provided).
  - This then seemed to go into the code @ http://svn.python.org/view/python/trunk/Lib/xml/dom/minidom.py?revision=75305&view=markup#l1917
  - Note that it seems to jump into 2 different DOM impls depending on whether a parser is provided or not, so first I tried to
    monkey-patch out the parser it was 'creating' when no parser was provided, basically by trying to patch out the function @
    http://svn.python.org/view/python/trunk/Lib/xml/dom/expatbuilder.py?revision=50941&view=markup#l932
  - This is how I then noticed that http://svn.python.org/view/python/trunk/Lib/xml/dom/expatbuilder.py?revision=50941&view=markup#l155
    is what actually creates the underlying parser, so I then tried adjusting settings on that underlying parser to
    make it behave as we expected. This is where I realized that self._parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_NEVER)
    isn't actually doing anything here. I didn't dive too much into the C code to figure out exactly why this call doesn't change anything,
    but from an initial dive I found http://svn.python.org/view/python/trunk/Modules/expat/xmlparse.c?view=markup#l2215 which seems to be the
    entity expansion/reference code. Note that this code has logic around 'XML_ERROR_RECURSIVE_ENTITY_REF;', but that doesn't stop the
    case we are seeing, which isn't actually recursive. This code then eventually calls http://svn.python.org/view/python/trunk/Modules/expat/xmlparse.c?view=markup#l4665
    which starts the whole 'doContent()' function over again.
  - So then I looked back at the C code @ http://svn.python.org/view/python/trunk/Modules/expat/xmlparse.c?view=markup#l2225
    which seems to check 'else if (defaultHandler)' and then stop entity expansion right there if said handler actually
    exists, which struck me as odd. So I started looking into replacing this default handler (which apparently does not exist
    on said parsers unless set). This is how I ended up at http://svn.python.org/view/python/trunk/Modules/expat/xmlparse.c?revision=77680&view=markup#l1271
    to see if I could just set any handler on this parser to stop it from doing what it was doing. That is how I discovered that setting any
    default handler (via XML_SetDefaultHandler) causes 'defaultExpandInternalEntities = XML_FALSE;' to be set, which is then how I stumbled upon
    http://svn.python.org/view/python/trunk/Modules/expat/xmlparse.c?view=markup#l2257 and this resulted in me messing with the default handler to see what I could
    set (anything, actually) to turn off entity expansion.
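
A minimal sketch of the effect described above, using pyexpat directly rather than minidom/expatbuilder. The document, the entity names and the parse() helper are made up for illustration; the only real APIs used are xml.parsers.expat.ParserCreate, CharacterDataHandler, DefaultHandler and Parse. The entities are nested but not recursive (3 x 3 = 9 copies), which is a tiny version of the case that XML_ERROR_RECURSIVE_ENTITY_REF does not catch.

```python
from xml.parsers import expat

# Hypothetical test document: nested, non-recursive internal entities.
# &c; expands to three &b;, each of which expands to three &a; ("x").
DOC = b"""<!DOCTYPE r [
  <!ENTITY a "x">
  <!ENTITY b "&a;&a;&a;">
  <!ENTITY c "&b;&b;&b;">
]>
<r>&c;</r>"""


def parse(set_default_handler):
    """Parse DOC and return (character data, data seen by DefaultHandler)."""
    chars = []     # text delivered to CharacterDataHandler
    defaults = []  # raw data delivered to DefaultHandler, if one is set
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = chars.append
    if set_default_handler:
        # Any callable will do; setting DefaultHandler (as opposed to
        # DefaultHandlerExpand) inhibits expansion of internal entities.
        parser.DefaultHandler = defaults.append
    parser.Parse(DOC, True)
    return "".join(chars), "".join(defaults)


expanded, _ = parse(set_default_handler=False)
unexpanded, seen = parse(set_default_handler=True)

print(len(expanded))    # size of the text after entity expansion
print(repr(unexpanded)) # text when a default handler is set
print("&c;" in seen)    # the raw reference goes to the default handler
```

Note that this only covers entity references in element content; it is a sketch of the mechanism, not a complete mitigation.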

End of chapter, josh vs the DTD beast.