Attributes passed to startElement misinterpreted in lxml.sax.ElementTreeContentHandler

Bug #1136509 reported by Mike Bayer on 2013-02-28
Affects: lxml
Importance: Low
Assigned to: Unassigned

Bug Description

lxml 3.1.0, Python 2.7.3, OSX 10.8:

Python : sys.version_info(major=2, minor=7, micro=3, releaselevel='final', serial=0)
lxml.etree : (2, 3, 1, 0)
libxml used : (2, 7, 8)
libxml compiled : (2, 7, 3)
libxslt used : (1, 1, 26)
libxslt compiled : (1, 1, 24)

The method ElementTreeContentHandler.startElement() forwards its arguments to startElementNS(), passing (None, name) as the namespaced element name. However, when the "attributes" argument is supplied, startElement() does not convert it from the Attributes interface to the AttributesNS interface (see http://docs.python.org/2/library/xml.sax.reader.html#attributes-ns-objects). startElementNS() then treats each plain string attribute name as a (namespace, localname) tuple, which it is not.
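For illustration, the missing conversion could look roughly like the sketch below. It uses only the standard-library classes from xml.sax.xmlreader; the helper name to_ns_attributes is hypothetical and is not part of lxml, and this is not necessarily how lxml handles it.

```python
from xml.sax.xmlreader import AttributesImpl, AttributesNSImpl


def to_ns_attributes(attributes):
    """Wrap a plain SAX Attributes object in the AttributesNS interface.

    Hypothetical helper: every attribute is placed in "no namespace",
    so its key becomes the tuple (None, name).
    """
    # Values keyed by (namespace_uri, localname) instead of a bare string.
    ns_attrs = dict(((None, name), value) for name, value in attributes.items())
    # Map each (namespace_uri, localname) key back to its qualified name.
    qnames = dict(((None, name), name) for name in attributes.keys())
    return AttributesNSImpl(ns_attrs, qnames)


# What expatreader hands to startElement() in non-namespace mode.
plain = AttributesImpl({"FooBar": "123"})
converted = to_ns_attributes(plain)
```

With keys of this shape, code that unpacks each attribute name as a two-item tuple receives what it expects.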

Demonstration:

document = """<?xml version="1.0" encoding="utf-8"?>
<SomeDocument xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Data FooBar="123">
    </Data>
</SomeDocument>

"""

from lxml import sax, etree
from xml.sax import parse
from StringIO import StringIO

# syntax is valid, etree parses it
lxml_parsed = etree.parse(StringIO(document))

parse(StringIO(document), sax.ElementTreeContentHandler())

Traceback:

Traceback (most recent call last):
  File "test.py", line 22, in <module>
    parse(StringIO(document), sax.ElementTreeContentHandler())
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/__init__.py", line 33, in parse
    parser.parse(source)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/expatreader.py", line 301, in start_element
    self._cont_handler.startElement(name, AttributesImpl(attrs))
  File "/Users/classic/Desktop/tmp/.venv/lib/python2.7/site-packages/lxml/sax.py", line 130, in startElement
    self.startElementNS((None, name), name, attributes)
  File "/Users/classic/Desktop/tmp/.venv/lib/python2.7/site-packages/lxml/sax.py", line 94, in startElementNS
    attr_name = "{%s}%s" % name_tuple
TypeError: not enough arguments for format string
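The final TypeError can be reproduced without lxml. The sketch below applies the format expression from the last traceback frame to a bare string, which is what a plain Attributes object yields as an attribute name; the variable names are illustrative only.

```python
# A plain Attributes object yields the attribute name as a string,
# not as a (namespace_uri, localname) tuple.
name_tuple = "FooBar"

try:
    # Same expression as in the last traceback frame: two %s placeholders,
    # but a bare string counts as a single format argument.
    attr_name = "{%s}%s" % name_tuple
except TypeError as exc:
    message = str(exc)

# With a real two-item tuple the expression works as intended.
ns_name = "{%s}%s" % ("http://example.com/ns", "FooBar")
```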

scoder (scoder) wrote :

Hmm, right, that's wrong. Would you care to come up with a fix? You can open a pull request on GitHub for it.

Changed in lxml:
importance: Undecided → Low
status: New → Confirmed
scoder (scoder) wrote :
Changed in lxml:
status: Confirmed → Fix Committed
scoder (scoder) wrote :

Fixed in lxml 3.1.2.

Changed in lxml:
status: Fix Committed → Fix Released
scoder (scoder) on 2013-04-28
Changed in lxml:
milestone: none → 3.1