Attributes passed to startElement misinterpreted in lxml.sax.ElementTreeContentHandler

Bug #1136509 reported by Mike Bayer
This bug affects 1 person
Affects: lxml
Status: Fix Released
Importance: Low
Assigned to: Unassigned

Bug Description

lxml 3.1.0, Python 2.7.3, OSX 10.8:

Python : sys.version_info(major=2, minor=7, micro=3, releaselevel='final', serial=0)
lxml.etree : (2, 3, 1, 0)
libxml used : (2, 7, 8)
libxml compiled : (2, 7, 3)
libxslt used : (1, 1, 26)
libxslt compiled : (1, 1, 24)

The method ElementTreeContentHandler.startElement() forwards the arguments it receives to startElementNS(), wrapping the element name as (None, name) to stand in for the missing namespace. However, when the "attributes" argument is passed, it is not converted from the Attributes interface to the AttributesNS interface (see http://docs.python.org/2/library/xml.sax.reader.html#attributes-ns-objects). As a result, startElementNS() treats each attribute name as a (namespace, localname) tuple when it is actually a plain string.
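
For illustration, the missing conversion could look roughly like the sketch below. This is not the fix that was committed to lxml; to_attributes_ns is a hypothetical helper name, and the sketch assumes that every attribute is un-namespaced, which is the case when a parser calls startElement().

```python
from xml.sax.xmlreader import AttributesImpl, AttributesNSImpl


def to_attributes_ns(attributes):
    """Hypothetical helper: re-key plain Attributes as AttributesNS.

    Each string name becomes a (None, name) tuple, i.e. "no namespace".
    """
    attrs = {}
    qnames = {}
    for name, value in attributes.items():
        key = (None, name)
        attrs[key] = value
        qnames[key] = name  # the qualified name is just the plain name
    return AttributesNSImpl(attrs, qnames)


# What a non-namespace-aware parser hands to startElement()
plain = AttributesImpl({"FooBar": "123"})
converted = to_attributes_ns(plain)
```

After the conversion, converted.getValue((None, "FooBar")) returns the attribute value, which is the tuple-keyed lookup that startElementNS() expects.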

Demonstration:

document = """<?xml version="1.0" encoding="utf-8"?>
<SomeDocument xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Data FooBar="123">
    </Data>
</SomeDocument>

"""

from lxml import sax, etree
from xml.sax import parse
from StringIO import StringIO

# syntax is valid, etree parses it
lxml_parsed = etree.parse(StringIO(document))

# feeding the same document through the SAX content handler raises TypeError
parse(StringIO(document), sax.ElementTreeContentHandler())

Traceback:

Traceback (most recent call last):
  File "test.py", line 22, in <module>
    parse(StringIO(document), sax.ElementTreeContentHandler())
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/__init__.py", line 33, in parse
    parser.parse(source)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/expatreader.py", line 301, in start_element
    self._cont_handler.startElement(name, AttributesImpl(attrs))
  File "/Users/classic/Desktop/tmp/.venv/lib/python2.7/site-packages/lxml/sax.py", line 130, in startElement
    self.startElementNS((None, name), name, attributes)
  File "/Users/classic/Desktop/tmp/.venv/lib/python2.7/site-packages/lxml/sax.py", line 94, in startElementNS
    attr_name = "{%s}%s" % name_tuple
TypeError: not enough arguments for format string
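
The last frame can be reproduced without lxml. The sketch below only mirrors the formatting expression shown in the traceback; the variable names are illustrative. It shows that a plain string on the right-hand side of the two-placeholder format fails, while a (namespace, localname) tuple succeeds.

```python
# A plain string supplies only one value for the two %s placeholders.
name_as_string = "FooBar"
try:
    attr_name = "{%s}%s" % name_as_string
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# A two-item tuple supplies both values, as startElementNS() expects.
name_as_tuple = ("http://www.w3.org/2001/XMLSchema-instance", "type")
formatted = "{%s}%s" % name_as_tuple
```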

scoder (scoder) wrote :

Hmm, right, that's wrong. Would you care to come up with a fix? You can open a pull request on GitHub for it.

Changed in lxml:
importance: Undecided → Low
status: New → Confirmed
scoder (scoder) wrote :
Changed in lxml:
status: Confirmed → Fix Committed
scoder (scoder) wrote :

Fixed in lxml 3.1.2.
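
For anyone stuck on an affected version, one possible workaround (a sketch, not something suggested in this thread) is to enable namespace processing on the SAX parser, so that it calls startElementNS() directly and never goes through startElement(). The example below uses only the standard library; RecordingHandler is a hypothetical stand-in for the content handler, and whether ElementTreeContentHandler then builds the tree correctly on an affected lxml version is an assumption that has not been verified here.

```python
from io import BytesIO
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, feature_namespaces


class RecordingHandler(ContentHandler):
    """Hypothetical stand-in handler that records which callbacks fire."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.calls = []

    def startElement(self, name, attrs):
        self.calls.append(("startElement", name))

    def startElementNS(self, name, qname, attrs):
        self.calls.append(("startElementNS", name, list(attrs.keys())))


document = b'<SomeDocument><Data FooBar="123"></Data></SomeDocument>'

parser = make_parser()
# With namespace processing on, the parser reports elements and
# attributes as (namespace, localname) tuples via startElementNS().
parser.setFeature(feature_namespaces, True)
handler = RecordingHandler()
parser.setContentHandler(handler)
parser.parse(BytesIO(document))
```

With this feature enabled, attribute names arrive already keyed as (None, "FooBar"), so no Attributes to AttributesNS conversion is needed.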

Changed in lxml:
status: Fix Committed → Fix Released
scoder (scoder)
Changed in lxml:
milestone: none → 3.1