iter(tag) finds nothing when feed parser HTMLParser() used
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Fix Released | Medium | scoder |
Bug Description
Using the iter() method with the "tag" parameter specified yields no elements when I create the tree using the HTMLParser() feed parser interface. It does yield the expected element when I use the HTML() function or the parse() function. It looks like I can still use the xpath() method as a workaround.
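Since iter() without a tag argument still returns every element under the feed parser (see the output further down), another possible workaround is to filter that unfiltered iteration by hand. The following is only a minimal sketch: the helper name iter_by_tag is hypothetical, the markup is an assumed minimal document, and the tree is built with the standard library's xml.etree.ElementTree feed interface as a stand-in so the snippet runs without lxml. It has not been checked against the affected lxml versions, but it relies only on .iter() and .tag, which lxml elements also provide.

```python
import xml.etree.ElementTree as ET


def iter_by_tag(root, tag):
    """Yield root and its descendants whose tag equals `tag`.

    Hypothetical helper: filters the unfiltered root.iter() by hand
    instead of relying on root.iter(tag).
    """
    for element in root.iter():
        if element.tag == tag:
            yield element


# Stand-in tree built through a feed interface (standard library, not lxml).
# The markup below is an assumption, not the string from the test program.
parser = ET.XMLParser()
parser.feed("<html><body></body></html>")
root = parser.close()

matches = list(iter_by_tag(root, "body"))
print([element.tag for element in matches])
```

With an lxml tree the same helper would be called on the root returned by parser.close(); comments and processing instructions are skipped naturally because their tag is not a string equal to the requested name.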
Test program:
def test(root):
    print("Root element:", root)
    i = root.iter("body")
    print('List of <body> elements via iter("body"):', list(i))
    print("List of all elements via iter():", list(root.iter()))
    print('xpath("//body"):', root.xpath("//body"))
markup = "<html>
print("==== Using string parser HTML()")
from lxml.etree import HTML
root = HTML(markup)
test(root)
print("==== Using feed parser HTMLParser()")
from lxml.etree import HTMLParser
parser = HTMLParser()
parser.feed(markup)
root = parser.close()
test(root)
Output:
==== Using string parser HTML()
Root element: <Element html at 0x7f65e9edfe60>
List of <body> elements via iter("body"): [<Element body at 0x7f65e9edfeb0>]
List of all elements via iter(): [<Element html at 0x7f65e9edfe60>, <Element body at 0x7f65e9edfeb0>]
xpath("//body"): [<Element body at 0x7f65e9edff00>]
==== Using feed parser HTMLParser()
Root element: <Element html at 0x7f65e9edfeb0>
List of <body> elements via iter("body"): []
List of all elements via iter(): [<Element html at 0x7f65e9edfeb0>, <Element body at 0x7f65e9edff50>]
xpath("//body"): [<Element body at 0x7f65e9edfe60>]
Arch Linux:
Python 3.2.3, releaselevel=
lxml.etree (2, 3, 4, 0)
libxml2 2.7.8-1, libxslt 1.1.26-2
Also happens with Python 2.7.3 (same lxml.etree version) on Arch Linux.
Originally happened with Python 3.2.3, lxml.etree (2, 3, 2, 0) on Ubuntu.
Thanks for the report, I can reproduce this.