Some text missing when parsing nested nodes
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
lxml | Invalid | Undecided | Unassigned | | |
Bug Description
description:
When I parse an HTML document, some text is missing when a node is nested inside another node. Is this a bug or the expected result?
version info:
Python : sys.version_
lxml.etree : (4, 6, 3, 0)
libxml used : (2, 9, 5)
libxml compiled : (2, 9, 5)
libxslt used : (1, 1, 30)
libxslt compiled : (1, 1, 30)
code:
from lxml import etree
src = '<html>
doc = etree.HTML(src, etree.HTMLParser())
for tag in doc.iter():
    # skip elements that have no leading text
    if tag.text is None:
        continue
    print(tag.text)
expect:
test1
test2
test3
test4
test5
output:
test1
test2
test4
https://lxml.de/tutorial.html#elements-contain-text
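The tutorial section linked above describes how lxml stores text: an element's `.text` holds only the text before its first child, and any text that follows a child's closing tag is stored in that child's `.tail`. The snippet below is a minimal sketch of that behaviour. The original `src` in this report is truncated, so the markup here is hypothetical, chosen only because it reproduces the reported output.

```python
from lxml import etree

# Hypothetical markup (the report's real src is truncated).
src = (
    "<html><body><div>test1<p>test2</p>test3"
    "<span>test4</span>test5</div></body></html>"
)
doc = etree.HTML(src, etree.HTMLParser())

# Reading only .text skips text that follows a child element.
text_only = [tag.text for tag in doc.iter() if tag.text is not None]

# "test3" and "test5" are not lost: they are the tails of <p> and <span>.
tails = [tag.tail for tag in doc.iter() if tag.tail is not None]

# itertext() yields both .text and .tail strings in document order.
all_text = list(doc.itertext())

print(text_only)
print(tails)
print(all_text)
```

If the goal is every text fragment in document order, `itertext()` is probably the simplest choice. Printing `.text` followed by `.tail` for each element inside the `iter()` loop only gives document order when the elements with tails have no children of their own.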