lxml

Some text missing when parsing nested nodes

Bug #1942757 reported by qian jia huan on 2021-09-06

This bug affects 1 person

Affects    Status     Importance    Assigned to    Milestone
lxml       Invalid    Undecided     Unassigned

Bug Description

description:
When I parse an HTML document, some text goes missing when a node contains nested nodes. Is this a bug or the expected result?

version info:
Python : sys.version_info(major=3, minor=8, micro=10, releaselevel='final', serial=0)
lxml.etree : (4, 6, 3, 0)
libxml used : (2, 9, 5)
libxml compiled : (2, 9, 5)
libxslt used : (1, 1, 30)
libxslt compiled : (1, 1, 30)

code:
from lxml import etree
src = '<html><body>test1test2test3test4test5</body></html>'
doc = etree.HTML(src, etree.HTMLParser())
for tag in doc.iter():
    # skip elements that have no leading text
    if tag.text is None:
        continue
    print(tag.text)

expect: test1
        test2
        test3
        test4
        test5

output: test1
        test2
        test4


scoder (scoder) wrote on 2021-09-06:

https://lxml.de/tutorial.html#elements-contain-text

Changed in lxml:
status:	New → Invalid
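
The linked tutorial section covers the `.text` and `.tail` attributes: text that follows a child element's closing tag is stored on that child as `.tail`, not in any element's `.text`, which would explain why iterating over `.text` alone skips it. A minimal sketch of this, assuming the original markup wrapped test2 and test4 in child elements (the `<div>` and `<p>` tags below are hypothetical, since the markup in the report does not show them), and using `etree.fromstring` because this snippet happens to be well-formed:

```python
try:
    from lxml import etree
except ImportError:
    # The standard library's ElementTree uses the same .text/.tail model.
    import xml.etree.ElementTree as etree

# Hypothetical markup: the <div> and <p> tags are assumptions for illustration.
src = '<html><body>test1<div>test2</div>test3<p>test4</p>test5</body></html>'
doc = etree.fromstring(src)

pieces = []
for tag in doc.iter():
    if tag.text is not None:
        pieces.append(tag.text)  # text before the element's first child
    if tag.tail is not None:
        pieces.append(tag.tail)  # text after the element's closing tag

print(pieces)

# itertext() yields every text and tail chunk in document order.
print(list(doc.itertext()))
```

Collecting `.text` and then `.tail` per element only matches document order here because the child elements have no children of their own; for arbitrary nesting, `itertext()` is likely the safer choice.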
