Missing tail in iterparse
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
lxml | Confirmed | Medium | Unassigned | |
lxml (Ubuntu) | New | Undecided | Unassigned | |
Bug Description
Given a minimal parser (below) and a particular input file (attached), iterparse is not returning the `tail` of the last `<span>` tag.
I am listening for the `end` event, which is the default, instead of the `start` event.
Changing the input, for example by deleting unrelated tags such as the `<link>` tag in the `<head>`, causes the missing text to reappear. This makes it hard to produce a minified input! I was able to remove everything /after/ the element with the missing tail, which doesn't affect the bug, so that is what I attached.
I took the silence on the mailing list to mean that I did not have any obvious problems with the way I was using iterparse. :) https:/
---
```python
#!/usr/bin/env python3
import sys
from lxml import etree
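# The loop body was truncated in this copy of the report. It is reconstructed
# from the invocation and the expected output; html=True is an assumption.
for _, element in etree.iterparse(sys.argv[1], html=True):
    print((element.tag, dict(element.attrib), element.text, element.tail))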
```
Invoke by:
```sh
$ ./bug.py bug.html | grep "splays their blue cards left"
```
Expected output:
```
('span', {'class': 'age e'}, '4', '.\n... Nnastya splays their blue cards left.\n')
```
Actual output: none, and return code 1.
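One way to check whether the tail is only attached late, rather than lost, is to read each element one event after its own `end` event. The sketch below is not from the report and is not a confirmed fix. It assumes the tail text is complete by the time the next event is delivered. The helper name `iter_elements_with_tail` and the inline sample are hypothetical, and the sample is well-formed XML so that the stdlib fallback also works.

```python
import io

try:
    from lxml import etree  # the library this report is about
except ImportError:
    # Stdlib stand-in with a compatible iterparse() for XML input.
    import xml.etree.ElementTree as etree


def iter_elements_with_tail(source):
    """Yield (tag, attrib, text, tail) one event late.

    Each element is held back until the following "end" event arrives
    (or parsing finishes), so its tail is read after the parser has
    moved past the text that follows the closing tag.
    """
    previous = None
    for _, element in etree.iterparse(source):  # default: "end" events only
        if previous is not None:
            yield previous.tag, dict(previous.attrib), previous.text, previous.tail
        previous = element
    if previous is not None:
        yield previous.tag, dict(previous.attrib), previous.text, previous.tail


# Hypothetical sample, loosely modelled on the expected output above.
sample = io.BytesIO(b'<p><span class="age e">4</span>. cards left.\n</p>')
rows = list(iter_elements_with_tail(sample))
for row in rows:
    print(row)
```

If the delayed read shows the tail on the attached input while the direct read does not, that would point at timing of the `end` event rather than at lost text.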
---
Python : sys.version_
lxml.etree : (3, 7, 3, 0)
libxml used : (2, 9, 3)
libxml compiled : (2, 9, 3)
libxslt used : (1, 1, 29)
libxslt compiled : (1, 1, 29)
When used with the system python3-lxml package, rather than the version pip installed into a venv:
Python : sys.version_info(major=3, minor=5, micro=2, releaselevel='final', serial=0)
lxml.etree : (3, 5, 0, 0)
libxml used : (2, 9, 3)
libxml compiled : (2, 9, 2)
libxslt used : (1, 1, 28)
libxslt compiled : (1, 1, 28)
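For anyone reproducing this in another environment, a listing like the ones above can be gathered with a few lines of Python. The report does not show the command it used, so this is only a sketch. The helper name `collect_versions` is hypothetical, while the `etree.*_VERSION` attributes are the ones lxml exposes for this purpose.

```python
import sys


def collect_versions():
    """Return (label, value) pairs similar to the version listings above."""
    rows = [("Python", sys.version_info)]
    try:
        from lxml import etree
    except ImportError:
        # lxml is not installed here; report only the interpreter version.
        return rows
    rows += [
        ("lxml.etree", etree.LXML_VERSION),
        ("libxml used", etree.LIBXML_VERSION),
        ("libxml compiled", etree.LIBXML_COMPILED_VERSION),
        ("libxslt used", etree.LIBXSLT_VERSION),
        ("libxslt compiled", etree.LIBXSLT_COMPILED_VERSION),
    ]
    return rows


for label, value in collect_versions():
    print("%-18s: %s" % (label, value))
```

Running it once inside the venv and once with the system interpreter gives the two sets of numbers to compare.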