'<' character causes incorrect parsing
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Invalid | Undecided | Unassigned |
Bug Description
I've noticed different parsing behavior on two installations of lxml.
When parsing an HTML document that contains a '<' character in text, lxml drops that character and all text content after it.
For example:
```
from lxml import html
d = """<html>
<body>
10 < 1000
</body>
</html>"""
print(html.
---
<html>
<body>
10
</body></html>
```
Python : sys.version_
lxml.etree : (4, 6, 5, 0)
libxml used : (2, 9, 13)
libxml compiled : (2, 9, 13)
libxslt used : (1, 1, 35)
libxslt compiled : (1, 1, 35)
This issue is not present on the following installation:
Python : sys.version_
lxml.etree : (4, 6, 5, 0)
libxml used : (2, 9, 10)
libxml compiled : (2, 9, 10)
libxslt used : (1, 1, 34)
libxslt compiled : (1, 1, 34)
```
from lxml import html
d = """<html>
<body>
10 < 1000
</body>
</html>"""
print(html.
---
<html>
<body>
10 < 1000
</body>
</html>
```
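For anyone comparing installations, version tuples like the ones listed above can be collected with a short script. This is only a sketch: `version_report` is a hypothetical helper name, the format of the Python line is my own choice (the original one is cut off in the report), and the `lxml.etree` constants are the ones lxml exposes for this purpose.

```python
import sys


def version_report():
    """Return lines describing the Python, lxml, libxml2 and libxslt versions."""
    lines = [f"Python : {sys.version_info[:3]}"]
    try:
        from lxml import etree
    except ImportError:
        # Keep the script usable on machines where lxml is missing.
        lines.append("lxml.etree : not installed")
        return lines
    lines += [
        f"lxml.etree : {etree.LXML_VERSION}",
        f"libxml used : {etree.LIBXML_VERSION}",
        f"libxml compiled : {etree.LIBXML_COMPILED_VERSION}",
        f"libxslt used : {etree.LIBXSLT_VERSION}",
        f"libxslt compiled : {etree.LIBXSLT_COMPILED_VERSION}",
    ]
    return lines


print("\n".join(version_report()))
```

Running it on both machines makes it easy to see whether only the libxml2 version differs, as in the two listings in the description.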
Works for me with libxml2 2.12.3.
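If you control how the markup is generated, one way to avoid depending on the parser's recovery behavior is to write a literal '<' in text as `&lt;`. The sketch below uses `html.escape` from the Python standard library (not `lxml.html`). `render_page` is a hypothetical helper for illustration, and this is a possible workaround rather than something proposed in the report.

```python
from html import escape  # standard library module, unrelated to lxml.html


def render_page(text):
    """Hypothetical helper: wrap text in a minimal HTML page, escaping <, > and &."""
    return "<html>\n<body>\n" + escape(text, quote=False) + "\n</body>\n</html>"


page = render_page("10 < 1000")
print(page)
```

The body then contains `10 &lt; 1000`, which an HTML parser reads back as the text `10 < 1000` without any error recovery.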