Traversal breaks on empty comment with html5lib
Bug #1798699 reported by Jonas Häggqvist
This bug report is a duplicate of:
Bug #1806598: Crash on <!----> comment using html5lib.
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Beautiful Soup | Fix Committed | Undecided | Unassigned |
Bug Description
The following document breaks tree traversal when parsed with html5lib:
<table>
<tr>
<th>
</div>
</th>
</tr>
</table>
It appears to be related to the comment being empty and not separated from the div tag with any whitespace. If I add content to the comment (even just a single space character) or if I separate the div and comment by whitespace, parsing works as expected.
It also works if I specify lxml as parser.
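A minimal sketch of one way to check for the symptom, assuming "traversal breaks" means the `.next_element` chain no longer agrees with the tree structure held in `.contents`. The helper names (`nodes_by_contents`, `nodes_by_next_element`, `traversal_is_consistent`, `demo`) are hypothetical and not part of Beautiful Soup. The markup in `demo` is also an assumption: it places the empty `<!---->` comment directly inside a div with no whitespace, which may not match the reported document exactly.

```python
def nodes_by_contents(node):
    """Yield descendants in document order using only the .contents lists."""
    # Text and comment nodes have no .contents, so they are treated as leaves.
    for child in getattr(node, "contents", []):
        yield child
        yield from nodes_by_contents(child)


def nodes_by_next_element(root, limit=10_000):
    """Yield nodes by following the .next_element chain starting after root."""
    node = root.next_element
    seen = 0
    # The limit guards against a chain that loops back on itself.
    while node is not None and seen < limit:
        yield node
        node = node.next_element
        seen += 1


def traversal_is_consistent(root):
    """Return True if both walks visit the same objects in the same order."""
    by_contents = [id(n) for n in nodes_by_contents(root)]
    by_chain = [id(n) for n in nodes_by_next_element(root)]
    return by_contents == by_chain


def demo():
    # Imported here so the helpers above stay usable without bs4 installed.
    from bs4 import BeautifulSoup  # needs beautifulsoup4, html5lib and lxml

    # ASSUMPTION: hypothetical input, not the exact document from the report.
    markup = "<table><tr><th><div><!----></div></th></tr></table>"
    for parser in ("html5lib", "lxml"):
        soup = BeautifulSoup(markup, parser)
        print(parser, traversal_is_consistent(soup))
```

Calling `demo()` prints one line per parser. A `False` for html5lib alongside a `True` for lxml would be in line with the behaviour described above. On an affected version the html5lib parse may instead raise an exception, as the duplicate bug's title suggests.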
$ python --version
Python 3.6.7rc1
$ pip freeze |grep -E beautifulsoup4\
beautifulsoup4=
html5lib==1.0.1
lxml==4.2.5
Related branches
lp:~facelessuser/beautifulsoup/next_previous_fixes
- Leonard Richardson: Approve
Diff: 234 lines (+75/-22), 4 files modified:
bs4/__init__.py (+23/-10)
bs4/builder/_html5lib.py (+5/-5)
bs4/element.py (+7/-7)
bs4/tests/test_html5lib.py (+40/-0)
Changed in beautifulsoup:
status: New → Fix Committed
This is related to issue #1806598, which I just looked into. I believe I understand the problem and have a suitable fix, and I plan to create a merge request for it.