Traversal breaks on empty comment with html5lib

Bug #1798699 reported by Jonas Häggqvist
This bug affects 1 person
Affects: Beautiful Soup
Status: Fix Committed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Traversing the tree built from the following document breaks when the document is parsed with html5lib:

<table>
    <tr>
        <th>
            <!----><div>
                TEST
            </div>
        </th>
    </tr>
</table>

It appears to be related to the comment being empty and not separated from the div tag by any whitespace. If I add content to the comment (even a single space character), or separate the comment and the div with whitespace, parsing works as expected.

It also works if I specify lxml as parser.
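A minimal sketch of how the cases above could be compared side by side. The report does not say exactly how traversal fails, so the check used here (the .next_element chain should visit the same nodes, in the same order, as a recursive walk over .contents) is an assumption, and the helper names are hypothetical rather than part of Beautiful Soup.

```python
from itertools import islice

# The document from the report, with the comment made swappable.
TEMPLATE = """<table>
    <tr>
        <th>
            {comment}<div>
                TEST
            </div>
        </th>
    </tr>
</table>"""

# The three cases described in the report.
VARIANTS = {
    "empty comment, no whitespace": "<!---->",
    "comment containing a space": "<!-- -->",
    "empty comment, then whitespace": "<!----> ",
}


def walk_contents(node):
    """Yield descendants in document order using only .contents."""
    for child in getattr(node, "contents", ()):
        yield child
        yield from walk_contents(child)


def walk_next_element(node):
    """Yield nodes by following the .next_element chain after node."""
    current = getattr(node, "next_element", None)
    while current is not None:
        yield current
        current = getattr(current, "next_element", None)


def traversal_is_consistent(root):
    """Assumed check: both walks must yield the same objects in order."""
    expected = [id(n) for n in walk_contents(root)]
    # Cap the walk so a cyclic chain in a damaged tree cannot loop forever.
    chain = islice(walk_next_element(root), len(expected) + 1)
    return expected == [id(n) for n in chain]


def report(parsers=("html5lib", "lxml")):
    """Print the check for each parser and variant.

    Requires beautifulsoup4 and the named parsers to be installed.
    """
    from bs4 import BeautifulSoup  # third-party, imported lazily

    for parser in parsers:
        for label, comment in VARIANTS.items():
            soup = BeautifulSoup(TEMPLATE.format(comment=comment), parser)
            print(parser, "|", label, "|", traversal_is_consistent(soup))
```

Calling report() prints one line per parser and variant. On an affected version, the html5lib row for the empty comment with no whitespace would be the one expected to differ from the rest, if the assumed check matches the actual symptom.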

$ python --version
Python 3.6.7rc1
$ pip freeze |grep -E beautifulsoup4\|html5lib\|lxml
beautifulsoup4==4.6.3
html5lib==1.0.1
lxml==4.2.5

Isaac Muse (facelessuser) wrote :

This is related to issue #1806598, which I just looked into. I believe I understand the problem and have a suitable fix. I plan on creating a merge request for it.

Changed in beautifulsoup:
status: New → Fix Committed