BeautifulSoup._popToTag will pop every tag in the document if given a mismatched end tag
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Beautiful Soup | Fix Released | Undecided | Unassigned | |
Bug Description
Code to reproduce:
---
from bs4 import BeautifulSoup
data = """<html>
print(Beautiful
---
Output:
---
<html><
---
The markup '</span>' makes html.parser call BeautifulSoupHT
This is only a problem when using html.parser, since lxml and html5lib know not to treat "</span>" as a real closing tag.
One solution that wouldn't hurt performance much would be to keep a Counter of tag names, and to make _popToTag a no-op if the tag name isn't in the Counter. (It needs to be a Counter so we can keep track of nested tags with the same name while still keeping constant-time lookups.)
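A minimal sketch of the Counter idea, written against a toy stack of tag names rather than Beautiful Soup's real internals. The names `TagStack`, `push`, `pop`, `pop_to_tag` and `open_counts` are hypothetical and exist only for illustration; this is not the code from the actual fix.

```python
from collections import Counter


class TagStack:
    """Toy stand-in for a tree builder's stack of open tags (hypothetical)."""

    def __init__(self):
        self.stack = []               # names of open tags, outermost first
        self.open_counts = Counter()  # how many tags of each name are open

    def push(self, name):
        self.stack.append(name)
        self.open_counts[name] += 1

    def pop(self):
        name = self.stack.pop()
        self.open_counts[name] -= 1
        return name

    def pop_to_tag(self, name):
        """Close open tags up to and including the most recent `name`."""
        # Constant-time guard: a stray end tag matches no open tag,
        # so leave the stack untouched instead of emptying it.
        if self.open_counts[name] <= 0:
            return []
        popped = []
        while self.stack:
            popped.append(self.pop())
            if popped[-1] == name:
                break
        return popped


stack = TagStack()
for tag_name in ["html", "body", "div", "div"]:
    stack.push(tag_name)

print(stack.pop_to_tag("span"))  # [] : no open <span>, nothing is popped
print(stack.pop_to_tag("div"))   # ['div'] : only the innermost <div> closes
print(stack.stack)               # ['html', 'body', 'div']
```

A plain set of names would not be enough here: after the inner `div` closes, the outer `div` is still open, and only a per-name count records that.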
Changed in beautifulsoup:
status: Fix Committed → Fix Released
Revision 579 has a fix.