html5lib tree builder can build a disconnected tree

Bug #1039527 reported by Leonard Richardson
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
Beautiful Soup   Fix Released   Undecided    Unassigned

Bug Description

Originally reported here: http://stackoverflow.com/questions/12048390/why-would-beautifulsoup-be-returning-nonetypes-when-fed-a-mechanize-response

This happens only with the html5lib tree builder.

Minimal markup that causes the problem:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
  </head>
  <body>
  </body>
</html>

When immediately preceded by a doctype (or, presumably, another comment, since html5lib turns the doctype into a comment), the comment is parsed but not connected to the rest of the tree.
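A quick way to see what "not connected" means: assuming, as in bs4 4.x, that searches such as find() walk the next_element chain (parse order) rather than the parent/child links, a node that sits in its parent's contents but is missing from that chain is invisible to searches, which would be consistent with the None results in the Stack Overflow question. The helper below is a hypothetical diagnostic, not part of Beautiful Soup, and the demo uses SimpleNamespace stand-ins instead of real bs4 objects.

```python
from types import SimpleNamespace


def disconnected_nodes(root):
    """Return nodes reachable through .contents but not through .next_element.

    Hypothetical helper. Assumes Beautiful Soup-style attributes: each node
    may have `contents` (list of children) and has `next_element` (the next
    node in parse order, or None at the end).
    """
    # Every node in the tree, found by walking parent -> child links.
    in_tree = []
    stack = [root]
    while stack:
        current = stack.pop()
        in_tree.append(current)
        stack.extend(reversed(getattr(current, "contents", [])))

    # Every node in the parse-order chain, guarding against cycles.
    in_chain = set()
    current = root
    while current is not None and id(current) not in in_chain:
        in_chain.add(id(current))
        current = current.next_element

    return [n for n in in_tree if id(n) not in in_chain]


def make_node(name, *children):
    # Stand-in for a parsed element (hypothetical, for the demo only).
    return SimpleNamespace(name=name, contents=list(children), next_element=None)


doctype = make_node("doctype")
html = make_node("html")
root = make_node("[document]", doctype, html)
root.next_element = html  # the doctype was never threaded into the chain

print([n.name for n in disconnected_nodes(root)])  # ['doctype']
```

With a real parse, root would be the BeautifulSoup object itself, for example BeautifulSoup(markup, "html5lib"); that needs bs4 and html5lib installed and is not exercised by the sketch.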

The underlying cause is Element.appendChild, which tries to do its own tree maintenance instead of calling BeautifulSoup.object_was_parsed.
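The failure mode can be illustrated with a toy model. This is a sketch under stated assumptions, not the actual bs4 or html5lib code: it assumes the "tree maintenance" at stake is the next_element/previous_element chain, and every class and function name here (Node, ToySoup, naive_append_child, fixed_append_child) is hypothetical. An append that updates only parent/child links leaves the doctype out of the chain, while routing the node through a single object_was_parsed-style method keeps it connected.

```python
class Node:
    """Hypothetical stand-in for a parsed element (not bs4's real class)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.contents = []
        self.next_element = None
        self.previous_element = None


class ToySoup:
    """Hypothetical document object that owns the parse-order chain."""

    def __init__(self):
        self.root = Node("[document]")
        self.last_parsed = self.root

    def object_was_parsed(self, node, parent):
        # Central bookkeeping: link the node to its parent AND append it to
        # the next_element/previous_element chain after the last parsed node.
        node.parent = parent
        parent.contents.append(node)
        node.previous_element = self.last_parsed
        self.last_parsed.next_element = node
        self.last_parsed = node


def naive_append_child(parent, node):
    # Problematic pattern: parent/child links are updated, but the
    # parse-order chain (and soup.last_parsed) are never touched.
    node.parent = parent
    parent.contents.append(node)


def fixed_append_child(soup, parent, node):
    # Delegate all bookkeeping to the document object instead.
    soup.object_was_parsed(node, parent)


def chain(soup):
    # Names of the nodes reachable by following next_element from the root.
    names, current = [], soup.root
    while current is not None:
        names.append(current.name)
        current = current.next_element
    return names


def build(append):
    soup = ToySoup()
    # The <?xml ...?> declaration arrives as a comment, then the doctype,
    # then <html>; only the doctype goes through the append strategy.
    soup.object_was_parsed(Node("comment"), soup.root)
    append(soup, soup.root, Node("doctype"))
    soup.object_was_parsed(Node("html"), soup.root)
    return soup


buggy = build(lambda soup, parent, node: naive_append_child(parent, node))
fixed = build(fixed_append_child)

print([n.name for n in buggy.root.contents])  # ['comment', 'doctype', 'html']
print(chain(buggy))  # ['[document]', 'comment', 'html']
print(chain(fixed))  # ['[document]', 'comment', 'doctype', 'html']
```

In the buggy build the doctype is still a child of the root, so the tree looks intact from the parent/child side; it is only the parse-order chain that skips it.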

Leonard Richardson (leonardr) wrote:

"When immediately preceded by a doctype (or, presumably, another comment, since html5lib turns the doctype into a comment), the comment"

should be

"When immediately preceded by a declaration (or, presumably, a comment, since html5lib turns the declaration into a comment), the doctype"

Changed in beautifulsoup:
status: New → Fix Committed
Changed in beautifulsoup:
status: Fix Committed → Fix Released