html5lib linkage issue
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Beautiful Soup | Fix Released | Undecided | Unassigned |
Bug Description
During testing, I found an html5lib linkage issue. A merge request to fix it is already open, but I wanted to provide a simplified reproduction that documents the problem; it will be linked to the merge request.
This is the simple case that breaks:
<div><table id="1">
<div>This tag contains nothing but whitespace: <b> </b></div>
The tree links well enough to display:
>>> soup
<html><
>>> soup.b
<b> </b>
The links themselves, however, are not sound:
>>> soup.b.next_element
<table id="1">
The next_element **should** be ' ', the string inside the b tag. Problems like this can go unnoticed; they usually surface when performing an extraction, which assumes (and in fact requires) sound linkage to work properly.
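One way to catch this kind of problem is to compare the next_element/previous_element chain against the document order implied by the parent/child structure in `.contents`. The sketch below assumes only that nodes expose `.contents` (strings, which lack a usable `.contents`, are treated as leaves), `.next_element`, and `.previous_element`, as Beautiful Soup's Tag and NavigableString objects do. The helper names `document_order` and `linkage_problems` are hypothetical and are not part of bs4.

```python
def document_order(root):
    """Return root's descendants in document (pre-order) order.

    Built only from each node's .contents list, so it does not rely on
    the next_element/previous_element links being checked.
    """
    order = []
    stack = list(reversed(getattr(root, "contents", None) or []))
    while stack:
        node = stack.pop()
        order.append(node)
        # Nodes without a usable .contents (e.g. strings) are leaves.
        stack.extend(reversed(getattr(node, "contents", None) or []))
    return order


def linkage_problems(root):
    """List mismatches between the element chain and .contents order.

    Only links between root's descendants are checked; root's own links
    and the link leaving the last descendant point outside the subtree.
    """
    problems = []
    order = document_order(root)
    for prev, node in zip(order, order[1:]):
        if prev.next_element is not node:
            problems.append(
                f"{prev!r}.next_element is {prev.next_element!r}, expected {node!r}"
            )
        if node.previous_element is not prev:
            problems.append(
                f"{node!r}.previous_element is {node.previous_element!r}, expected {prev!r}"
            )
    return problems
```

With Beautiful Soup itself you would pass a parsed tree, e.g. `linkage_problems(BeautifulSoup(markup, "html5lib"))` where `markup` holds the test case above, and print any messages. On a version affected by this bug one would expect a report of the kind shown by the `soup.b.next_element` session.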
The merge request at https:/
Related branches
- Leonard Richardson: Pending requested
- Diff: 361 lines (+267/-47), 2 files modified:
  - bs4/__init__.py (+102/-47)
  - bs4/testing.py (+165/-0)
Resolved by Isaac's code in revision 483.