lxml.html.fragment_fromstring with create_parent fails
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Fix Released | Low | scoder |
Bug Description
When fragment_fromstring is fed some broken HTML, it may break with a ParserError: Multiple elements found.
For example:
s = '<i>This wil</i>
lxml.html.
I expected this to work, and just drop the </div>.
I propose the following (not tested):
in lxml/html/
def fragment_
    """Parses a single HTML element; it is an error if there is more than
    one element, or if anything but whitespace precedes or follows the
    element.
    If create_parent is true (or is a tag name) then a parent node
    will be created to encapsulate the HTML in a single element.
    """
    if not isinstance(html, _strings):
        raise TypeError('string required')
    children = fragments_
    if not children:
        raise etree.ParserErr
    if len(children) > 1:
        if not create_parent:
            raise etree.ParserErr
        else:
            for element in children:
                if isinstance(element, _strings):
    result = children[0]
    if result.tail and result.
        raise etree.ParserErr
    result.tail = None
    return result
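Since the proposal above is cut off in places, here is a rough, self-contained sketch of the same idea: reduce a list of parsed fragments to a single element, and only complain about multiple elements when no parent was requested. It is not the lxml implementation. It uses the standard library's xml.etree.ElementTree in place of lxml, and the names single_fragment and ParserError, the 'div' default tag, and the error messages are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


class ParserError(ValueError):
    """Stand-in for etree.ParserError (hypothetical, for this sketch only)."""


def single_fragment(children, create_parent=False):
    """Reduce a list of fragments to one element.

    `children` mimics a parsed fragment list: optional text strings
    mixed with elements. This is an illustrative helper, not lxml API.
    """
    if create_parent:
        # True selects a default tag (assumed 'div' here); a string is
        # taken as the tag name of the wrapper element.
        tag = create_parent if isinstance(create_parent, str) else "div"
        parent = ET.Element(tag)
        for child in children:
            if isinstance(child, str):
                if len(parent):
                    # Text after an element becomes that element's tail.
                    last = parent[-1]
                    last.tail = (last.tail or "") + child
                else:
                    # Leading text becomes the wrapper's text.
                    parent.text = (parent.text or "") + child
            else:
                parent.append(child)
        return parent

    # Without a wrapper, exactly one element is allowed.
    elements = [c for c in children if not isinstance(c, str)]
    if not elements or isinstance(children[0], str):
        raise ParserError("No single element found")
    if len(children) > 1:
        raise ParserError("Multiple elements found")
    result = elements[0]
    if result.tail and result.tail.strip():
        raise ParserError("Element followed by text: %r" % result.tail)
    result.tail = None
    return result


# Usage: several fragments are accepted once a parent is requested.
first = ET.Element("i")
first.text = "one"
second = ET.Element("b")
second.text = "two"
wrapped = single_fragment(["lead ", first, second], create_parent=True)
print(ET.tostring(wrapped, encoding="unicode"))
```

The point of the sketch is the ordering: the create_parent branch runs before any "multiple elements" check, so extra fragments are wrapped rather than rejected.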
Fixed in trunk rev 71010/71011.
https://codespeak.net/viewvc/?view=rev&revision=71010
https://codespeak.net/viewvc/?view=rev&revision=71011