lxml.html.fragment_fromstring() strips an enclosing body when present
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
lxml | Triaged | Undecided | Unassigned | |
Bug Description
There seems to be an undocumented inconsistency / asymmetry with lxml.html's fragment_
Specifically, it looks like fragment_
You can observe this as follows:
from lxml.html import fragment_fromstring
def parse(html):
element = fragment_
# Outputs:
# <Element p at 0x10cadf9f8> [<Element i at 0x10cacc3b8>]
# <Element p at 0x10cadfb38> [<Element i at 0x10cadf9f8>]
parse(
parse(
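The snippet above is truncated, so here is a minimal reproduction sketch of the behaviour named in the title. The `describe` helper and both input strings are assumptions made for illustration, not the reporter's original code. Going by the outputs quoted in the report, both calls would be expected to yield a `p` element with a single `i` child, meaning the enclosing `body` is silently dropped.

```python
from lxml.html import fragment_fromstring


def describe(html):
    """Parse an HTML fragment and summarise the element that comes back.

    Hypothetical helper: returns the root tag and the tags of its children.
    """
    element = fragment_fromstring(html)
    return element.tag, [child.tag for child in element]


# Assumed inputs: the same markup with and without an enclosing <body>.
bare = "<p><i>text</i></p>"
wrapped = "<body><p><i>text</i></p></body>"

print(describe(bare))
print(describe(wrapped))
```

If the two printed summaries are identical, the caller cannot tell from the result whether the input had a `body` wrapper, which is the asymmetry being reported.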
It seems like this could be an issue with lxml because of the body manipulation it does inside fragments_
https:/
Here is the requested information about my system (on Mac OS X):
Python : sys.version_
lxml.etree : (3, 6, 4, 0)
libxml used : (2, 9, 2)
libxml compiled : (2, 9, 2)
libxslt used : (1, 1, 28)
libxslt compiled : (1, 1, 28)
Could be considered a bug, but it might not be all that easy to fix, given the way fragment_fromstring() is supposed to work.
That leaves me torn whether this should be fixed. I think I would accept a pull request with a reasonable and properly tested solution.