base_url not set if parent is created in fragment_fromstring()

Bug #1576598 reported by Lukas Anzinger
This bug affects 1 person
Affects | Status    | Importance | Assigned to | Milestone
lxml    | Confirmed | Undecided  | Unassigned  |

Bug Description

Hi,

I've stumbled upon the following bug in lxml:

lxml.html.fragment_fromstring() accepts the argument "create_parent" to "[...] encapsulate the HTML in a single element". However, if "create_parent" is used together with "base_url", the base URL is missing from the returned element:

>>> import lxml.html
>>> lxml.html.fragment_fromstring('<br>', base_url='http://example.com').base_url
'http://example.com'
>>> lxml.html.fragment_fromstring('<br>', create_parent='div', base_url='http://example.com').base_url
>>>

The reason is that fragment_fromstring() creates the new parent element only after parsing the string, and that element belongs to a new document whose base_url is never set.
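Until this is fixed in lxml itself, one possible workaround is to write the URL onto the new root's document after the call. This is a sketch, not part of lxml's API: the helper name fragment_fromstring_with_base is hypothetical, and the code assumes that HtmlElement.base_url reads getroottree().docinfo.URL and that DocInfo.URL is writable in the installed lxml version.

```python
import lxml.html


def fragment_fromstring_with_base(html, create_parent=False, base_url=None, **kw):
    """Hypothetical wrapper around lxml.html.fragment_fromstring().

    Restores the base URL when create_parent is used, by writing it onto
    the document that owns the returned element.
    """
    el = lxml.html.fragment_fromstring(
        html, create_parent=create_parent, base_url=base_url, **kw)
    # Assumption: HtmlElement.base_url is read from getroottree().docinfo.URL.
    # The parent made by create_parent lives in a fresh document whose URL
    # was never set, so set it here (DocInfo.URL assumed to be writable).
    if base_url is not None and el.base_url is None:
        el.getroottree().docinfo.URL = base_url
    return el


el = fragment_fromstring_with_base(
    '<br>', create_parent='div', base_url='http://example.com')
print(el.tag, el.base_url)
```

The check on el.base_url leaves the result untouched whenever lxml already reports a URL, e.g. without create_parent, or in a version where this bug has been fixed.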

Cheers,

Lukas

P.S.: Some version information:

Python : sys.version_info(major=3, minor=5, micro=1, releaselevel='final', serial=0)
lxml.etree : (3, 6, 0, 0)
libxml used : (2, 9, 2)
libxml compiled : (2, 9, 2)
libxslt used : (1, 1, 28)
libxslt compiled : (1, 1, 28)

summary: - base_url
+ base_url not set if parent is created in fragment_fromstring()
Lukas Anzinger (lanzinger) wrote:

Hi,

I wanted to ask whether any work (a patch, etc.) would be appreciated.

Cheers,

Lukas

Lukas Anzinger (lanzinger) wrote:

(Work from my side, of course :-) )

scoder (scoder) wrote:

Sounds like a bug to me. PR welcome.

Changed in lxml:
status: New → Confirmed