Error in docinfo.URL when spaces in filename

Bug #1879866 reported by Per-Åke Ling
This bug affects 1 person
Affects: lxml
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

##
Python : sys.version_info(major=3, minor=6, micro=10, releaselevel='final', serial=0)
lxml.etree : (4, 0, 0, 0)
libxml used : (2, 9, 7)
libxml compiled : (2, 9, 7)
libxslt used : (1, 1, 32)
libxslt compiled : (1, 1, 32)
##

When parsing a file URL that contains an unescaped space, docinfo.URL comes back with the colon in 'file:' percent-escaped as well.

>>> etree.parse('file:/home/plg/relaxng.rng').docinfo.URL # no spaces
'file:/home/plg/relaxng.rng' # <- CORRECT

>>> etree.parse('file:/home/plg/relax%20ng.rng').docinfo.URL # escaped space
'file:/home/plg/relax%20ng.rng' # <- CORRECT

>>> etree.parse('file:/home/plg/relax ng.rng').docinfo.URL # unescaped space
'file%3A/home/plg/relax%20ng.rng' # <- ERROR

In the last case the space is correctly escaped to %20, but the colon is also erroneously escaped to %3A, which makes the URL useless for resolving relative URLs.
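A possible workaround, based on the observation above that an already-escaped URL comes back unchanged: percent-escape everything after the scheme before handing the URL to etree.parse. This is only a sketch. `escape_file_url` is a hypothetical helper, not part of lxml, and it assumes that any '%' already in the input starts a valid escape sequence.

```python
from urllib.parse import quote


def escape_file_url(url):
    """Percent-escape the part after the scheme, leaving 'file:' untouched.

    Hypothetical helper for illustration; not an lxml API.
    """
    scheme, sep, rest = url.partition(":")
    if not sep:
        # No scheme found: treat the whole string as a path.
        return quote(url, safe="/%")
    # '/' and '%' are kept as-is so existing escapes such as %20 are not
    # escaped a second time. A literal '%' in a file name is NOT handled.
    return scheme + sep + quote(rest, safe="/%")


print(escape_file_url("file:/home/plg/relax ng.rng"))
```

The returned string has the same form as the second example above, so passing it to etree.parse should give the docinfo.URL shown there.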

Per-Åke Ling (rlgson) wrote :

After some more testing, the escaped-colon variant seems to work as a base URL anyway; I also tested it with 'uritools'.

It still looks ugly, though...
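One way to see why it can still work as a base URL, sketched with the standard library's urllib.parse.urljoin as a stand-in (not the uritools test mentioned above): with the colon escaped there is no scheme left, so the whole string is treated as a relative path and the 'file%3A' prefix is carried along as an ordinary first path segment. The sibling file name `other.rng` is made up for the example.

```python
from urllib.parse import unquote, urljoin

# docinfo.URL as reported above for the unescaped-space case
base = "file%3A/home/plg/relax%20ng.rng"

# No literal ':' is present, so urljoin finds no scheme and resolves the
# reference against the base as if it were a plain relative path.
resolved = urljoin(base, "other.rng")

# Decoding the percent-escapes turns the prefix back into 'file:'.
decoded = unquote(resolved)

print(resolved)  # file%3A/home/plg/other.rng
print(decoded)   # file:/home/plg/other.rng
```

Whether a given consumer decodes the result before opening it is a separate question, so this only illustrates the resolution step.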
