Error in docinfo.URL when spaces in filename
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | New | Undecided | Unassigned |
Bug Description
##
Python : sys.version_
lxml.etree : (4, 0, 0, 0)
libxml used : (2, 9, 7)
libxml compiled : (2, 9, 7)
libxslt used : (1, 1, 32)
libxslt compiled : (1, 1, 32)
##
When parsing a file URL that contains spaces, docinfo.URL mistakenly escapes the colon in 'file:'.
>>> etree.parse(
'file:/
>>> etree.parse(
'file:/
>>> etree.parse(
'file%3A/
In the last case the space is escaped to %20, but the colon is erroneously escaped to %3A as well, rendering the URL useless for resolving relative URLs.
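To illustrate why the escaped colon matters, here is a minimal sketch using only the standard library's urllib.parse (not lxml itself). The paths are hypothetical stand-ins, since the ones in the report are truncated. With the colon escaped, a generic URL parser no longer sees a scheme at all:

```python
from urllib.parse import urlsplit

# Hypothetical URLs for illustration; not the paths from the report.
good_url = "file:/tmp/my%20dir/doc.xml"
escaped_url = "file%3A/tmp/my%20dir/doc.xml"

good_scheme = urlsplit(good_url).scheme
escaped_scheme = urlsplit(escaped_url).scheme
print(repr(good_scheme))     # 'file'
print(repr(escaped_scheme))  # '' (the whole string is parsed as a path)

# Possible workaround (an assumption, not an lxml feature):
# undo only the escaped colon of the leading 'file' scheme.
prefix = "file%3A"
if escaped_url.startswith(prefix):
    repaired_url = "file:" + escaped_url[len(prefix):]
else:
    repaired_url = escaped_url
print(repaired_url)
```

The repair deliberately touches only the leading prefix, so the %20 escape for the space stays intact.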
After some more testing, the variant with the escaped colon seems to work as a base URL anyway (also tested with 'uritools').
It still looks ugly, though...
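One plausible explanation for why the escaped variant still works as a base: without a recognizable scheme, the whole string is treated as a relative path, and resolving against it only swaps the last path segment. The sketch below shows this with the standard library's urllib.parse; whether libxml2 or uritools follow the same logic is an assumption, and the path is a hypothetical example.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical base URL with the colon escaped, as docinfo.URL reports it.
escaped_base = "file%3A/tmp/my%20dir/doc.xml"

# Resolve a relative reference against the escaped base.
resolved = urljoin(escaped_base, "other.xml")
print(resolved)

# The result still has no scheme; 'file%3A' is just its first path segment.
resolved_scheme = urlsplit(resolved).scheme
print(repr(resolved_scheme))
```

The resolved value keeps the 'file%3A' prefix, so anything that later opens it needs to tolerate (or unescape) that prefix.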