accept Unicode pseudo-URLs and treat as UTF-8
Bug #42514 reported by Martin Pool
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Bazaar | Fix Released | Medium | John A Meinel |
Bug Description
URLs are technically just ASCII, but it's reasonably common for them to contain encoded Unicode. We can translate to and from Unicode pseudo-URLs for user input and output, subject to some limitations. Because this translation rests on some assumptions, we should use only proper URLs inside the program and in storage.
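The output direction (proper URL to Unicode pseudo-URL for display) can be sketched in Python. This is a minimal sketch, assuming the display form should decode only escapes that spell out non-ASCII UTF-8 text; ASCII escapes such as `%2F` or `%20` stay escaped so reserved characters remain unambiguous, and escapes that are not valid UTF-8 are left untouched. The name `url_to_display` is hypothetical, not bzrlib's actual API.

```python
import re

# One or more consecutive %XX escapes, e.g. "%C3%A9".
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def url_to_display(url: str) -> str:
    """Hypothetical helper: render an ASCII URL as a Unicode pseudo-URL."""

    def decode_run(match):
        escaped = match.group(0)
        raw = bytes.fromhex(escaped.replace("%", ""))
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            # Not valid UTF-8: keep the escapes exactly as they were.
            return escaped
        # Show non-ASCII characters literally, but re-escape ASCII ones
        # (in upper-case hex) so reserved characters such as "/" stay escaped.
        return "".join(ch if ord(ch) >= 0x80 else "%%%02X" % ord(ch) for ch in text)

    return _ESCAPE_RUN.sub(decode_run, url)
```

For example, `url_to_display("http://example.com/caf%C3%A9/a%2Fb")` gives `http://example.com/café/a%2Fb`: the `é` is shown, while the escaped slash is kept so it cannot be mistaken for a path separator.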
See thread
https:/
and replies.
Changed in bzr: status Fix Committed → Fix Released
In more detail:
In places where a URL is expected:
- determine that it's a URL, not a local filename, by looking for "scheme://" with some known scheme
- replace non-ASCII characters with the %-escaped bytes of their UTF-8 encoding
- replace reserved characters with their URL-escaped (%XX) representation
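The three input-side steps can be sketched with Python's `urllib.parse.quote`, which encodes a `str` as UTF-8 and %-escapes every byte outside a "safe" set. This is a sketch under stated assumptions: the set of known schemes is hypothetical (the bug only says "some known scheme"), `/`, `:` and `@` are assumed to be kept because they delimit user, host, port and path, and `%` is assumed to be kept so escapes the user already typed are not doubled. The helper names are illustrative only.

```python
import re
from urllib.parse import quote

# Hypothetical set of recognised schemes; the bug only says "some known scheme".
KNOWN_SCHEMES = {"http", "https", "ftp", "sftp", "file", "bzr", "bzr+ssh"}

_SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*)://")


def looks_like_url(text: str) -> bool:
    """Step 1: treat the input as a URL only if it starts with a known 'scheme://'."""
    m = _SCHEME_RE.match(text)
    return m is not None and m.group(1).lower() in KNOWN_SCHEMES


def pseudo_url_to_url(text: str) -> str:
    """Turn a user-supplied Unicode pseudo-URL into an ASCII URL."""
    if not looks_like_url(text):
        raise ValueError("not a URL: %r" % (text,))
    scheme, rest = text.split("://", 1)
    # Steps 2 and 3: quote() encodes the string as UTF-8 and %-escapes every
    # byte outside the safe set, so non-ASCII characters and characters such
    # as " ", "#" and "?" all become %XX escapes.
    # Assumption: "/", ":" and "@" are structural and "%" marks existing escapes.
    return scheme + "://" + quote(rest, safe="/:@%", encoding="utf-8")
```

With this sketch, `pseudo_url_to_url("http://example.com/café/a b")` yields `http://example.com/caf%C3%A9/a%20b`, while `/home/user/café` is not recognised as a URL and would be handled as a local filename instead. Keeping `%` safe is a trade-off: a literal `%` that is not part of an escape passes through unchanged.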
Refinements that perform this process separately on each path component are possible, but I'm not sure they're necessary.
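One possible reading of the per-component refinement is sketched below: split the path on `/` and escape each segment on its own, so separators are never escaped while `:`, `@` and `;` inside a segment are. This is an illustrative sketch, not the approach Bazaar adopted; the name `escape_per_component` is hypothetical, and the host part is deliberately passed through untouched.

```python
from urllib.parse import quote


def escape_per_component(url: str) -> str:
    """Hypothetical helper: escape each path segment of a pseudo-URL separately."""
    scheme, rest = url.split("://", 1)
    # Everything before the first "/" is the host part; it is left as-is here.
    host, sep, path = rest.partition("/")
    # safe="%" keeps existing escapes; every other reserved character,
    # including ":" and "@", is escaped because it sits inside one segment.
    segments = [quote(seg, safe="%", encoding="utf-8") for seg in path.split("/")]
    return scheme + "://" + host + sep + "/".join(segments)
```

For instance, `escape_per_component("sftp://host/a:b/café")` gives `sftp://host/a%3Ab/caf%C3%A9`, whereas a whole-string approach that treats `:` as structural would leave the colon inside `a:b` unescaped.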
Special handling may be required for Unicode domain names.
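Host names are a case where UTF-8 %-escaping is not the usual answer: internationalized domain names are normally converted with IDNA ("punycode", `xn--` labels). The sketch below assumes that approach and uses Python's built-in `idna` codec, which implements IDNA 2003 (RFC 3490). The helper name `netloc_to_ascii` is hypothetical; it does not handle IPv6 literals or escape non-ASCII user names.

```python
def netloc_to_ascii(netloc: str) -> str:
    """Hypothetical helper: IDNA-encode the host part of 'user@host:port'."""
    # Split off optional userinfo ("user@") and port (":8080").
    userinfo, at, hostport = netloc.rpartition("@")
    host, colon, port = hostport.partition(":")
    # The "idna" codec converts each non-ASCII label to its "xn--" form
    # and leaves plain ASCII labels unchanged.
    ascii_host = host.encode("idna").decode("ascii")
    return userinfo + at + ascii_host + colon + port
```

So `netloc_to_ascii("user@bücher.example:8080")` becomes `user@xn--bcher-kva.example:8080`, and an ASCII host such as `example.com` passes through unchanged.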