Bazaar

accept Unicode pseudo-URLs and treat as UTF-8

Bug #42514 reported by Martin Pool on 2006-05-02

Affects	Status	Importance	Assigned to	Milestone
Bazaar	Fix Released	Medium	John A Meinel	Bazaar 0.9

Bug Description

URLs are technically just ASCII, but it's reasonably common to have encoded Unicode in them. We can translate to and from Unicode pseudo-URLs for user input/output, subject to some limitations. Because this translation rests on some assumptions (notably that the escaped bytes are UTF-8), we should only use proper URLs in the program and in storage.

See the thread at https://lists.ubuntu.com/archives/bazaar-ng/2006q2/011104.html and its replies.

Martin Pool (mbp) wrote on 2006-05-02:

In more detail:

In places where a URL is expected:

- determine that it's a URL, not a local filename, by looking for "scheme://" for some known scheme

- replace non-ASCII characters with their UTF-8 representation

- replace reserved characters with their URL-escaped (percent-encoded) representation
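The first step above could be sketched roughly as follows. This is only an illustration: `KNOWN_SCHEMES` and `looks_like_url` are hypothetical names, and the scheme list is an assumption, not the set Bazaar actually recognises.

```python
import re

# Hypothetical scheme list; the real set depends on the available transports.
KNOWN_SCHEMES = ("http", "https", "ftp", "sftp", "file")

# Matches "scheme://" at the start of the string for any known scheme.
_SCHEME_RE = re.compile(
    r"^(?:%s)://" % "|".join(re.escape(s) for s in KNOWN_SCHEMES))


def looks_like_url(location):
    """Return True if location starts with 'scheme://' for a known scheme."""
    return _SCHEME_RE.match(location) is not None
```

Anything that fails this check would be treated as a local filename and left alone.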

Refinements where this process is performed in each path component are possible but I'm not sure they're necessary.
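For illustration, a per-component variant might look like the sketch below. It assumes Python 3's `urllib.parse`; `escape_path_components` is a hypothetical name, and keeping `%` unescaped (so existing escapes survive) is a design assumption rather than anything decided in this report.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escape_path_components(pseudo_url):
    """Percent-escape each path component of a pseudo-URL separately."""
    parts = urlsplit(pseudo_url)
    # '%' stays safe so already-escaped input is not escaped twice.
    escaped_path = "/".join(
        quote(component.encode("utf-8"), safe="%~")
        for component in parts.path.split("/"))
    return urlunsplit(
        (parts.scheme, parts.netloc, escaped_path, parts.query, parts.fragment))
```

Splitting first means a "/" is always a separator and never something to escape, which is the main thing this refinement buys.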

Special handling may be required for Unicode domain names.
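One possible form of that special handling, shown only as a sketch: Python's built-in `idna` codec (which implements IDNA 2003) converts a Unicode host name to its ASCII form. `hostname_to_ascii` is a hypothetical helper, not part of Bazaar.

```python
def hostname_to_ascii(hostname):
    """Convert a Unicode host name to its ASCII (Punycode) form."""
    # The "idna" codec encodes each label separately and adds the "xn--" prefix
    # where needed; pure-ASCII names pass through unchanged.
    return hostname.encode("idna").decode("ascii")
```

The host part would go through this conversion, while the path would still be UTF-8 encoded and percent-escaped as described above.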

Martin Pool (mbp) wrote on 2006-05-02: pseudocode

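  from urllib.parse import quote  # assumed Python 3 stand-in for urlescape

  def pseudo_url_to_url(pseudo_url, url_safe_characters):
    r = ''
    for c in pseudo_url:
      if c in url_safe_characters:
        r += c  # already legal in a URL; copy unchanged
      else:
        # percent-escape every UTF-8 byte of the character
        r += quote(c.encode('utf-8'), safe='')
    return r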
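The pseudocode above covers user input. For output, the opposite direction might be sketched as below. This is an assumption-laden illustration, not the committed fix: `url_to_pseudo_url` is a hypothetical name, and the choice to decode only escapes of non-ASCII bytes (leaving escapes such as `%20` or `%2F` intact) is one way to keep the result unambiguous.

```python
import re
from urllib.parse import unquote_to_bytes

# Runs of percent-escapes whose byte value is 0x80 or above.
_HIGH_ESCAPES = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def _decode_run(match):
    raw = unquote_to_bytes(match.group(0))
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so the UTF-8 assumption fails: leave it escaped.
        return match.group(0)


def url_to_pseudo_url(url):
    """Turn UTF-8 percent-escapes in a URL into Unicode for display."""
    return _HIGH_ESCAPES.sub(_decode_run, url)
```

Escaped bytes that do not form valid UTF-8 are shown as they are, which is one of the limitations the description mentions.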

John A Meinel (jameinel) wrote on 2006-06-07:

This should be committed in my encoding branch.

Changed in bzr:
assignee:	nobody → jameinel
status:	Unconfirmed → Fix Committed

John A Meinel (jameinel) on 2006-07-11

Changed in bzr:
status:	Fix Committed → Fix Released
