I think the best way to do this is to have PyICU include the Transform classes in its bindings, apply the transform "lower; latin; nfkd", and then strip anything that isn't a legal username character ([^-_a-zA-Z]). This removes accents and other combining characters.
This will still need special handling for some characters, including the example given of ø. My testing and an IBM FAQ entry [1] indicate that there are several special cases that the normal Unicode transform doesn't handle correctly. So we'll have to hand-transform some things, like ß and æ. Basically anything listed in the IBM article.
BTW, you can play around with Unicode transforms online [2]. It's pretty interesting. For our purposes, using the 'Names' data is particularly relevant.
Unfortunately, the PyICU bindings do *not* have the Transform bits of ICU wrapped yet.
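Until the Transform classes are wrapped, something along these lines might serve as a stopgap. This is only a rough sketch using the standard library's unicodedata module rather than ICU, so it covers the "lower" and "nfkd" steps but not the "latin" transliteration step (text in non-Latin scripts would simply be stripped). The function name to_username and the SPECIAL_CASES table are made up for illustration, and the table holds only the three characters mentioned above, not the full list from the IBM article.

```python
import re
import unicodedata

# Hypothetical hand-transform table: characters that NFKD leaves intact.
# Only the three examples from the discussion; the real list would be longer.
SPECIAL_CASES = {
    "ø": "o",
    "ß": "ss",
    "æ": "ae",
}

# Anything that is not a legal username character gets removed.
ILLEGAL = re.compile(r"[^-_a-zA-Z]")


def to_username(name):
    """Approximate 'lower; nfkd' plus hand transforms, then strip the rest."""
    name = name.lower()
    # Hand-transform the special cases before normalizing.
    for src, dst in SPECIAL_CASES.items():
        name = name.replace(src, dst)
    # NFKD splits accented letters into a base letter plus combining marks.
    name = unicodedata.normalize("NFKD", name)
    # The combining marks are not in the legal set, so this drops them too.
    return ILLEGAL.sub("", name)
```

With this, to_username("Søren") gives "soren" and to_username("Straße") gives "strasse". Note that spaces are not legal characters either, so they are dropped as well.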
[1] http://ibm.com/support/docview.wss?uid=swg21247569
[2] http://demo.icu-project.org/icu-bin/translit