Zim

Comment 15 for bug 518323

Jiří Janoušek (fenryxo) wrote : Re: [Bug 518323] Re: Automatic link creation and CamelCase don't work with non-latin characters

On Tue, May 31, 2011 at 21:59, Jaap Karssenberg
<email address hidden> wrote:
> 2011/5/31 Jiří Janoušek <email address hidden>
>
>> I have been doing some experiments, and the Python regex engine seems to
>> support Unicode if unicode arguments and the re.U flag are provided (example
>> 3).
>>
>
> Yes, it does for \w; however, there is no way to match uppercase versus
> lowercase characters (unlike e.g. the Perl regex engine, which supports
> matching Unicode classes).

I see, I missed the point before.
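
To make the limitation concrete, here is a minimal sketch. It assumes Python 3, where str patterns are Unicode-aware by default (re.UNICODE is spelled out only for clarity). The ASCII_CAMEL pattern and both helper names are made up for illustration and are not Zim's actual code.

```python
import re

# Hypothetical ASCII-only CamelCase pattern, for illustration only;
# this is NOT the pattern Zim actually uses.
ASCII_CAMEL = re.compile(r"[A-Z][a-z]+[A-Z][a-z]+")

# \w is Unicode-aware for str patterns, so accented letters count as
# word characters. It cannot tell uppercase from lowercase, though.
WORD = re.compile(r"\w+", re.UNICODE)


def is_whole_word(text):
    """True if the whole string consists of word characters."""
    return WORD.fullmatch(text) is not None


def is_ascii_camel(text):
    """True if the whole string matches the ASCII-only CamelCase pattern."""
    return ASCII_CAMEL.fullmatch(text) is not None
```

With these, the accented example from the bug description passes is_whole_word() but fails is_ascii_camel(), which is the gap described above: \w sees the accented letters, while the [A-Z] and [a-z] ranges do not.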

> I have recently been thinking that it can work if we use the string methods
> to determine which characters are uppercase and which are not, and find
> CamelCase that way by searching character by character for a pattern of
> "upper lower upper".

There are also alternative regex libraries with support for Unicode
classes [1], but your solution may work well and doesn't require another
dependency (for one small feature).

[1] http://stackoverflow.com/questions/1832893/python-regex-matching-unicode-properties/
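
The character-by-character idea could be sketched roughly as follows. This is only my reading of "upper lower upper", assuming Python 3 str methods and precomposed (NFC) input; the names is_camelcase and find_camelcase are hypothetical, and the whitespace splitting is a simplification rather than how Zim tokenizes text.

```python
def is_camelcase(word):
    """Check a single word for an "upper ... lower ... upper" pattern.

    Relies on str.isupper()/str.islower(), which are Unicode-aware,
    so no regex character classes are needed.
    """
    # Only letters and digits allowed; must start with an uppercase letter.
    if not word.isalnum() or not word[0].isupper():
        return False
    seen_lower = False
    for ch in word[1:]:
        if ch.islower():
            seen_lower = True
        elif ch.isupper() and seen_lower:
            # An uppercase letter after at least one lowercase letter.
            return True
    return False


def find_camelcase(text):
    """Return the whitespace-separated words that look like CamelCase."""
    return [word for word in text.split() if is_camelcase(word)]
```

For example, find_camelcase() applied to a line containing "CámélCase" among ordinary words should pick out just that word, while "Camel", "camelCase" and "CAMEL" are rejected by is_camelcase().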

> --
> You received this bug notification because you are subscribed to Zim.
> https://bugs.launchpad.net/bugs/518323
>
> Title:
>  Automatic link creation and CamelCase don't work with non-latin
>  characters
>
> Status in Zim desktop wiki:
>  In Progress
>
> Bug description:
>  Automatic link creation doesn't work while using accented characters like "á", "é", "í"... inside the link.
>  This affects many ways of link creation, like links starting with ":", "+", CamelCase links...
>
>  Examples:
>
>  CamelCase <- creates link
>  CámélCase <- doesn't create link
>
>  +link <- creates link
>  +línk <- doesn't create link
>
>  :link <- creates link
>  :línk <- doesn't create link
>
>  ZIM version: 0.43 Linux
>