Zim

Bug #518323
Comment #15

Jiří Janoušek (fenryxo) wrote on 2011-05-31: Re: [Bug 518323] Re: Automatic link creation and CamelCase don't work with non-latin characters

On Tue, May 31, 2011 at 21:59, Jaap Karssenberg
<email address hidden> wrote:
> 2011/5/31 Jiří Janoušek <email address hidden>
>
>> I have been doing some experiments and the Python regex engine seems to
>> support Unicode if unicode arguments and the re.U flag are provided
>> (example 3).
>>
>
> Yes it does for \w, however there is no way to match uppercase versus lower
> case (unlike e.g. the perl regex engine which supports matching unicode
> classes).

I see, I missed the point before.
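
To make the limitation concrete, here is a small sketch. It assumes Python 3,
where str patterns are Unicode-aware by default, so re.U is passed only to
mirror the discussion; the sample strings are my own. \w accepts accented
letters, but the only way to ask for "uppercase" is an explicit range such as
[A-Z], which covers ASCII only.

```python
import re

# \w with re.U matches accented letters, so the whole word is accepted.
word = re.compile(r"\w+", re.U)
whole_word_matches = word.fullmatch("CámélCase") is not None

# An ASCII range cannot tell that "Á" is an uppercase letter.
ascii_upper = re.compile(r"[A-Z]", re.U)
found = ascii_upper.findall("ÁbcDef")

# The string method, by contrast, knows about non-ASCII case.
accented_is_upper = "Á".isupper()

print(whole_word_matches, found, accented_is_upper)
```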

> I have recently been thinking that it can work if we use the string methods
> to determine which characters are uppercase and which are not and find
> camelcase that way looking for a pattern of "upper lower upper" by
> searching character by character.

There are also alternative regex libraries that support Unicode
classes [1], but your solution may work well and doesn't require another
dependency (for one small feature).

[1] http://stackoverflow.com/questions/1832893/python-regex-matching-unicode-properties/
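
A rough sketch of the string-method approach, as I understand it, could look
like the code below. The function name is_camelcase is hypothetical, it
assumes the text has already been split into single words, and it is not how
Zim actually detects CamelCase. It uses only str.isupper() and str.islower(),
which are Unicode-aware, and scans character by character for "upper, lower,
upper".

```python
def is_camelcase(word):
    """Hypothetical helper: True if word starts with an uppercase letter
    and a later uppercase letter follows at least one lowercase letter."""
    if not word or not word[0].isupper():
        return False
    seen_lower = False
    for char in word[1:]:
        if char.islower():
            seen_lower = True
        elif char.isupper() and seen_lower:
            # Found the "upper lower upper" pattern.
            return True
    return False


for sample in ("CamelCase", "CámélCase", "Camel", "camelCase", "ABC"):
    print(sample, is_camelcase(sample))
```

With this check, "CámélCase" is treated the same way as "CamelCase", while
"Camel", "camelCase" and "ABC" are rejected.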

> --
> You received this bug notification because you are subscribed to Zim.
> https://bugs.launchpad.net/bugs/518323
>
> Title:
> Automatic link creation and CamelCase don't work with non-latin
> characters
>
> Status in Zim desktop wiki:
> In Progress
>
> Bug description:
> Automatic link creation doesn't work while using accented characters like "á", "é", "í"... inside the link.
> This affects many ways of link creation like links starting with ":", "+", CamelCase links...
>
> Examples:
>
> CamelCase <- creates link
> CámélCase <- doesn't create link
>
> +link <- creates link
> +línk <- doesn't create link
>
> :link <- creates link
> :línk <- doesn't create link
>
> ZIM version: 0.43 Linux
>