On Tue, May 31, 2011 at 21:59, Jaap Karssenberg
<email address hidden> wrote:
> 2011/5/31 Jiří Janoušek <email address hidden>
>
>> I have been doing some experiments and Python regex engine seems to
>> support unicode if unicode arguments and re.U flag are provided (example
>> 3).
>>
>
> Yes, it does for \w; however, there is no way to match uppercase versus
> lowercase (unlike e.g. the Perl regex engine, which supports matching
> Unicode classes).
I see, I missed the point before.
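
To make the limitation concrete, here is a small sketch. It assumes Python 3
syntax (where str patterns are Unicode-aware by default) and uses the sample
word from the bug description; the pattern names are only illustrative and
are not taken from Zim's code. It shows \w accepting accented letters while
an ASCII range such as [A-Z][a-z] does not.

```python
import re

word = "CámélCase"

# \w is Unicode-aware; re.U is spelled out here, although it is the
# default for str patterns in Python 3.
word_chars = re.compile(r"\w+", re.U)

# An ASCII-only CamelCase pattern, used purely for illustration.
ascii_camel = re.compile(r"^(?:[A-Z][a-z]+){2,}$")

whole_match = word_chars.fullmatch(word) is not None   # accented word is all \w
ascii_plain = ascii_camel.match("CamelCase") is not None
ascii_accented = ascii_camel.match(word) is not None   # fails on the accented letters
```
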
> I have recently been thinking that it can work if we use the string methods
> to determine which characters are uppercase and which are not, and find
> CamelCase that way, looking for a pattern of "upper lower upper" by
> searching character by character.
There are also alternative regex libraries with Unicode class
support [1], but your solution may work well and doesn't require another
dependency (for one small feature).
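
The character-by-character idea quoted above could look roughly like the
sketch below. It is only an illustration under my own assumptions: Python 3
syntax, a hypothetical helper name is_camelcase, and a rule that the word
must consist of letters only and start with an uppercase letter. It is not
Zim's actual implementation.

```python
def is_camelcase(word):
    """Return True if word looks like CamelCase, judged with str methods.

    Hypothetical helper: the word must be letters only, start with an
    uppercase letter, and contain a later uppercase letter that follows
    at least one lowercase letter ("upper lower upper").
    """
    if not word or not word.isalpha() or not word[0].isupper():
        return False
    seen_lower = False
    for ch in word[1:]:
        if ch.islower():
            seen_lower = True
        elif ch.isupper() and seen_lower:
            # Found an uppercase letter after a lowercase one.
            return True
    return False


# Usage: scan whitespace-separated words (punctuation is not handled here).
text = "CamelCase CámélCase Camel camelCase"
matches = [w for w in text.split() if is_camelcase(w)]
```

Because str.isupper() and str.islower() follow Unicode case properties,
accented letters are handled without an extra regex dependency.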
> --
> You received this bug notification because you are subscribed to Zim.
> https://bugs.launchpad.net/bugs/518323
>
> Title:
> Automatic link creation and CamelCase don't work with non-latin
> characters
>
> Status in Zim desktop wiki:
> In Progress
>
> Bug description:
> Automatic link creation doesn't work while using accented characters like "á", "é", "í"... inside the link.
> This affects many ways of link creation like links starting with ":", "+", CamelCase links...
>
> Examples:
>
> CamelCase <- creates link
> CámélCase <- doesn't create link
>
> +link <- creates link
> +línk <- doesn't create link
>
> :link <- creates link
> :línk <- doesn't create link
>
> ZIM version: 0.43 Linux
>
[1] http://stackoverflow.com/questions/1832893/python-regex-matching-unicode-properties/