Non-ASCII characters in hashtags don't work

Bug #372164 reported by benni on 2009-05-05
This bug affects 21 people
Affects            Status   Importance   Assigned to   Milestone
Gwibber                     Low          Unassigned
gwibber (Ubuntu)            Low          Unassigned

Bug Description

Hashtags containing non-ASCII characters are not linked correctly for search: the link covers only the part before the first non-ASCII character. Example: #verdächtig (means "suspect" in German) links to a search for #verd.

Colin Dean (colindean) wrote :

I encountered this with the post "#HUHN #FASAN #GRÄUS #chicken #pheasant #grouse".

Seems like the regex used to handle the highlighting needs to be expanded to include all letters and numbers.

It also happens with tilde-accented letters: "õ", "ã".
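
To illustrate the suggestion above, here is a minimal sketch in Python 3. It does not reproduce Gwibber's actual code: `ASCII_HASHTAG`, `UNICODE_HASHTAG` and `find_tags` are hypothetical names, and the ASCII-only pattern is only an assumption about the kind of regex that would cause the truncation described in this report. In Python 3, `\w` in a `str` pattern matches letters and digits from any script; in Python 2 the same behaviour would need unicode strings and the `re.UNICODE` flag.

```python
import re

# Hypothetical ASCII-only pattern: stops at the first non-ASCII character.
ASCII_HASHTAG = re.compile(r"#([A-Za-z0-9_]+)")

# Unicode-aware variant: \w covers letters, digits and underscore in any script.
UNICODE_HASHTAG = re.compile(r"#(\w+)")


def find_tags(pattern, text):
    """Return the hashtag bodies (without '#') that the pattern captures."""
    return pattern.findall(text)


post = "#HUHN #FASAN #GRÄUS #verdächtig"
print(find_tags(ASCII_HASHTAG, post))    # ['HUHN', 'FASAN', 'GR', 'verd']
print(find_tags(UNICODE_HASHTAG, post))  # ['HUHN', 'FASAN', 'GRÄUS', 'verdächtig']
```

The ASCII-only pattern yields `verd` for #verdächtig, which matches the symptom reported here; the `\w` variant keeps the whole tag.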

Ake K. (iake) wrote :

I encountered this with Thai-language hashtags, e.g. #ภาษาไทย

Ake K. (iake) wrote :
Omer Akram (om26er) wrote :

Can anyone please check if this is still an issue with gwibber 2.29.94?

Changed in gwibber:
status: New → Incomplete
Tim 'avatar' Bartel (avataryt) wrote :

Problem still exists on 2.31.1 - http://twitter.com/avatar/status/11820340965

Omer Akram (om26er) wrote :

Yes, I can reproduce this too.

Changed in gwibber:
status: Incomplete → Confirmed

Same problem here. Norwegian letters like æ, ø and å in hashtags don't work, yet they do on Twitter's website.

Omer Akram (om26er) on 2010-05-07
Changed in gwibber (Ubuntu):
importance: Undecided → Low
status: New → Triaged
tags: added: patch
Omer Akram (om26er) wrote :

Ake K, your patch does not work with the latest version; things have changed a bit. Can you please make a new patch?

Nigel Babu (nigelbabu) wrote :

This patch does not apply to the lucid gwibber source. Please see if the patch can be redone for the current sources. I'm marking this 'patch-needswork'; once you have another patch for review, feel free to change the tag to 'patch'.

tags: added: patch-needswork
removed: patch
Adam Pogany (adam-pogany) wrote :

This one should do the trick against 2.30.0.1 (and, AFAIK, the current revision too).

tags: added: patch
removed: patch-needswork
David Futcher (bobbo) on 2010-06-09
tags: added: patch-forwarded-upstream

You should probably copy Twitter's own twitter-text library. Their regex is a bit more complicated and written in Ruby, and I am not proficient enough in Python to push a real patch:

LATIN_ACCENTS = [(0xc0..0xd6).to_a, (0xd8..0xf6).to_a, (0xf8..0xff).to_a].flatten.pack('U*').freeze
HASHTAG_CHARACTERS = /[a-z0-9_#{LATIN_ACCENTS}]/io

However, HASHTAG_CHARACTERS are only allowed from the second position onward:

REGEXEN[:auto_link_hashtags] = /(^|[^0-9A-Z&\/]+)(#|＃)([0-9A-Z_]*[A-Z_]+#{HASHTAG_CHARACTERS}*)/io
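
A rough Python 3 translation of the Ruby regex quoted above might look like the sketch below. It is not the twitter-text library itself and not a tested Gwibber patch. The second alternative in `(#|…)` is assumed to be the fullwidth number sign ＃ (U+FF03), and `extract_hashtags` is a hypothetical helper added only for illustration.

```python
import re

# Latin-1 accented letters, same ranges as the Ruby LATIN_ACCENTS above
# (0xc0-0xd6, 0xd8-0xf6, 0xf8-0xff), written as character-class ranges.
LATIN_ACCENTS = "\u00c0-\u00d6\u00d8-\u00f6\u00f8-\u00ff"
HASHTAG_CHARACTERS = "[a-z0-9_" + LATIN_ACCENTS + "]"

# Assumption: the second alternative is the fullwidth number sign (U+FF03).
AUTO_LINK_HASHTAGS = re.compile(
    r"(^|[^0-9A-Z&/]+)(#|＃)([0-9A-Z_]*[A-Z_]+" + HASHTAG_CHARACTERS + "*)",
    re.IGNORECASE,
)


def extract_hashtags(text):
    """Hypothetical helper: return the tag text (group 3) of every match."""
    return [m.group(3) for m in AUTO_LINK_HASHTAGS.finditer(text)]


print(extract_hashtags("#HUHN #GRÄUS #verdächtig"))  # ['HUHN', 'GRÄUS', 'verdächtig']
print(extract_hashtags("#ärger"))                     # []
```

The second call shows the restriction mentioned above: a tag that starts with an accented letter is not matched at all. The ranges cover Latin-1 accents only, so this pattern would not match the Thai hashtag reported earlier in this bug.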

Omer Akram (om26er) wrote :

The patch in comment #11 works fine. Ken, could you please review and merge?

Changed in gwibber:
assignee: nobody → Ken VanDine (ken-vandine)
importance: Undecided → Low
milestone: none → 2.91.1
Ken VanDine (ken-vandine) wrote :

I tested the patch in comment #11, but it didn't seem to work using the original test case. I tried searching for #verdächtig and got no results.
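
When re-testing, it may help to check hashtag extraction and the search request separately, since a correctly extracted tag still has to be percent-encoded as UTF-8 before it goes into a query URL. The sketch below only illustrates that encoding step in Python 3. `search_url` and the base URL are hypothetical placeholders, not Gwibber's or Twitter's real search endpoint, and nothing in this thread confirms that encoding was the cause of the empty result.

```python
from urllib.parse import quote, unquote


def search_url(hashtag, base="https://example.invalid/search?q="):
    """Hypothetical helper: build a search URL with a percent-encoded query."""
    # safe="" makes quote() encode every reserved character, including '#'.
    return base + quote(hashtag, safe="")


url = search_url("#verdächtig")
print(url)  # https://example.invalid/search?q=%23verd%C3%A4chtig
print(unquote(url.split("q=", 1)[1]))  # #verdächtig
```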

Changed in gwibber:
milestone: 2.91.1 → 2.91.2
Changed in gwibber:
status: Confirmed → Incomplete
assignee: Ken VanDine (ken-vandine) → nobody

Why is this bug "incomplete"? What information is missing?

The given patch fixes the bug. Works for me with revision 974.

Xavier Guillot (valeryan-24) wrote :

Problem still present in Gwibber 3.1.4.1, and also with French accents: é, è, à...

Only letters before accents are taken in the hashtag link.

Bilal Shahid (s9iper1) wrote :

"Thanks for your patch, unfortunately our busy developers haven't been
able to review your patch in a timely manner. The gwibber codebase has
seen significant change and it is likely this patch no longer applies.
Please review it again and if it is still applicable, update it to work
with the latest gwibber trunk. We will be doing a patch review day in
the next few weeks and would like to review your patch. Thanks again for
your contribution!"

tags: added: patch-day-old
Changed in gwibber (Ubuntu):
status: Triaged → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for gwibber (Ubuntu) because there has been no activity for 60 days.]

Changed in gwibber (Ubuntu):
status: Incomplete → Expired

The bug seems to be fixed (Gwibber 3.4.1 on Precise).
