Non-ASCII chars in hashtags don't work

Bug #372164 reported by benni
This bug affects 21 people

Affects            Status       Importance   Assigned to   Milestone
Gwibber            Incomplete   Low          Unassigned
gwibber (Ubuntu)   Expired      Low          Unassigned

Bug Description

Hashtags containing non-ASCII characters are not searchable correctly: they link only to the part before the first non-ASCII character. Example: #verdächtig (means "suspect" in German) links to a search for #verd.

Colin Dean (colindean) wrote :

I encountered this with the post "#HUHN #FASAN #GRÄUS #chicken #pheasant #grouse".

Seems like the regex used to handle the highlighting needs to be expanded to include all letters and numbers.
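
A minimal Python 3 sketch of the suggestion above. The ASCII-only pattern is a hypothetical stand-in used to reproduce the symptom, not Gwibber's actual regex, and both function names are made up for illustration. In Python 3, `\w` in a `str` pattern matches Unicode letters and digits as well as the underscore:

```python
import re

# Hypothetical ASCII-only pattern (an assumption, not Gwibber's real code):
# it stops at the first character outside A-Z, a-z, 0-9 and "_".
ASCII_HASHTAG = re.compile(r"#([A-Za-z0-9_]+)")

# Unicode-aware variant: \w covers letters and digits from any script.
UNICODE_HASHTAG = re.compile(r"#(\w+)")


def find_hashtags_ascii(text):
    """Return hashtag bodies as the ASCII-only pattern sees them."""
    return ASCII_HASHTAG.findall(text)


def find_hashtags_unicode(text):
    """Return hashtag bodies using the Unicode-aware pattern."""
    return UNICODE_HASHTAG.findall(text)


post = "#HUHN #FASAN #GRÄUS #chicken #pheasant #grouse"
print(find_hashtags_ascii(post))    # '#GRÄUS' is cut down to 'GR'
print(find_hashtags_unicode(post))  # '#GRÄUS' is kept whole
```

One caveat: `\w` does not match combining marks, so tags in scripts that rely on them may still be cut short with this approach.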

Miguel Vieira (miguelsvieiracadastros) wrote :

It also happens with tilde-accented letters: "õ", "ã".

nanake (nanake) wrote :

I encountered this with Thai-language hashtags: #ภาษาไทย

Omer Akram (om26er) wrote :

Can anyone please check if this is still an issue with gwibber 2.29.94?

Changed in gwibber:
status: New → Incomplete
Tim 'avatar' Bartel (avataryt) wrote :

Problem still exists on 2.31.1 - http://twitter.com/avatar/status/11820340965

Omer Akram (om26er) wrote :

Yes, I can reproduce this too.

Changed in gwibber:
status: Incomplete → Confirmed
Alexander Karlstad (alexander.karlstad) wrote :

Same problem here. Norwegian letters like æ, ø and å in hashtags don't work, yet they do on Twitter's website.

Omer Akram (om26er)
Changed in gwibber (Ubuntu):
importance: Undecided → Low
status: New → Triaged
tags: added: patch
Omer Akram (om26er) wrote :

Ake K, your patch does not work with the latest version; things have changed a bit. Could you please make a new patch?

Nigel Babu (nigelbabu) wrote :

This patch does not apply to the lucid gwibber source. Please see if the patch can be redone for the current sources. I'm marking this 'patch-needswork'; once you have another patch for review, feel free to change the tag to 'patch'.

tags: added: patch-needswork
removed: patch
Adam Pogany (adam-pogany) wrote :

This one should do the trick against 2.30.0.1 (and, AFAIK, the current revision too).

tags: added: patch
removed: patch-needswork
David Futcher (bobbo)
tags: added: patch-forwarded-upstream
Sigve Indregard (sigve-indregard) wrote :

You should probably copy Twitter's own twitter-text library. Their regex is a bit more complicated and written in Ruby, and I am not proficient enough in Python to push a real patch:

LATIN_ACCENTS = [(0xc0..0xd6).to_a, (0xd8..0xf6).to_a, (0xf8..0xff).to_a].flatten.pack('U*').freeze
HASHTAG_CHARACTERS = /[a-z0-9_#{LATIN_ACCENTS}]/io

However, HASHTAG_CHARACTERS are only allowed from position 2 and on:

REGEXEN[:auto_link_hashtags] = /(^|[^0-9A-Z&\/]+)(#|＃)([0-9A-Z_]*[A-Z_]+#{HASHTAG_CHARACTERS}*)/io
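
For anyone wanting to try this from Python, here is a rough Python 3 port of the Ruby snippets quoted above. It is a sketch only, not the twitter-text library's own code: the function name `extract_hashtags` is made up for illustration, and the second alternative in the hash-sign group is assumed to be the fullwidth number sign U+FF03.

```python
import re

# Same three Latin-1 ranges as LATIN_ACCENTS in the quoted Ruby code,
# written as character-class ranges.
LATIN_ACCENTS = "\u00c0-\u00d6\u00d8-\u00f6\u00f8-\u00ff"
HASHTAG_CHARACTERS = "[a-z0-9_" + LATIN_ACCENTS + "]"

# Assumption: the second hash alternative is the fullwidth sign U+FF03.
HASH_SIGNS = "(#|\uff03)"

AUTO_LINK_HASHTAGS = re.compile(
    r"(^|[^0-9A-Z&/]+)"
    + HASH_SIGNS
    + r"([0-9A-Z_]*[A-Z_]+"
    + HASHTAG_CHARACTERS
    + r"*)",
    re.IGNORECASE,
)


def extract_hashtags(text):
    """Return the hashtag bodies (without the hash sign) found in text."""
    return [m.group(3) for m in AUTO_LINK_HASHTAGS.finditer(text)]


print(extract_hashtags("#verdächtig"))          # accented letter inside the tag
print(extract_hashtags("#HUHN #FASAN #GRÄUS"))  # several tags in one post
print(extract_hashtags("#ภาษาไทย"))             # outside the Latin ranges
```

In this sketch, tags such as #verdächtig and #GRÄUS are matched in full, but a tag that starts with an accented letter, a purely numeric tag, or a tag in a script outside the three Latin ranges (such as the Thai example) is not matched at all, so this rule set alone would not cover every case reported in this bug.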

Omer Akram (om26er) wrote :

The patch in comment #11 works fine. Ken, could you please review and merge?

Changed in gwibber:
assignee: nobody → Ken VanDine (ken-vandine)
importance: Undecided → Low
milestone: none → 2.91.1
Ken VanDine (ken-vandine) wrote :

I tested the patch in comment #11, but it didn't seem to work using the original test case. I tried searching for #verdächtig and got no results.

Changed in gwibber:
milestone: 2.91.1 → 2.91.2
Changed in gwibber:
status: Confirmed → Incomplete
assignee: Ken VanDine (ken-vandine) → nobody
Frédéric Grosshans (fgrosshans) wrote :

Why is this bug "Incomplete"? What information is missing?

Nils Dabrock (nils-nilsdabrock) wrote :

The given patch fixes the bug. Works for me with revision 974.

Xavier Guillot (valeryan-24) wrote :

The problem is still present in Gwibber 3.1.4.1, also with French accents: é, è, à...

Only the letters before the accent are included in the hashtag link.

Bilal Shahid (s9iper1) wrote :

"Thanks for your patch, unfortunately our busy developers haven't been
able to review your patch in a timely manner. The gwibber codebase has
seen significant change and it is likely this patch no longer applies.
Please review it again and if it is still applicable, update it to work
with the latest gwibber trunk. We will be doing a patch review day in
the next few weeks and would like to review your patch. Thanks again for
your contribution!"

tags: added: patch-day-old
Changed in gwibber (Ubuntu):
status: Triaged → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for gwibber (Ubuntu) because there has been no activity for 60 days.]

Changed in gwibber (Ubuntu):
status: Incomplete → Expired
Frédéric Grosshans (fgrosshans) wrote :

The bug seems to be fixed (Gwibber 3.4.1 on Precise).
