Character reference handling for <, &, and > doesn't match notification-daemon

Bug #356659 reported by Matthew Paul Thomas
20
This bug affects 1 person
Affects Status Importance Assigned to Milestone
notify-osd (Ubuntu)
Fix Released
Low
Cody Russell

Bug Description

Problem occurs with: notify-osd 0.9.9-0ubuntu1, Ubuntu Jaunty
Does not occur with: notification-daemon 0.4.0-0ubuntu3, Ubuntu Jaunty

1. notify-send "test" "&lt; &#60; &#x3C; &amp; &#38; &#x26; &gt; &#62; &#x3e;"

What should happen: A bubble appears with title "test" and body text "< < < & & & > > >".
What actually happens: A bubble appears with title "test" and body text "< &#60; &#x3C; & &#38; &#x26; > &#62; &#x3e;".

<https://wiki.ubuntu.com/NotifyOSD#sanitizing>: "2. If the text contains a “&” character, and a “;” character occurs later in the string, the sequence from that “&” character to the soonest-following “;” character should be presented as follows:..."

description: updated
summary: - Character reference handling for <, &, and > don't match notification-
+ Character reference handling for <, &, and > doesn't match notification-
daemon
Revision history for this message
David Barth (dbarth) wrote :

Cody, can you take a look at this one? Thanks

Changed in notify-osd (Ubuntu):
assignee: nobody → bratsche
Mirco Müller (macslow)
Changed in notify-osd (Ubuntu):
assignee: bratsche → nobody
status: New → Confirmed
David Barth (dbarth)
Changed in notify-osd (Ubuntu):
importance: Undecided → Low
assignee: nobody → bratsche
Revision history for this message
Shane O'Connell (shaneoc) wrote :

I've just noticed this as well, I have people with an apostrophe in their last name on my list and when they sign online the notification shows it as "&apos;"

Revision history for this message
Shane O'Connell (shaneoc) wrote :

&apos, by the way, is not listed on that wiki page. There could be other things missing as well.

Revision history for this message
Matthew Paul Thomas (mpt) wrote :

Shane, what IM program is request the notification with "&apos;" in it?

Revision history for this message
Shane O'Connell (shaneoc) wrote :

I'm using pidgin.

Revision history for this message
Cody Russell (bratsche) wrote :

Is that happening in the notification title or the body? If it's the title, I think we already have a patch for pidgin-libnotify that will fix that.

Revision history for this message
Cody Russell (bratsche) wrote :
Cody Russell (bratsche)
Changed in notify-osd (Ubuntu):
status: Confirmed → In Progress
Revision history for this message
Matthew Paul Thomas (mpt) wrote :

Cody, have you tested the patch against the test I provided? I don't see how it fixes the numeric references.

Why are you special-casing &quot; and &apos;? Do you have any evidence that applications escape " as &quot; or ' as &apos; when sending notifications? If so, let me know, and I'll add it to the spec. If not, we shouldn't render &quot; as " or &apos; as ', even though notification-daemon does, because it will mean more unwanted conversions for applications sending unescaped text. (We don't render &#164; as ¤ even though notification-daemon does, for example.) Either way, the implementation and the spec should be in sync.

Revision history for this message
Cody Russell (bratsche) wrote :
Revision history for this message
Cody Russell (bratsche) wrote :

I removed the "lone ampersand" fixup stuff. I think that was left over from when we were using Pango markup escaping, because it could confuse things. Note that bubble.c no longer uses pango_parse_markup(), it only uses filter_text(). This fixes a lot of things "such as <>" being interpreted when it should, "x < 1" being interpreted when it shouldn't, etc. It also is important from the unit test perspective, because now the unit test actually tests -exactly- what will be filtered by notify-osd (whereas previously it only tested what would be filtered by filter_text(), and then Pango would go and screw things up after that).

This also adds new regexes for the numerics, which I had forgotten to include in the last patch.

Hopefully the unit tests cover things pretty well and will make this easier for Mirco to integrate.

Revision history for this message
Cody Russell (bratsche) wrote :

Same as last patch, just fixes indenting.

Revision history for this message
Cody Russell (bratsche) wrote :
Cody Russell (bratsche)
Changed in notify-osd (Ubuntu):
status: In Progress → Confirmed
Revision history for this message
Cody Russell (bratsche) wrote :
Revision history for this message
Cody Russell (bratsche) wrote :

Add unit tests for &apos; and &quot;

Revision history for this message
Matthew Paul Thomas (mpt) wrote :

Thanks for working on this, Cody.

    +#define CHARACTER_LT_REGEX "&(lt;|#60;|#x3C;)"

This should be "&(lt;|#60;|#x3C;|#x3c;)"

    +#define CHARACTER_GT_REGEX "&(gt;|#62;|#x3e;)"

This should be "&(gt;|#62;|#x3E;|#x3e;)"

    +#define CHARACTER_APOS_REGEX "&(apos;|#39;|#x27;)"
    +#define CHARACTER_QUOT_REGEX "&(quot;|#34;|#x22;)"

To repeat my question from yesterday: Why are you special-casing &quot; and &apos;? The spec doesn't say to do that. (And why are you converting &#34; and &#39;, but not &#40;, &#41;, &#42;, &#43;, etc?) Do you have any evidence that applications escape " or ' when sending notifications? If so, let me know, and I'll change the spec. If not, unescaped text will work more often without those conversions.

    +#define LINK_MATCH_REGEX "<a[^>]*?href\\s*=\\s*[\"\"']?([^'\"\" >]+?)['\"\"]?>(.*?)</a>"
    +#define LINK_REPLACE_REGEX "<a[^>]*?href\\s*=\\s*[\"\"']?([^'\"\" >]+?)['\"\"]?>|</a>"

Why are you treating <a> elements separately from other elements? Again, that doesn't match the algorithm I provided in the spec, and I don't see any reason to do it. Please, either follow the spec or tell me why it needs changing, don't do something undocumented.

    +#define OTHER_MATCH_REGEX "<([A-Z][A-Z0-9]*)\\b[^>]*>(.*?)</\\1>"

I'm not a regexp expert, but that "\1" looks like it's not following the spec, which says to drop *any* sequence of characters that starts with "</" and ends with ">", not just those where what's between them matches a previous tag. Is your aim here to reduce unwanted tag-swallowing, on the grounds that "foo</i>" malfunctions in notification-daemon anyway? That could be nifty, though it would be harder to specify and implement in a non-regexp way. (Unless, or maybe even if, the Desktop Notifications Specification 1.0 totally redoes the body text format, the guessing algorithm probably should be included in that spec for all servers to implement.)

Revision history for this message
Cody Russell (bratsche) wrote :

"To repeat my question from yesterday: Why are you special-casing &quot; and &apos;? The spec doesn't say to do that. (And why are you converting &#34; and &#39;, but not &#40;, &#41;, &#42;, &#43;, etc?) Do you have any evidence that applications escape " or ' when sending notifications? If so, let me know, and I'll change the spec. If not, unescaped text will work more often without those conversions."

Because I removed the Pango markup parsing, which was doing this for us. Once I removed that then I would get IM's from friends that said things like, "Hey let&apos;s go see a movie" or whatever. I only added &#39;, &#x27;, etc for thoroughness since you specifically mentioned doing that for ampersands, greater-than, and less-than characters.

+#define CHARACTER_LT_REGEX "&(lt;|#60;|#x3C;)"

This should be "&(lt;|#60;|#x3C;|#x3c;)"

I'm doing caseless regex, so this isn't necessary. However, I'm not sure if &#x3c; and &#X3c should be treated the same, so perhaps this is a bug and I should not be doing caseless? Let me know if that's wrong.

I'll remove the <a href> regex stuff. It doesn't seem necessary now that the 'other' regex seems to be solid enough to pick those up now.

Revision history for this message
Cody Russell (bratsche) wrote :
Revision history for this message
Cody Russell (bratsche) wrote :
Revision history for this message
Cody Russell (bratsche) wrote :

This adds singleton tag catching, e.g. <br />

Also fixes a problem in TAG_REPLACE_REGEX where it wasn't matching closing tags if they were more than one character.. e.g., </img>

Revision history for this message
Cody Russell (bratsche) wrote :
Revision history for this message
Cody Russell (bratsche) wrote :

Note that this removes G_REGEX_CASELESS. This was intentional, as mpt discovered that notification-daemon interprets e.g., <b> but not <B>. I'm curious to do some investigation into this and find out if this is perhaps a bug in Pango or if it's deliberate.. but we can hold off on that for another time.

Revision history for this message
Mirco Müller (macslow) wrote :

I'm about to push this to trunk. All old and new test-cases succeed.

Mirco Müller (macslow)
Changed in notify-osd (Ubuntu):
status: Confirmed → Fix Committed
Mirco Müller (macslow)
Changed in notify-osd (Ubuntu):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.