HTML entity code (&whatever;) in bug descriptions is repeatedly unescaped

Bug #6446 reported by Matthew Paul Thomas
Affects: Launchpad itself
Status: Fix Released
Importance: Medium
Assigned to: Steve Alexander
Milestone: (none listed)

Bug Description

<foo> &
&lt;foo&gt; &

Copy and paste those two lines into a bug report, and they should be displayed quite differently from each other. Currently they produce an identical result, because strings recognized as entities are being converted to their equivalent characters. This should not happen.

Each time the description is edited, the conversion is performed again. If the original description had &amp;amp;amp;amp;, after being edited once it will have &amp;amp;amp;, after being edited twice it will have &amp;amp;, and so on.
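The behaviour described above can be reproduced in a few lines. This is a minimal sketch, not Launchpad's or Zope's actual code: `save_description` and `edit_n_times` are hypothetical names standing in for a save path that unescapes its input on every submit, and the use of `xml.sax.saxutils.unescape` is taken from a later comment in this report.

```python
from xml.sax.saxutils import unescape


def save_description(submitted: str) -> str:
    """Hypothetical stand-in for the buggy save path: unescape on every submit."""
    return unescape(submitted)


def edit_n_times(text: str, n: int) -> str:
    """Simulate opening and re-saving the description n times."""
    for _ in range(n):
        text = save_description(text)
    return text


# The two lines from the report are different inputs...
line_a = "<foo> &"
line_b = "&lt;foo&gt; &"

# ...but they are stored identically once the save path unescapes them.
collapsed = save_description(line_a) == save_description(line_b)

# Each further edit strips one more layer of "&amp;".
original = "&amp;amp;amp;amp;"
history = [edit_n_times(original, n) for n in range(4)]

print(collapsed)
for step, text in enumerate(history):
    print(step, text)
```

The point of the sketch is that unescaping is not idempotent, so applying it on every save makes the stored text depend on how many times it has been edited.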

This problem appeared in bug 2021.

Tags: lp-bugs
description: updated
Björn Tillenius (bjornt) wrote :

This looks like it's http://www.zope.org/Collectors/Zope3-dev/468, which has been fixed in latest zope3. I'm not sure the fix got backported to the version of zope3 we're going to update to, so we might have to backport the fix ourselves.

Changed in malone:
assignee: nobody → stevea
status: New → Accepted
James Henstridge (jamesh) wrote :

I am pretty sure fmt:text-to-html is not to blame.

The Zope TextWidget probably has the current behaviour to work around problems with non-UTF-8 form submission: if I have an HTML form that will submit in latin1 (e.g. if the page is latin1), but I enter a non-latin1 character into a form field, it will be sent to the server entity-escaped. The web server has no way to tell whether the user entered the character or the entity itself.

Since our pages are served as UTF-8, we should never see the confusing behaviour, so the unescaping performed by Zope is always an error for us.
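The ambiguity described above can be illustrated as follows. This is a sketch under an assumption: `simulate_latin1_submit` is a hypothetical helper that models a browser substituting a numeric character reference for any character the form encoding cannot represent, using Python's `xmlcharrefreplace` error handler. It is not a model of any specific browser.

```python
def simulate_latin1_submit(field_value: str) -> bytes:
    """Hypothetical model of a latin1 form submission.

    Assumption: characters outside latin1 are sent as numeric character
    references, which is what the xmlcharrefreplace handler produces.
    """
    return field_value.encode("latin-1", errors="xmlcharrefreplace")


# The user types the euro sign, which latin1 cannot represent.
typed_character = simulate_latin1_submit("\u20ac")

# The user literally types the seven characters "&#8364;".
typed_entity = simulate_latin1_submit("&#8364;")

# With a UTF-8 form the character is sent as itself, so no ambiguity arises.
utf8_character = "\u20ac".encode("utf-8")
utf8_entity = "&#8364;".encode("utf-8")

print(typed_character, typed_entity)
print(utf8_character, utf8_entity)
```

In the latin1 case both submissions arrive as the same bytes, so the server cannot recover what was typed. In the UTF-8 case they differ, which is why unescaping is never needed for pages served as UTF-8.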

Björn Tillenius (bjornt) wrote : Re: [Bug 6446] HTML entity code (&whatever; ) in bug descriptions is repeatedly unescaped

On Thu, Jan 05, 2006 at 01:05:00PM -0000, James Henstridge wrote:
> The Zope TextWidget probably has the current behaviour to work around
> problems with non UTF-8 form submission: if I have an HTML form that
> will submit in latin1 (e.g. if the page is latin1), but I enter a non-
> latin1 character into a form field, it will be sent to the server entity
> escaped. The web server has no way to tell if the user entered the
> character or the entity itself.

Oh, didn't know that. It sounds like a valid use case for unescaping the
string. But that's not the reason it gets unescaped, since
xml.sax.saxutils.unescape() is used, which converts only a very small
subset of all HTML entities, so it's definitely a bug.
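To make the "very small subset" concrete, here is a short comparison. `html.unescape` from the modern Python standard library is used only as a reference point for full HTML entity decoding; it is an assumption for illustration, not something the Zope code in question used.

```python
import html
from xml.sax.saxutils import unescape

sample = "&lt;b&gt; &amp; &eacute; &#8364;"

# saxutils.unescape only handles &lt;, &gt; and &amp; by default;
# named HTML entities and numeric references pass through untouched.
sax_result = unescape(sample)

# html.unescape decodes named entities and numeric references as well.
html_result = html.unescape(sample)

print(sax_result)
print(html_result)
```

Because `&eacute;` and `&#8364;` survive `saxutils.unescape`, that call cannot serve as a workaround for entity-escaped form submissions, which supports reading the unescaping as a plain bug.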

> Since our pages are served as UTF-8, we should never see the confusing
> behaviour, so the unescaping performed by Zope is always an error for
> us.

Actually, I think it's possible for a client to request another
encoding. Although it's quite safe to assume that we serve only UTF-8.

James Henstridge (jamesh) wrote :

It is possible to force the encoding by adding an accept-charset="UTF-8" attribute to the <form> element, which is supported by the major browsers (even if the user switches encoding), but I don't think this is ever likely to be an issue in practice.

Björn Tillenius (bjornt) wrote :

I'm quite sure this bug has been fixed now in the version of Zope3 we're using.

Christian Reis (kiko) wrote :

Tested on staging using the strings supplied, and it works. Repeating them here to confirm:

<foo> &
&lt;foo&gt; &

Changed in malone:
status: Confirmed → Fix Released