testbrowser doesn't strip starting and ending line breaks.

Bug #98371 reported by Björn Tillenius
Affects   Status         Importance   Assigned to   Milestone
Zope 3    Fix Released   Medium       Unassigned

Bug Description

http://www.w3.org/TR/html4/appendix/notes.html#h-B.3.1 specifies that a line break immediately after a start tag or immediately before an end tag must be ignored.

No stripping is done by testbrowser, though, which causes problems when dealing with textarea fields. I'm attaching a patch which adds a test case showing the problem.

Tags: bug core
Björn Tillenius (bjornt) wrote :

Hmm, the attached diff isn't displayed properly, and changing the content type to text/plain doesn't seem to work. You can get the diff at this URL instead:

  http://people.ubuntu.com/~bjorn/issue.644.testcase.diff

Stephan Richter (srichter) wrote :

Changes: submitter email, edited transcript, importance (medium => critical)

Jim Fulton (jim-zope) wrote :

Changes: importance (critical => 3.3 Release)

Benji York (benji) wrote :

After initial investigation, this looks like a bug in Mechanize (ClientForm to be specific). I'll look into it some more.

Tres Seaver (tseaver) wrote :

I'm not sure that the "must be ignored" applies to processing form
input: I think that the rest of the section applies to how the
element is *rendered* (i.e. on output). E.g., it specifies that
paragraphs delimited by <p> tags must render identically whether
they have newlines immediately after the begin tag or
immediately before the end.

I don't think that it is desirable to have the backend strip such
newlines from user input, as a general rule.

This is particularly acute because textarea widgets are used to
enter text which is *not* HTML markup (e.g., structured /
restructured text, pasting diffs, etc.). In such cases stripping
*anything* to conform with the HTML spec is nonsensical.

(BTW, I modified the *real* content type for the patch, available
only through the ZMI, so that it renders correctly now).

Benji York (benji) wrote :

I've figured out where to change this behavior in ClientForm, but I don't think we should. I agree with Tres; the operative phrase is "rendered identically".

Also, the XML spec (http://www.w3.org/TR/2004/REC-xml-20040204/#sec-white-space) says "An XML processor MUST always pass all characters in a document that are not markup through to the application." So it would appear there should be no newline removal for XML. The only question is whether ClientForm should do different things depending on the DOCTYPE. I sure hope not. I wonder if pointy-bracket-wielding-Fred has any insight.

Björn Tillenius (bjornt) wrote :

Well, you could consider that the browser "renders" the widget, thus depending on how it's rendered, a leading line break may, or may not, be included.

Anyway, I thought that testbrowser was supposed to work like a browser? All the browsers I tested work the way the test case specifies. If testbrowser doesn't work the same way, we can't use it for certain types of tests (at the moment we have to use raw http() calls instead of testbrowser).

As for the XML specification, I see zope.testbrowser as both an XML processor *and* an application. It does process the XML, but afterwards it should also process the result in the same way a real browser does.

(BTW, Tres, does that mean that you have to have special permissions to edit the content type of a file, or did I miss something?)

Björn Tillenius (bjornt) wrote :

To be really clear, Tres said that it's not desirable to strip lines from user input; with that I fully agree.

Consider the following two text areas:

<textarea>foo</textarea>

<textarea>
foo</textarea>

In both examples above, the user input is 'foo', no new lines. So we wouldn't be stripping anything from the user input. If anything, at the moment testbrowser is adding new lines to the user input, which it really shouldn't be doing.

Jim Fulton (jim-zope) wrote :

> = Comment - Entry #9 by BjornT on Jul 17, 2006 2:34 am
>
> To be really clear, Tres said that it's not desirable to strip lines from
> user input, with that I fully agree.
>
> Consider the following two text areas
>
> <textarea>foo</textarea>
>
> <textarea>
> foo</textarea>
>
> In both examples above, the user input is 'foo', no new lines.

No, in the second example, the input is '\nfoo'.

The inputs are different. We should never strip newlines
from inside a text area. I know from experience that the browsers
I have used don't do this.
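
For illustration only, here is a minimal sketch of the literal reading of the markup, using Python's standard html.parser as a stand-in (it is not ClientForm or testbrowser, and the class and function names are made up for this example). A parser that passes character data through untouched reports different contents for the two textareas:

```python
from html.parser import HTMLParser


class TextareaCollector(HTMLParser):
    """Collect the raw character data found inside each <textarea>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._in_textarea = False
        self._chunks = []
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "textarea":
            self._in_textarea = True
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "textarea" and self._in_textarea:
            # Join the pieces in case the data arrived in several chunks.
            self.values.append("".join(self._chunks))
            self._in_textarea = False

    def handle_data(self, data):
        if self._in_textarea:
            self._chunks.append(data)


def raw_textarea_values(markup):
    """Return textarea contents exactly as written, with no stripping."""
    parser = TextareaCollector()
    parser.feed(markup)
    parser.close()
    return parser.values


markup = "<textarea>foo</textarea>\n\n<textarea>\nfoo</textarea>"
print([repr(v) for v in raw_textarea_values(markup)])
```

Whether that raw text should count as the field's value is exactly what is being argued here; the sketch only shows what a no-stripping parse yields.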

Björn Tillenius (bjornt) wrote :

OK, I guess it depends on what you define as input. But try creating a form with such textareas and submit the form in a browser. What will the browser send to the server? In both cases (at least in Firefox and Opera), it will send 'foo'; it will never send '\nfoo'. That's why I said 'foo' was the input, since that is the user input from the browser's point of view.

So, once again, why should testbrowser treat the second textarea as '\nfoo' when a real browser treats it as 'foo'?
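
To make the difference concrete from the server's side, here is a small sketch of the two URL-encoded request bodies in question. It only encodes values with the standard library; it does not emulate a browser, and the field name "text" is a placeholder chosen for the example:

```python
from urllib.parse import urlencode


def form_body(value, name="text"):
    """URL-encode a single form field the way a POST body carries it."""
    return urlencode({name: value})


# What a real browser was observed to submit for both textareas.
print(form_body("foo"))    # text=foo

# What gets submitted if the leading line break is kept as part of the value.
print(form_body("\nfoo"))  # text=%0Afoo
```

A server-side test that compares the submitted value against 'foo' passes with the first body and fails with the second.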

Tres Seaver (tseaver) wrote :

Uploaded: z3_issue_644.pt

Interesting note: the browser does strip leading newlines from
textarea input, but not *trailing* ones. I'm attaching a Zope2
pagetemplate (it needs hacking to get the urllib.quote_plus to
work under Z3) which demonstrates that behavior.

Jim Fulton (jim-zope) wrote :

Changes: edited transcript, importance (3.3 Release => medium), new comment

We've verified that browsers (we tested Firefox, Mozilla, Safari, and IE) will strip a single leading newline (crlf) if it is at the beginning of the string.

This will require a change to mechanize, so we can't hold up the 3.3 release for it.

Benji York (benji) wrote :

A little more detail: we also found that only a single newline, no other whitespace, will be stripped after a start tag. It also didn't matter if the browser was in quirks mode or "standards compliant" mode.
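
The observed browser behavior can be sketched as a small normalization step. This is not the actual Mechanize/ClientForm change, just an illustration of the rule described in this report; the function name is made up, and treating both CRLF and a bare LF as "a single newline" is an assumption:

```python
def textarea_initial_value(raw):
    """Drop at most one line break directly after the <textarea> start tag.

    Other leading whitespace, additional newlines, and trailing newlines
    are left alone, matching what was observed in the browsers tested.
    """
    if raw.startswith("\r\n"):
        return raw[2:]
    if raw.startswith("\n"):
        return raw[1:]
    return raw


print(repr(textarea_initial_value("\nfoo")))    # 'foo'
print(repr(textarea_initial_value("\n\nfoo")))  # '\nfoo'
print(repr(textarea_initial_value(" \nfoo")))   # ' \nfoo'
print(repr(textarea_initial_value("foo\n")))    # 'foo\n'
```

Only the first line break is removed, so a value that really does start with a blank line can still be expressed by writing two line breaks after the start tag.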

Benji York (benji) wrote :

Even more detail: I have a fix for this that I'll check in sometime soon (next 24 hours hopefully). It's still a change to Mechanize, but I'm pretty sure John J. Lee will take it, and we can sync our import of Mechanize when it's included.

Benji York (benji) wrote :

Fixed in the trunk and 3.3 branch

Christian Theune (ctheune) wrote :

According to Benji this was fixed a while ago.

Changed in zope3:
status: New → Fix Released