PageTemplateFile opens XML files in binary mode
Bug #143131 reported by yuppie
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Zope 2 | Invalid | Low | Unassigned |
Bug Description
This is a problem on Windows. If I read the specs ( http://
Is there any good reason why this was fixed for HTML, but not for XML files?
Changed in zope2:
status: New → Triaged
importance: Medium → Low
tags: added: bugday removed: bug zope
Fred Drake wrote:
> This report isn't clear. Please update the issue and explain what the
> problem is; glancing at the code on the Zope 2 and Zope 3 trunks, the
> only thing that looks suspicious to me is that re-opening an HTML file
> doesn't use Python's universal newline support.
>
> HTML is always text, so should be treated that way on input. XML may
> contain textual content, but should always be handed to the XML parser
> as a raw byte stream to allow the proper decoding machinery a shot at
> doing the right thing.
I'll try to restate the issue:
This is a problem in CMFSetup. CMFSetup creates XML using PageTemplateFiles. These files are checked in to CVS in text mode, so they contain different newlines depending on the platform. If they are opened as text files, the newlines are normalized to LF; if they are opened as binary files, they are not. Normalization could be done at a later point, but it isn't. As a result, line breaks are not normalized before parsing, although the parser expects LF newlines.
When removing newlines, the parser removes only the LF and leaves the CR in place. When adding newlines, the parser adds LF. Existing newlines are preserved as CR/LF. So the returned XML contains all three kinds of newlines.
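The difference between text-mode and binary-mode reads can be illustrated with a minimal Python sketch. This is not the PageTemplateFile code; the file content and variable names are made up for illustration, and it assumes Python 3, where `newline=None` enables universal newline translation on read.

```python
import os
import tempfile

# Hypothetical template content as it might look after a Windows checkout.
data = b"<root>\r\n  <item/>\r\n</root>\r\n"

# Write the bytes unchanged to a temporary file.
with tempfile.NamedTemporaryFile(delete=False, suffix=".xml") as f:
    f.write(data)
    path = f.name

try:
    # Binary mode: bytes come back exactly as stored, CR/LF included.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode with universal newlines: CR/LF and lone CR become LF.
    with open(path, "r", encoding="utf-8", newline=None) as f:
        text = f.read()
finally:
    os.remove(path)

print(repr(raw))
print(repr(text))
```

The binary read still contains CR/LF pairs, while the text read contains only LF, which is the normalization that is skipped when the template is opened in binary mode.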
This is what the XML 1.0 spec says:
"""2.11 End-of-Line Handling
XML parsed entities are often stored in computer files which, for editing convenience, are organized into lines. These lines are typically separated by some combination of the characters CARRIAGE RETURN (#xD) and LINE FEED (#xA).
To simplify the tasks of applications, the XML processor MUST behave as if it normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating both the two-character sequence #xD #xA and any #xD that is not followed by #xA to a single #xA character."""
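The normalization rule quoted above can be sketched in a few lines of Python. The function name `normalize_xml_newlines` is hypothetical and not part of Zope or CMFSetup, and the sketch assumes an ASCII-compatible encoding such as UTF-8, so that CR and LF are the single bytes 0x0D and 0x0A; it would not be correct for UTF-16 input.

```python
def normalize_xml_newlines(data: bytes) -> bytes:
    """Translate CR LF pairs and lone CRs to a single LF.

    Assumes an ASCII-compatible encoding (e.g. UTF-8).
    """
    # Handle the two-character sequence first so that the CR in a
    # CR LF pair is not turned into an extra LF by the second step.
    data = data.replace(b"\r\n", b"\n")
    # Any CR still present was not followed by LF.
    return data.replace(b"\r", b"\n")


mixed = b"a\r\nb\rc\nd\r\r\ne"
normalized = normalize_xml_newlines(mixed)
print(repr(normalized))
```

Applying a step like this to the raw bytes before they reach the parser would keep the input a byte stream, as Fred Drake suggests, while still giving the parser LF-only line breaks.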