Scour emits CR CR LF for newlines on Inkscape/Windows
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Inkscape | Fix Released | Low | jazzynico |
Scour | Fix Released | Medium | Unassigned |
scour (Ubuntu) | Fix Released | Undecided | Unassigned |
Bug Description
Currently, under Windows, Scour produces newlines of the form \r\r\n (CR-CR-LF, or \x0D\x0D\x0A). This is not only wrong, it also gets displayed as an additional blank line between every line of SVG code in Unicode editors (and might confuse older text editors).
There is a unit test designed to catch such an error, EnsureLineEndings, but it inspects the string Scour produces before the document is written to disk, whereas the error only occurs once the document has been written to disk.
How does it happen?
Python, following C, tries to shield programmers from OS-related newline troubles, so you can always use '\n' as the newline character: when you read a file in text mode, os.linesep gets replaced with '\n', and when you write a file in text mode, '\n' gets replaced with os.linesep.
Scour, when serializing the svg document, directly uses os.linesep instead of '\n', and on Windows systems, os.linesep is '\r\n'. Then it saves the file in text mode, and Python replaces '\n' with '\r\n', which means that '\r\n' is replaced with '\r\r\n'.
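To make the double translation concrete, here is a minimal sketch that reproduces it on any platform. It is not Scour's code: the helper name write_text_mode is made up for illustration, and the Windows behavior is simulated by passing newline="\r\n" to open() (Python 3), which forces every '\n' to be written as '\r\n' the way text mode does on Windows.

```python
import os
import tempfile

# What os.linesep is on Windows; hard-coded so the sketch runs on any OS.
WINDOWS_LINESEP = "\r\n"

def write_text_mode(lines, linesep):
    """Join lines with `linesep`, write them in text mode with Windows-style
    newline translation, and return the raw bytes that ended up on disk.
    (Hypothetical helper for illustration, not part of Scour.)"""
    fd, path = tempfile.mkstemp(suffix=".svg")
    os.close(fd)
    try:
        # newline="\r\n" translates every "\n" written into "\r\n",
        # mimicking text-mode output on Windows.
        with open(path, "w", encoding="utf-8", newline=WINDOWS_LINESEP) as f:
            f.write(linesep.join(lines) + linesep)
        # Read back in binary mode to see the bytes without any translation.
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)

# Serializing with os.linesep (as on Windows) and then writing in text mode:
doubled = write_text_mode(["<svg>", "</svg>"], WINDOWS_LINESEP)
# Serializing with plain "\n" and letting text mode do the translation:
plain = write_text_mode(["<svg>", "</svg>"], "\n")

print(repr(doubled))
print(repr(plain))
```

The first result contains the '\r\r\n' sequence described above, while the second contains ordinary '\r\n' line endings.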
What is rather confusing is that when I'm using Scour as an export filter for Inkscape, the line ending actually becomes '\r\n'. In this case, Scour sends the resulting file via sys.stdout.write(), and the default mode for sys.stdout.write() is 'text mode', which means that sys.stdout.write() also should replace '\n' with '\r\n' and '\r\n' with '\r\r\n'.
Inkscape uses '\n' as the line ending if I save files without Scour as the export filter. I therefore suspect that Inkscape somehow replaces '\r\n' with '\n' after reading the document from stdin and before saving it to a file, which would turn '\r\r\n' into '\r\n'. But that's just a guess, and frankly, I have no idea what is going on inside Inkscape.
How *not* to fix it?
The usual way would be to get rid of all the os.linesep and simply replace them with '\n'. Unfortunately, while that would work just fine if Scour is called from the command line to convert one file into another file, I guess it would screw Inkscape. This was the behavior of Scour before revision 153, and the purpose of revision 153 was to fix bug #482215, in which Scour, used in conjunction with Inkscape, produced the line ending '\n' on Windows. So replacing os.linesep with '\n' within the code of Scour probably wouldn't work.
How to fix it (I think):
The alternative is to keep os.linesep, but save the file in binary mode instead of text mode, thereby preventing Python from doing its magic. Within parse_args(), that means replacing the line
outfile = maybe_gziped_
with
outfile = maybe_gziped_
Since this change doesn't affect sys.stdout, it shouldn't break the interaction with Inkscape. On Linux/Mac OS X, changing from text mode to binary mode doesn't make any difference at all. And it should work with both gzipped and uncompressed files.
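As a rough illustration of the binary-mode idea (not the actual patch), the sketch below writes a string that already contains '\r\n' in binary mode, both uncompressed and gzipped, and reads the bytes back. The helpers save_binary and load_raw are hypothetical names, the line separator is hard-coded to the Windows value so the result is the same on every platform, and the explicit encode() call is a Python 3 requirement that is assumed here.

```python
import gzip
import os
import tempfile

def save_binary(path, text, compress=False):
    """Write `text` in binary mode so Python performs no newline translation.
    (Hypothetical helper for illustration, not Scour's API.)"""
    opener = gzip.open if compress else open
    with opener(path, "wb") as f:
        f.write(text.encode("utf-8"))

def load_raw(path, compress=False):
    """Return the file's (decompressed) bytes exactly as stored."""
    opener = gzip.open if compress else open
    with opener(path, "rb") as f:
        return f.read()

# Assumption: stand-in for os.linesep on Windows, hard-coded for portability.
linesep = "\r\n"
svg = linesep.join(["<svg>", "</svg>"]) + linesep

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for compress in (False, True):
        path = os.path.join(tmp, "out.svgz" if compress else "out.svg")
        save_binary(path, svg, compress)
        results[compress] = load_raw(path, compress)

print(repr(results[False]))
print(repr(results[True]))
```

In both cases the bytes on disk keep the single '\r\n' produced during serialization, since nothing is translated on the way out.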
summary: Windows Newlines → Scour emits CR CR LF for newlines on Inkscape/Windows
tags: added: patch
Changed in inkscape: status: Fix Committed → Fix Released
Added Affects: Inkscape.
I believe we should work with Inkscape on this bug, to know what works and what doesn't on what platforms.
We *could* make Scour write \n in binary mode, and that would be it -- it would be valid SVG on all platforms, since unimportant whitespace is removed, and the XML/SVG specifications define their newline behavior explicitly if xml:space is set to preserve. However, Windows Notepad users would not be happy (workaround: use Wordpad, it recognises \n), and there *could* be all sorts of weird things happening with Inkscape. That would then be an Inkscape bug related to Scour, not a Scour bug per se.
So I'll be awaiting Inkscape's input on this bug (and patch) first.