Scour emits CR CR LF for newlines on Inkscape/Windows

Bug #717826 reported by Jan Thor on 2011-02-12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
scour (Ubuntu)

Bug Description

Currently, under Windows, Scour produces newlines of the form \r\r\n (=CR-CR-LF or \x0D\x0D\0A), which is not only wrong, but gets displaced as an additional blank line between every svg code line in Unicode editors (and might perhaps confuse older text editors).

There is a unittest designed to catch such an error, namely, EnsureLineEndings, but the error only occurs after writing the document to disk, while the unittest investigates the string produced by scour before writing the document to disk.

How does it happen?

Python, following C, tries to shield programmers from OS related newline troubles, so you can always use \n as the newline character: if you read a file in text mode, os.linesep gets replaced with '\n', and when you write a file in text mode, '\n' gets replaced with os.linesep.

Scour, when serializing the svg document, directly uses os.linesep instead of '\n', and on Windows systems, os.linesep is '\r\n'. Then it saves the file in text mode, and Python replaces '\n' with '\r\n', which means that '\r\n' is replaced with '\r\r\n'.

What is rather confusing is that when I'm using Scour as an export filter for Inkscape, the line ending actually becomes '\r\n'. In this case, Scour sends the resulting file via sys.stdout.write(), and the default mode for sys.stdout.write() is 'text mode', which means that sys.stdout.write() also should replace '\n' with '\r\n' and '\r\n' with '\r\r\n'.

Since Inkscape uses '\n' as the line ending if I save files without Scour as the export filter, I suspect that Inkscape somehow is replacing '\r\n' with '\n' after reading the document from stdin and before saving it to a file, which obviously turns '\r\r\n' into '\r\n', but that's just a guess, and frankly, I have no idea what is going on inside Inkscape.

How *not* to fix it?

The usual way would be to get rid of all the os.linesep and simply replace them with '\n'. Unfortunately, while that would work just fine if Scour is called from the command line to convert one file into another file, I guess it would screw Inkscape. This was the behavior of Scour before revision 153, and the purpose of revision 153 was to fix bug #482215 that Scour, used in conjunction with Inkscape, produced the line ending '\n' on Windows. So replacing os.linesep with '\n' within the code of Scour probably wouldn't work.

How to fix it (I think):

The alternative is: keep os.linesep, but save the file in binary mode instead of text mode, thereby suppressing Python from doing its magic. Instead of using

    outfile = maybe_gziped_file(options.outfilename, "w")

within parse_args() that means replacing that line with

    outfile = maybe_gziped_file(options.outfilename, "wb")

instead. Since this change doesn't affect sys.stdout, it shouldn't break the interaction with Inkscape. And on Linux/Mac OS X, changing from text mode to binary mode doesn't make any difference at all. And it should work with both gzipped and uncompressed files.

Related branches

Jan Thor (jan-janthor) wrote :

Added Affects: Inkscape.

I believe we should work with Inkscape on this bug, to know what works and what doesn't on what platforms.

We *could* make Scour write \n in binary mode, and that would be it -- it would be valid SVG on all platforms, since unimportant whitespace is removed, and the XML/SVG specifications define their newline behavior explicitly if xml:space is set to preserve. However, Windows Notepad users would not be happy (workaround: use Wordpad, it recognises \n), and there *could* be all sorts of weird things happening with Inkscape. That would then be an Inkscape bug related to Scour, not a Scour bug per se.

So I'll be awaiting Inkscape's input on this bug (and patch) first.

Changed in scour:
status: New → Opinion
Jan Thor (jan-janthor) wrote :

Thanks, that sounds great.

Personally, I wouldn't mind Scour always using just \n as the line ending, since it's one byte less, but that's rather moot once you intend to gzip your files (which will compress away all those occurrences of ...">\r\n<... anyway), so I see no compelling reason not to do the platform-appropriate thing.

jazzynico (jazzynico) wrote :

A document saved as Inkscape (or plain) SVG on Windows XP has \n line endings (and not \r\n as explained in Bug #482215, invalid IMHO). Thus I see no point in acting differently with Scour.

As for the patch (tested with Inkscape and scour r203), using w or wb doesn't change the generated output, and I still get \r\r\n.

Changed in inkscape:
importance: Undecided → Low
status: New → Confirmed
Jan Thor (jan-janthor) wrote :

Changing the mode to 'wb' for the call of maybe_gziped_file() within parse_args() only affects Scour as a standalone program with a given output file, not in conjunction with Inkscape. With 'wb' and Scour as a standalone command line program (and Python 2.6 on Win32), I get \r\n as the line ending for those line breaks made by Scour, unaltered line endings otherwise (like line breaks within comments).

Since that didn't work, try this patch instead.

It uses "\n" instead of os.linesep in XML output, which may mean that Windows users will get "\n" instead of "\r\r\n". If they get "\r\n", that's not so bad, but in that case we may need to re-open sys.stdout or sys.stdin in binary mode just below the call to maybe_gziped_file(). Either of sys.stdout and sys.stdin may be doing the newline replacement magic.

jazzynico (jazzynico) wrote :

With the new patch, I get "\r\n" on Windows XP. That's far better than what we had previously.
Thanks a lot!

By the way, could you tell me how stable scour currently is? Inkscape latest versions use 0.25r171,and it could be interesting (for Scour and Inkscape) to update to a more recent scour revision.

Jan Thor (jan-janthor) wrote :

With the new patch, I get '\n' as the line ending if I use scour as a standalone script, as I would expect.

Since this is what I would prefer anyway, I won't complain.

[@JazzyNico, comment #7]

The code in the trunk, 0.25 r204, is quite stable. It has enough code additions to qualify as a 0.26 release IMO, but releases are up to codedread to make.

Is "\r\n" on Windows satisfactory for you, or should Scour attempt to really write "\n" everywhere? (reopening sys.stdout in binary mode, for instance, with os.fdopen(sys.stdout.fileno(), "wb")) Windows Notepad users will be happy with "\r\n", that's for sure :)

[@Jan Thor, original post and comment #8]

Great, that's good to hear. It didn't screw Inkscape to replace os.linesep with "\n" everywhere after all, and standalone Scour works well too.


The patch in comment #6 is now in the trunk as revision 204, but I'll still wait for an answer to the above "\r\n" question.

Changed in scour:
assignee: nobody → Louis Simard (louis-simard)
importance: Undecided → Medium
status: Opinion → Fix Committed
summary: - Windows Newlines
+ Scour emits CR CR LF for newlines on Inkscape/Windows

On Wed, Feb 16, 2011 at 9:37 AM, Louis Simard <email address hidden> wrote:
> The code in the trunk, 0.25 r204, is quite stable. It has enough code
> additions to qualify as a 0.26 release IMO, but releases are up to
> codedread to make.

Um, not really :)

Releases are when we have enough fixes and we have time to make a
release. Louis, if you'd like to do it, go ahead and I will update
the downloadable script on my website with the version number, etc.


jazzynico (jazzynico) wrote :

> Releases are when we have enough fixes and we have time to make a
> release.

Well, we don't need a formal release. If you tell me there's nothing obviously broken and no major regression in the current revision, it's ok to integrate it in the Inkscape trunk.

tags: added: patch

@JazzyNico, comment #11:

There's nothing obviously broken in revision 204, and all unit tests pass. :)

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package scour - 0.25+bzr207-1

scour (0.25+bzr207-1) unstable; urgency=low

  * New upstream bzr snapshot:
    - Add the option --no-renderer-workaround.
    - Fix gradient corruption. (LP: #702423)
    - Add option to only remove autogenerated id's. (LP: #492277)
    - Fix Windows line ending handling. (LP: #717826)
    - Fix occasional production of empty defs. (LP: #717254)
    - Remove more attributes with default values. (LP: #714731)
    - Fix wrong handling of file:// references. (LP: #708515)
    - Delete text attributes from groups with only non-text elements.
      (LP: #714727)
    - Remove unnecessary text-align properties from non-text elements.
      (LP: #714720)
    - Fix wrong optimization of 0-length Bézier curves. (LP: #714717)
    - Fix decimal number processing in rule_elliptical_arc(). (LP: #638764)
    - Fix transform matrix order. (LP: #722544)
  * Drop 01-bzr195.patch, upstream now.
  * debian/cmpsvg: Fix computation of difference to add up the absolute
    difference of each pixel.
  * debian/cmpsvg: Show percentages with three digits after comma.
  * debian/cmpsvg: Lower default treshold to 0.05%.
 -- Martin Pitt <email address hidden> Tue, 15 Mar 2011 09:28:45 +0100

Changed in scour (Ubuntu):
status: New → Fix Released

This bug is fixed in release 0.26 of Scour.

Changed in scour:
status: Fix Committed → Fix Released
su_v (suv-lp) wrote :

Fixed in current trunk in revision 10180:
  Extensions. Optimized SVG export update (scour r210 + custom inx file)

Changed in inkscape:
assignee: nobody → JazzyNico (jazzynico)
milestone: none → 0.49
status: Confirmed → Fix Committed
Changed in inkscape:
status: Fix Committed → Fix Released
CoDEmanX (codemanx) wrote :

The better fix might actually be to set the "newline" parameter to an empty string on file():

    outfile = maybe_gziped_file(options.outfilename, "w", newline="")


CoDEmanX (codemanx) wrote :

Never mind, Python 2 doesn't support it...

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers