\r not allowed in commit message

Bug #492647 reported by James Westby
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad CSCVS
Triaged
Low
Unassigned
Ubuntu Distributed Development
Confirmed
Low
Unassigned

Bug Description

Hi,

Similar to bug 433779, the cscvs import of kerberos fails due to \r in the
commit message.

  http://launchpadlibrarian.net/36289391/kerberos-trunk-log.txt

It may be a cscvs bug for not escaping, but lets start here.

Thanks,

James

Tags: bzr udd
James Westby (james-w)
tags: added: udd
Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 492647] [NEW] \r not allowed in commit message

On Fri, 2009-12-04 at 21:47 +0000, James Westby wrote:
> Public bug reported:
>
> Hi,
>
> Similar to bug 433779, the cscvs import of kerberos fails due to \r in the
> commit message.
>
> http://launchpadlibrarian.net/36289391/kerberos-trunk-log.txt
>
> It may be a cscvs bug for not escaping, but lets start here.

Yes, I think cscvs should escape.

-Rob

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> On Fri, 2009-12-04 at 21:47 +0000, James Westby wrote:
>> Public bug reported:
>>
>> Hi,
>>
>> Similar to bug 433779, the cscvs import of kerberos fails due to \r in the
>> commit message.
>>
>> http://launchpadlibrarian.net/36289391/kerberos-trunk-log.txt
>>
>> It may be a cscvs bug for not escaping, but lets start here.
>
> Yes, I think cscvs should escape.
>
> -Rob
>

Or strip...

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksaYl0ACgkQJdeBCYSNAAPS6QCgrnJPBJnid5+D/Sacx/28CJT8
GqcAniDqTjhjNizHbwTCZXsrEnwxrwDa
=8bQM
-----END PGP SIGNATURE-----

Revision history for this message
John A Meinel (jameinel) wrote :

I should clarify. The reason we disallow '\r' in commit messages is because all XML serializers silently strip them when round-tripping.

<message>this is some\r\n
text\rwith an inline\r</message>

from_string() =
"this is some\ntext\nwith an inline\n"

It seems to be happening very low in the stack, by the expat parser. So it was not obvious how to change that behavior. It caused some broken behavior for per-file commits, because they were using bencode which is length-prefixed and len('foo\r\n') != len('foo\n') obviously.

Now, 2a doesn't have that specific limitation, but you can copy data from 2a => rich-root-pack => 2a, and then your data is in 2a and still broken.

If we feel we really need to preserve \r, then we need to escape it. History so far has shown that people don't care if it is preserved, and just squashing was fine. message.replace('\r\n', '\n').replace('\r', '\n') has worked in general.

Revision history for this message
Andrew Bennetts (spiv) wrote :

> If we feel we really need to preserve \r, then we need to escape it.

Or we can define a new 2a-like format that allows \r. Revisions without \r in commit messages would be freely transferrable between 2a and $new_format, but revisions with \r would be restricted to the new format. We could presumably look at related changes like allowing more characters in filenames at the same time.

I realise we'd prefer not to create a new format so soon after the last one, but it is an option.

If we take the escaping option, we should probably also set a revision-property on any revision where we've done this escaping. That makes it possible to consider migrating to storing it unescaped in future formats (although there are probably implications for things like testament generation to deal with if we did such a migration).

I do feel inclined to try preserve as much fidelity of original data as possible, even though in practice we'll never be able to be perfect for all import sources. But I can see a fairly strong argument that preserving \r could be too far past the point of diminishing returns.

Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 492647] Re: \r not allowed in commit message

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

...

> I do feel inclined to try preserve as much fidelity of original data as
> possible, even though in practice we'll never be able to be perfect for
> all import sources. But I can see a fairly strong argument that
> preserving \r could be too far past the point of diminishing returns.
>

I think we are past the point of diminishing returns. Having a
compatibility and round-trip process based on whether '\r' is in the
commit message is quite a bit for an '\r' character. It certainly
doesn't seem worth a watershed to me.

You could, of course, make sure the '\r' trapping code is properly
present in all older repo formats, and then have *them* break if they
tried to get a revision with a message that had '\r' in them. So the
formats would claim compatibility, with a known failure mode. (So you
can't bzr pull from a 2a with a '\r' in it, to a rich-root-pack
repository, and it would break trying to fetch the revision that has
'\r' in it.)

You just have to trust that you actually have all the safety checks in
place, otherwise someone who does: 2a => rich-root-pack => 2a ends up
having a revision texts that isn't identical to the original source
text, and both share the same revision id.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkso/H8ACgkQJdeBCYSNAAOTcwCgy6T9mbr9BfOUHngl8rJ3anA9
VMgAoJEhbyOau+/cymG40rhPao5xJ7DH
=Bn3y
-----END PGP SIGNATURE-----

Revision history for this message
John A Meinel (jameinel) wrote :

Note that this is UDD, but as yet doesn't seem to be a 'hottest100'. At least 'kerberos' isn't in that list. I'm still trying to track down the failure logs for the other hottest100 import failures, so this may change.

Revision history for this message
John A Meinel (jameinel) wrote :

If this becomes hottest 100, we should bump the priority to high.

Changed in bzr:
importance: Undecided → Low
status: New → Confirmed
Revision history for this message
Robert Collins (lifeless) wrote :

Consensus seems to be that cscvs should do something about it.

affects: bzr → launchpad-cscvs
Changed in udd:
status: New → Confirmed
importance: Undecided → Low
James Westby (james-w)
tags: added: bzr
Changed in launchpad-cscvs:
status: Confirmed → Triaged
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.