md5sum can't read md5 files with CRLF line terminators

Bug #84467 reported by Bogdan Butnaru on 2007-02-11
This bug affects 2 people
Affects Status Importance Assigned to Milestone
coreutils
Confirmed
Undecided
Unassigned
coreutils (Ubuntu)
Wishlist
Unassigned

Bug Description

Binary package hint: coreutils

I know this is going to be weird, but...

I'd very much like it if tools like md5sum handled files with CRLF terminators better. I just tried checking a set of files using an .md5 summary generated on Windows, and it doesn't work.

(1) This is really annoying. It took me five minutes to figure out that it was the line terminators that were the problem. This was compounded by the fact that neither "cat", "vi" nor "gedit" show any sign of the distinction by default.

(2) This kind of issue is never going away, and it will only grow as more people use both systems. Most tools, including almost every text editor, already handle both kinds of line terminators by default.

(3) It's more than annoying: I intended to write a script that (among many other things) checks such files often. If I need to check the line terminators myself, it's going to take several times more effort. (Well, at least double the amount.) And this will have to be done for all such scripts.

(4) In the case of md5sum, there really is no excuse not to accept the other kind of terminators. The syntax of .md5 files is _very_ easy to parse, CRLF terminators never change the semantics in any way, and reading the .md5 files is a negligible effort compared with actually checking the hashes, so there's no "inefficiency" problem. There's no chance of backwards incompatibility, as long as output always uses the same terminators it used to.

(5) The current errors shown for such files are nonsensical. (More precisely, they don't indicate the problem unless you're already familiar with it.)

This feature request is written for md5sum because that's where I had a problem, but I'm sure this applies for many other core utils.
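For scripts that need to check such files, a minimal sketch of a tolerant checker follows. It is not how md5sum itself works: the names parse_md5_lines and verify are hypothetical, filenames are resolved relative to the list file (md5sum uses the current directory), and lines that md5sum prefixes with a backslash are rejected rather than decoded.

```python
import hashlib
import re
from pathlib import Path

# Assumed line shape: 32 hex digits, a space, a space or '*', then the filename.
LINE_RE = re.compile(r"^([0-9a-fA-F]{32}) [ *](.+)$")


def parse_md5_lines(text):
    """Yield (digest, filename) pairs, accepting LF or CRLF terminators."""
    for raw in text.split("\n"):
        # Drop exactly one trailing CR so a CRLF list parses like an LF one.
        line = raw[:-1] if raw.endswith("\r") else raw
        if not line:
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"improperly formatted checksum line: {line!r}")
        yield match.group(1).lower(), match.group(2)


def verify(list_path):
    """Return {filename: bool} for every entry in the checksum list."""
    list_path = Path(list_path)
    # Read bytes and decode ourselves so no newline translation happens.
    text = list_path.read_bytes().decode("utf-8")
    results = {}
    for digest, name in parse_md5_lines(text):
        data = (list_path.parent / name).read_bytes()
        results[name] = hashlib.md5(data).hexdigest() == digest
    return results
```

Calling verify("files.md5") on a list saved with either kind of terminator should give the same result, since the only difference handled is the single trailing CR on each line.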

randomwalker (randomwalker) wrote :

confirmed in feisty (coreutils 5.97)

Changed in coreutils:
status: Unconfirmed → Confirmed
randomwalker (randomwalker) wrote :

The error message I get is

---
: No such file or directory
: FAILED open or read
: No such file or directory
: FAILED open or read
md5sum: WARNING: 2 of 2 listed files could not be read
---

which is quite terrible.
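A likely explanation for the missing filenames: the CR left at the end of each name sends the terminal cursor back to column 0, so the rest of the message overwrites the name. The sketch below simulates that in Python. The helper render_on_terminal and the file name a.txt are hypothetical, and the "md5sum: NAME: No such file or directory" message layout is an assumption inferred from the output above.

```python
def render_on_terminal(line):
    """Approximate how a terminal displays one line containing carriage returns."""
    cells = []
    col = 0
    for ch in line:
        if ch == "\r":
            col = 0  # CR moves the cursor back to the start of the line
            continue
        if col < len(cells):
            cells[col] = ch  # overwrite what was already drawn
        else:
            cells.append(ch)
        col += 1
    return "".join(cells)


# The name as md5sum sees it when the list has CRLF terminators.
name = "a.txt\r"
print(render_on_terminal(f"md5sum: {name}: No such file or directory"))
print(render_on_terminal(f"{name}: FAILED open or read"))
```

Both printed lines start with ": ", matching the report: the text after the CR is longer than the text before it, so the name is hidden completely.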

Micah Cowan (micahcowan) wrote :

I'd like to point out that the output of md5sum may not necessarily be /quite/ as easy to parse as you suspect: check the output of md5sum when the filename itself includes a backslash or newline (it will begin the line with a backslash). This is by design (broken, IMO, but too established now to change).
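To illustrate the escaping, here is a rough Python sketch of decoding one checksum line. It assumes a 32-digit MD5 digest followed by a two-character separator, and assumes that in an escaped line "\\" stands for a backslash and "\n" for a newline. The function name split_checksum_line is hypothetical.

```python
def split_checksum_line(line):
    """Split one checksum line into (digest, filename), undoing escapes."""
    escaped = line.startswith("\\")
    if escaped:
        line = line[1:]  # the leading backslash only flags the line as escaped
    digest, name = line[:32], line[34:]
    if not escaped:
        return digest, name

    out = []
    i = 0
    while i < len(name):
        ch = name[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(name):
            raise ValueError("dangling backslash in filename")
        nxt = name[i + 1]
        if nxt == "n":
            out.append("\n")
        elif nxt == "\\":
            out.append("\\")
        else:
            raise ValueError(f"unknown escape: \\{nxt}")
        i += 2
    return digest, "".join(out)
```

A parser that strips CRs therefore also has to decide whether the stripping happens before or after this decoding step.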

There has been some recent discussion on this topic on the coreutils mailing list:
http://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00181.html

It appears that a patch for this behavior has been submitted to a branch, and awaits further testing. My guess would be that it will be present in the next release.

Changed in coreutils:
assignee: nobody → micahcowan
status: Confirmed → In Progress
Micah Cowan (micahcowan) on 2007-05-15
Changed in coreutils:
importance: Undecided → Low
importance: Low → Wishlist

perl -wpe 's/\r//g' file.md5 | md5sum -c -

Micah Cowan (micahcowan) on 2008-01-28
Changed in coreutils:
assignee: micahcowan → nobody
status: In Progress → Confirmed
C de-Avillez (hggdh2) wrote :

Could one of you please try again on Hardy or Intrepid? I just tried with coreutils 6.10 (Hardy and Intrepid), and I do not see any errors (but I do not have Windows, and the only way to test was to create a file with lines terminating in cr/lf).

It still seems to happen on up-to-date Intrepid, coreutils 6.10-6ubuntu1.

$ md5sum * > files.md5
[takes a while]
$ md5sum -c <files.md5
[says everything is OK]
$ geany files.md5
[Documents > Set Line Endings > Convert and Set to CR/LF; save file]
$ md5sum -c <files.md5
[fails to parse and outputs lots of "file not found" errors]

Bogdan Butnaru — <email address hidden>
"I think I am a fallen star, I should wish on myself." – O.

C de-Avillez (hggdh2) wrote :

tested on coreutils-7.0 beta (just-released development version) -- still present.

Upstream has this comment (the newest one on this thread):

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to John Blythe on 4/22/2008 12:27 PM:
| Hi, I have just noticed that checking md5 hashes in a correctly
| formatted md5 file fails on linux if the file
| has crlf line terminators.

This has been previously reported (and even some tentative patches
provided), but not yet cleanly dealt with. A true fix would also have to
encode carriage returns as \r in the md5 output, so that files ending in
cr (on platforms that actually allow that) cannot be confused with a
change in line terminators.

It's been on my TODO list for a long time now, but never percolated to the
top.

- --
Don't work too hard, make some time for fun as well!

Eric Blake address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkgOOoIACgkQ84KuGfSFAYCYKACeInBNSj15iT+RnxOYEqC+o5Vh
PvYAoJnkBS5t9SHdisABkoCTkidWw+sP
=yLyR
-----END PGP SIGNATURE-----

Jure Merhar (spam-aurora) wrote :

Thank you so much for explaining that the problem lies in the line terminators. I was banging my head trying to figure out why it wouldn't work.

This bug has apparently been reported for almost 3 years now and it still hasn't been fixed??

C de-Avillez (hggdh2) wrote :

Yes indeed. It is waiting for either time or a volunteer to code it. Please note that Importance is WISHLIST.

I have added a null upstream task (to mark that this is known upstream), and set the Ubuntu task to Triaged.

Changed in coreutils (Ubuntu):
status: Confirmed → Triaged
Changed in coreutils:
status: New → Confirmed
Keith (dkstathem) wrote :

As far as I can tell,
sed 's/\r$//' checksums.md5 | sed 's/\r/\n/' | md5sum -c
works exactly as expected. If I knew where to find the source for this, I'm so annoyed I might try to decipher it. I'm not very good, but it seems like it would translate.
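For use inside a script, roughly the same normalization can be sketched in Python. The function names are hypothetical. Unlike the sed pipeline, which replaces only the first remaining CR on each line, this version converts every lone CR to a newline. It also shares the caveat raised upstream: a filename that really ends in CR would be altered.

```python
import subprocess
from pathlib import Path


def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and lone CR terminators to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def check_with_md5sum(list_path):
    """Feed the normalized list to `md5sum -c -` and return its exit status."""
    cleaned = normalize_newlines(Path(list_path).read_bytes())
    return subprocess.run(["md5sum", "-c", "-"], input=cleaned).returncode
```

check_with_md5sum("checksums.md5") returns md5sum's exit status, which is 0 only when every listed file verifies.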
