md5sum fails with message "Invalid argument" on 4,294,967,295-byte files in FAT32 - tracked down to Ubuntu incompatible change to kernel's fread and stdio stream

Bug #1649342 reported by Jaime Gaspar
This bug affects 1 person
Affects          Status         Importance   Assigned to        Milestone
linux (Ubuntu)   Fix Released   Medium       Joseph Salisbury
Yakkety          Fix Released   Medium       Joseph Salisbury
Zesty            Fix Released   Medium       Joseph Salisbury

Bug Description

Bug discovered in the threads at http://lists.gnu.org/archive/html/bug-coreutils/2016-12/msg00008.html and https://bugzilla.kernel.org/show_bug.cgi?id=189981 partially quoted below (with the permission of the persons involved).

=== Message 1 ===

--- Bug ---
In a FAT32 file system, if one runs md5sum on a 4,294,967,294-byte file (one byte less than the maximum file size) it succeeds, but if one runs md5sum on a 4,294,967,295-byte file (the maximum file size) it fails with error message "Invalid argument".

--- How to reproduce the bug ---
Create a FAT32 file system in a file "tmp.fs":
   truncate -s 9G tmp.fs
   mkfs.vfat -F 32 tmp.fs
Mount at "/tmp/mounted_tmp/" the file system in file "tmp.fs":
   sudo mkdir /tmp/mounted_tmp/
   sudo mount -o loop,rw,uid=1000,gid=1000 tmp.fs /tmp/mounted_tmp/
Create two files in "/tmp/mounted_tmp/", file "file_1" with 4,294,967,294 bytes and file "file_2" with 4,294,967,295 bytes:
   cd /tmp/mounted_tmp/
   truncate -s 4294967294 file_1
   truncate -s 4294967295 file_2
Run md5sum on the two files "file_1" and "file_2":
   md5sum file_1
   md5sum file_2
The outputs should be respectively (notice that the second output is an error message):
   541249e3205af07b4a03f891185f64a0 file_1
   md5sum: file_2: Invalid argument
Unmount the file system at "/tmp/mounted_tmp/":
   cd ..
   sudo umount /tmp/mounted_tmp/
   sudo rmdir /tmp/mounted_tmp/
Remove the file "tmp.fs".
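
For checking other files or hash functions, the same test can be scripted. The following is a minimal Python sketch and not part of the original report: the helper name `hash_or_error` and the 32768-byte chunk size are illustrative assumptions. It returns the hex digest of a file, or the OS error text if a read fails.

```python
import hashlib


def hash_or_error(path, algorithm="md5", chunk_size=32768):
    """Return the hex digest of the file at `path`, or the OS error text.

    Hypothetical helper: the name and the chunk size are illustrative choices.
    """
    digest = hashlib.new(algorithm)
    try:
        # buffering=0 makes each f.read() correspond to one read() system call.
        with open(path, "rb", buffering=0) as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:  # b"" signals end of file
                    break
                digest.update(chunk)
    except OSError as exc:
        return "error: {}".format(exc.strerror)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys

    # Usage: python3 script.py file_1 file_2
    for name in sys.argv[1:]:
        print(hash_or_error(name), name)
```

On an affected kernel, one would expect the line for "file_2" to show the "Invalid argument" error text instead of a digest.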

--- Notes ---
Tested with md5sum 8.25 running on an updated Ubuntu 16.10 with kernel 4.8.0-30-generic.
The same bug affects sha1sum, sha224sum, sha256sum, sha384sum, and sha512sum, but not crc32.

=== Message 2 ===

...
> --- How to reproduce the bug ---
...

I can't repro this with any md5sum version on 4.2.5-300.fc23.x86_64
So I'm guessing a kernel regression.
Can you strace -o /tmp/md5sum.strace md5sum file_2,
and look towards the end of the strace file to identify the syscall returning EINVAL?
In any case I'd direct the issue towards the kernel folks.
...

=== Message 3 ===

> Can you strace -o /tmp/md5sum.strace md5sum file_2,
> and look towards the end of the strace file to identify the syscall
> returning EINVAL?

It is the seventh and eighth lines below:

      ...
   read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768) = 32768
   read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768) = 32768
   read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768) = 32768
   read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768) = 32768
   read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768) = 32767
   read(3, 0x25c12b0, 8192) = -1 EINVAL (Invalid argument)
   read(3, 0x25c12b0, 8192) = -1 EINVAL (Invalid argument)
   write(2, "md5sum: ", 8) = 8
   write(2, "file_2", 6) = 6
   open("/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 4
   fstat(4, {st_mode=S_IFREG|0644, st_size=2995, ...}) = 0
   read(4, "# Locale name alias data base.\n#"..., 4096) = 2995
   read(4, "", 4096) = 0
   close(4) = 0
   open("/usr/share/locale/en_GB/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
   open("/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
   open("/usr/share/locale-langpack/en_GB/LC_MESSAGES/libc.mo", O_RDONLY) = 4
   fstat(4, {st_mode=S_IFREG|0644, st_size=3537, ...}) = 0
   mmap(NULL, 3537, PROT_READ, MAP_PRIVATE, 4, 0) = 0x7f53d8d0a000
   close(4) = 0
   open("/usr/share/locale-langpack/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
   write(2, ": Invalid argument", 18) = 18
   write(2, "\n", 1) = 1
   lseek(3, 0, SEEK_CUR) = 4294967295
   close(3) = 0
   close(1) = 0
   close(2) = 0
   exit_group(1) = ?
   +++ exited with 1 +++
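
The short final read in the trace is consistent with the file size. As a quick arithmetic check (a sketch added for illustration, assuming md5sum reads in 32768-byte blocks as the trace shows), splitting 4,294,967,295 bytes into 32768-byte reads leaves a last read of 32767 bytes, after which the file offset equals the file size:

```python
FILE_SIZE = 4294967295   # size of file_2 in bytes
BLOCK = 32768            # read size seen in the trace

full_reads, last_read = divmod(FILE_SIZE, BLOCK)

print(FILE_SIZE == 2**32 - 1)            # True
print(full_reads)                        # 131071
print(last_read)                         # 32767
print(full_reads * BLOCK + last_read)    # 4294967295, the offset lseek reports
```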

=== Message 4 ===

...
> read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768) = 32767

OK we've read all we can, but to verify md5sum will do:
  fread(buffer + 32767, 1, 1, stream)

Then the stdio stream will issue the underlying read()s.
I'm not too sure why there are two reads here,
but they shouldn't be returning EINVAL; they should just
return 0 to indicate EOF.

> read(3, 0x25c12b0, 8192) = -1 EINVAL (Invalid argument)
> read(3, 0x25c12b0, 8192) = -1 EINVAL (Invalid argument)

So it's a kernel bug as suspected.
...
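
The end-of-file behaviour expected here can be illustrated on an ordinary small file. This is a sketch in Python and not from the thread; it uses a temporary file rather than a FAT32 mount, so it only shows what a read at end of file normally returns (zero bytes, no error).

```python
import os
import tempfile

# Create a small temporary file; its contents and size are arbitrary.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"\0" * 10)
    os.lseek(fd, 0, os.SEEK_SET)

    data = os.read(fd, 32768)    # asks for more than the file holds: short read
    at_eof = os.read(fd, 8192)   # the offset is now at end of file

    print(len(data))   # 10
    print(at_eof)      # b'' (read() returned 0, not an error)
finally:
    os.close(fd)
    os.remove(path)
```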

=== Message 5 ===

...
Hm, tested on debian/testing with v4.8 vanilla (+ debian gcc PIE fix).
However, I couldn't reproduce it.

    # truncate -s 9G tmp.fs
    # mkfs.vfat -F 32 tmp.fs
    mkfs.fat 4.0 (2016-05-06)
    # mount -o loop,rw,uid=1000,gid=1000 tmp.fs m
    # cd m
    # truncate -s 4294967294 file_1
    # truncate -s 4294967295 file_2
    # md5sum file_1
    541249e3205af07b4a03f891185f64a0 file_1
    # md5sum file_2
    c654ebc4b3472cfa01ade24bbbbc6d3e file_2

Can you try v4.8 vanilla?
...

=== Message 6 ===

...
I tested with kernel 4.8.0-040800-generic from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.8/ and I cannot reproduce the bug.
I tested with kernel 4.8.0-30-generic from an updated Ubuntu 16.10 and I can reproduce the bug.

=== Message 7 ===

...
After a quick check, Ubuntu seems to have added an incompatible change. This should be
reported to Ubuntu.

@@ -1674,6 +1687,10 @@
     unsigned int prev_offset;
     int error = 0;

+    if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
+        return -EINVAL;
+    iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
+
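
The effect of the added check can be modelled outside the kernel. The sketch below is illustrative Python, not kernel code: `S_MAXBYTES` is assumed to be 4,294,967,295 for this FAT32 file system (the thread does not state the value), and `model_read` is a hypothetical stand-in that applies only the quoted offset check and count truncation before ordinary end-of-file handling.

```python
import errno

S_MAXBYTES = 4294967295  # assumed s_maxbytes for FAT32; not stated in the thread


def model_read(pos, file_size, count):
    """Model one read() at offset `pos`: bytes read, or a negative errno."""
    if pos >= S_MAXBYTES:            # the check added by the quoted hunk
        return -errno.EINVAL
    count = min(count, S_MAXBYTES)   # rough analogue of iov_iter_truncate()
    return max(0, min(count, file_size - pos))


# file_1: after the last data read the offset is 4294967294, below the limit,
# so the next read reports end of file.
print(model_read(4294967294, 4294967294, 8192))                   # 0

# file_2: the offset is 4294967295, equal to the limit, so the check fires.
print(model_read(4294967295, 4294967295, 8192) == -errno.EINVAL)  # True
```

Under that assumption, only a file of exactly the maximum size can reach an offset equal to the limit, which would match the difference between "file_1" and "file_2".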

Paul White (paulw2u) wrote :

Refiling against the kernel as both of your links suggest

affects: ubuntu → linux (Ubuntu)
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1649342

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: kernel-da-key
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Incomplete → Triaged
Joseph Salisbury (jsalisbury) wrote :

The changes mentioned in the bug description were made by the following commit in mainline:

commit c2a9737f45e27d8263ff9643f994bda9bac0b944
Author: Wei Fang <email address hidden>
Date: Fri Oct 7 17:01:52 2016 -0700

    vfs,mm: fix a dead loop in truncate_inode_pages_range()

That commit was added to mainline as of 4.9-rc1. Ubuntu 16.10 received this change via stable updates from upstream. The commit came in with the 4.8.4 upstream stable updates (commit 3d549dc in 4.8.4).

Can you test the upstream 4.8.4 stable kernel and see if it also exhibits this bug? It can be downloaded from:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.8.4/

It might also be worthwhile to test the current 4.9 mainline kernel to see if the bug is fixed or still exists in mainline. It can be downloaded from:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.9/

tags: added: yakkety
Jaime Gaspar (jaimegaspar) wrote :

I tested with kernel 4.8.4-040804-generic from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.8.4/ and I can reproduce the bug.
I tested also with kernel 4.9.0-040900-generic from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.9/ and I can also reproduce the bug.

Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with a revert of commit c2a9737f45e. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1649342/

Can you test this kernel and see if it resolves this bug?

Note, you need to install both the linux-image and linux-image-extra .deb packages.

Jaime Gaspar (jaimegaspar) wrote :

I tested with kernel 4.8.0-30-generic from http://kernel.ubuntu.com/~jsalisbury/lp1649342/ and I cannot reproduce the bug.

Changed in linux (Ubuntu Yakkety):
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Zesty):
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

The 4.10-rc5 kernel is now available. Can you test this kernel and see if it still exhibits this bug?

Jaime Gaspar (jaimegaspar) wrote :

I tested with kernel 4.10.041000-rc5-generic from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10-rc5/ and I cannot reproduce the bug.

Joseph Salisbury (jsalisbury) wrote :

Does this bug still exist in Ubuntu with the latest updates?

Jaime Gaspar (jaimegaspar) wrote :

I tested with Ubuntu with the latest updates and I cannot reproduce the bug.

Joseph Salisbury (jsalisbury) wrote :

Thanks for the feedback, Jaime!

Changed in linux (Ubuntu Yakkety):
status: Triaged → Fix Released
Changed in linux (Ubuntu Zesty):
status: Triaged → Fix Released