mount ext filesystem fails, booting fails, blkid produces no output

Bug #518582 reported by meierfra
This bug affects 21 people
Affects               Status        Importance  Assigned to  Milestone
util-linux (Ubuntu)   Fix Released  High        Kees Cook
    Nominated for Jaunty by morryis
    Nominated for Karmic by morryis
  Lucid               Fix Released  High        Kees Cook

Bug Description

Symptoms: (Ubuntu 9.10 on an ext4 partition /dev/sda1)

1. Booting fails with error message:

    Gave up waiting for root device. Common problems:
     - Boot args (cat /proc/cmdline)
       - Check rootdelay= (did the system wait long enough?)
       - Check root= (did the system wait for the right device?)
     - Missing modules (cat /proc/modules; ls /dev)
    ALERT! /dev/disk/by-uuid/d3bb8e26-9798-49ce-bc57-afb6ca62a7ba does not exist. Dropping to a shell!

2. "mount /dev/sda1 /mnt" gives "mount: you must specify the filesystem type",
    but "mount -t ext4 /dev/sda1 /mnt" succeeds

3. "blkid /dev/sda1" returns nothing

4. "blkid -p /dev/sda1" gives "ambivalent result (probably more filesystems on the device)"

5. "hexdump -s 0x410 -n 2 /dev/sda1" returns one of the four hexadecimal numbers 137f, 138f, 2468, or 2478

6. 'sudo BLKID_DEBUG=0xffff blkid -p /dev/sda1 | grep "minix: magic"' returns:

    ambivalent result (probably more filesystems on the device)
    minix: magic sboff=16, kboff=1

7. After installing util-linux-ng-2.17 from source, "wipefs /dev/sda1" returns:

    offset               type
    ----------------------------------------------------------------
    0x410                minix   [filesystem]

    0x438                ext4   [filesystem]
                         UUID:  d3bb8e26-9798-49ce-bc57-afb6ca62a7ba
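
As a rough aid for checking symptom 5, here is a minimal shell sketch that prints the 16-bit little-endian value stored at byte offset 0x410 of a device or image file. The function name magic_slot_hex is made up for illustration; it relies only on POSIX od and printf, and reads the two bytes individually so the result does not depend on how hexdump groups bytes.

```shell
# Hypothetical helper: print the little-endian 16-bit value at offset 0x410
# (decimal 1040) of a block device or image file as four hex digits.
magic_slot_hex() {
    # od prints the two bytes at that offset as unsigned decimal numbers.
    set -- $(od -An -tu1 -j 1040 -N 2 "$1")
    [ $# -eq 2 ] || return 1        # file too short or unreadable
    printf '%04x\n' $(( $1 + $2 * 256 ))
}
```

From a root shell, "magic_slot_hex /dev/sda1" printing 137f, 138f, 2468 or 2478 would match the symptom described above.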

I was able to cure the problem by creating a file on "/dev/sda1", thereby changing the number of free inodes.

There have been seven of these cases in the Ubuntu forums by now:
http://ubuntuforums.org/showthread.php?t=1397193
http://ubuntuforums.org/showthread.php?t=1414662
http://ubuntuforums.org/showthread.php?t=1068895
http://ubuntuforums.org/showthread.php?t=1422558

My diagnosis:

Minix uses one of four "magic numbers" (137f, 138f, 2468, or 2478) at location 0x410 to mark a Minix filesystem.

0x410 is also the location any ext filesystem uses to record the number of free inodes.

In decimal, those four numbers are 4991, 5007, 9320, and 9336.

If the number of free inodes happens to be one of those four numbers plus a multiple of 65536, then the ext filesystem will write one of the four Minix magic numbers to the 0x410 location, because only the low 16 bits of the count land in those two bytes.

So blkid gets confused and does not know whether the filesystem is Minix or ext.
In particular, if this happens on the root partition, Ubuntu will no longer boot.
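
The diagnosis above boils down to a check on the low 16 bits of the free inode count. A minimal sketch, with is_minix_magic as a made-up helper name:

```shell
# Hypothetical helper: succeed if a free-inode count, truncated to its low
# 16 bits, equals one of the four Minix magic numbers listed above.
is_minix_magic() {
    low16=$(( $1 & 65535 ))
    case $low16 in
        4991|5007|9320|9336) return 0 ;;   # 0x137f, 0x138f, 0x2468, 0x2478
        *) return 1 ;;
    esac
}

# 70527 = 65536 + 4991, so its low 16 bits are 0x137f.
is_minix_magic 70527 && echo "free inode count collides with a Minix magic"
```

A count such as 5000 does not collide, which is why creating or deleting a single file is enough to move an affected filesystem off the magic value.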

Cure:

Boot from the Ubuntu LiveCD and create a file on the affected partition:

sudo mount /dev/sda1 /mnt
sudo touch /mnt/empty_file

This solution works for an ext4 filesystem but does not work for ext2. For ext2, one needs to replace the UUID in fstab and grub.cfg with the device name. See https://sourceforge.net/apps/mediawiki/bootinfoscript/index.php?title=Boot_Problems:minix for more details.

meierfra (meierfra) wrote :

Here is another case:

http://ubuntuforums.org/showthread.php?t=1414662

Minix uses four different magic numbers:

# Minix filesystems - Juan Cespedes
0x410 leshort 0x137f Minix filesystem
0x410 leshort 0x138f Minix filesystem, 30 char names
0x410 leshort 0x2468 Minix filesystem, version 2
0x410 leshort 0x2478 Minix filesystem, version 2, 30 char names

Any of these four magic numbers will trigger the bug. So there is a 1 in 16384 chance of being affected by this bug after any reboot. This really needs to get some attention.
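
The 1 in 16384 figure follows from four colliding values out of 65536 possible 16-bit values. A small sketch of that arithmetic, assuming the low 16 bits of the free inode count are evenly distributed and ignoring the extra sanity checks in probe_minix that come up later in this thread (count_magic_values is a made-up name):

```shell
# Count how many 16-bit values equal one of the four Minix magic numbers.
count_magic_values() {
    n=0
    hits=0
    while [ "$n" -le 65535 ]; do
        case $n in
            4991|5007|9320|9336) hits=$(( hits + 1 )) ;;
        esac
        n=$(( n + 1 ))
    done
    echo "$hits"
}

hits=$(count_magic_values)
echo "$hits of 65536 values collide, i.e. 1 in $(( 65536 / hits ))"
```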

meierfra (meierfra)
summary: - mount ext fileystem fails, booting fails, blkid produces no putput
+ mount ext fileystem fails, booting fails, blkid produces no output
meierfra (meierfra)
affects: ubuntu → util-linux (Ubuntu)
tgpraveen (tgpraveen89) wrote :

Shouldn't its importance be High or something?

Kees Cook (kees)
Changed in util-linux (Ubuntu Lucid):
milestone: none → ubuntu-10.04-beta-2
importance: Undecided → High
Kees Cook (kees) wrote :

I cannot reproduce this with a simple loop-back filesystem:

sudo -s
cd /tmp
dd if=/dev/zero of=test.ext4 bs=1 count=1 seek=1G
mkfs.ext4 -F test.ext4
mkdir -p /mnt/test
mount -o loop test.ext4 /mnt/test
echo $(seq $(( $(hexdump -s 0x410 -n 2 -e '"%d\n"' test.ext4) - 9336 )) ) | (cd /mnt/test; xargs touch)
umount /mnt/test

$ blkid -p test.ext4
test.ext4: UUID="e6c2eb3d-91ca-42bd-8f09-ff118c9f47c1" TYPE="ext4"
$ hexdump -s 0x410 -n 2 -e '"%d\n"' test.ext4
9336

In reading the blkid source, I think the minix filesystem superblock magic is located at 0x110, not 0x410. However, this bug shows blkid reading 0x110 ("minix: magic sboff=16, kboff=1"): kboff=1 == 0x100, sboff=16 == 0x10.

Kees Cook (kees) wrote :

Hrm, no, I am wrong about location. It does seem to be 0x410, but there is additional logic in the minix probe that needs to be satisfied. See probe_minix in shlibs/blkid/src/superblocks/minix.c
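
For reference, the debug pair from the bug description does work out to 0x410 once kboff is read as a count of KiB. A sketch of the arithmetic, assuming kboff is the offset in KiB from the start of the device and sboff is a byte offset within that block (magic_offset_hex is a made-up name):

```shell
# Hypothetical helper: turn blkid's debug pair (kboff, sboff) into a byte
# offset. Assumes kboff counts KiB and sboff counts bytes.
magic_offset_hex() {
    printf '0x%x\n' $(( $1 * 1024 + $2 ))
}

magic_offset_hex 1 16    # kboff=1, sboff=16; prints 0x410
```

Under that reading, kboff=1 contributes 0x400 rather than 0x100, which matches the corrected location.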

Kees Cook (kees) wrote :

For a filesystem that fails, can you attach the first 10K of the partition?

sudo dd if=/dev/sda1 of=/tmp/10k.data bs=10k count=1

The minix sanity checks are failing:

                sb = blkid_probe_get_sb(pr, mag, struct minix_super_block);
                if (!sb || sb->s_imap_blocks == 0 || sb->s_zmap_blocks == 0)
                        return -1;

                zones = version == 2 ? sb->s_zones : sb->s_nzones;

                /* sanity checks to be sure that the FS is really minix */
                if (sb->s_imap_blocks * MINIX_BLOCK_SIZE * 8 < sb->s_ninodes + 1)
                        return -1;
                if (sb->s_zmap_blocks * MINIX_BLOCK_SIZE * 8 < zones - sb->s_firstdatazone + 1)
                        return -1;

The data overlaps beyond the start of the ext4 block group descriptor, so the value of s_ninodes is not obvious to me.

        struct minix_super_block          struct ext4_group_desc
        uint16_t s_ninodes;               /* before the ext4_group_desc */

        uint16_t s_nzones;                __le32 bg_block_bitmap_lo;      /* Blocks bitmap block */
        uint16_t s_imap_blocks;

        uint16_t s_zmap_blocks;           __le32 bg_inode_bitmap_lo;      /* Inodes bitmap block */
        uint16_t s_firstdatazone;

        uint16_t s_log_zone_size;         __le32 bg_inode_table_lo;       /* Inodes table block */
        uint32_t s_max_size;              __le16 bg_free_blocks_count_lo; /* Free blocks count */

        uint16_t s_magic;                 __le16 bg_free_inodes_count_lo; /* Free inodes count */

...

Kai Mast (kai-mast) wrote :

Is anyone even using a minix filesystem? I mean can't we just not check for that?

Kees Cook (kees) wrote :

Ah, sorry, I should be comparing against the superblock, not the group_desc.

struct minix_super_block        ext4 superblock
uint16_t s_ninodes;             uint32_t s_inodes_count;
uint16_t s_nzones;

uint16_t s_imap_blocks;         uint32_t s_blocks_count;
uint16_t s_zmap_blocks;

uint16_t s_firstdatazone;       uint32_t s_r_blocks_count;
uint16_t s_log_zone_size;

uint32_t s_max_size;            uint32_t s_free_blocks_count;

uint16_t s_magic;               uint32_t s_free_inodes_count;

For "sb->s_imap_blocks == 0 || sb->s_zmap_blocks == 0" to be false (so that the probe does not bail out early), the low 32 bits of the ext4 max block count must be >65536 and not a multiple of 65536, so adjust the "dd" in comment 3 to:

dd if=/dev/zero of=test.ext4 bs=1 count=1 seek=1026M

this will result in non-zero values for both s_imap_blocks and s_zmap_blocks:

$ hexdump -s 0x404 -n 4 -e '2/2 "%d " "\n"' /tmp/test.ext4
512 4
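
The "512 4" output can be reproduced arithmetically by splitting the block count into its two 16-bit halves. A sketch, assuming mkfs.ext4 picked 4 KiB blocks for these images (the thread does not state the block size), so that a 1026 MiB image has 1026 * 256 blocks and a 1 GiB image has 1024 * 256; split16 is a made-up name:

```shell
# Hypothetical helper: print the low and high 16-bit halves of a 32-bit
# value, i.e. what Minix would read as s_imap_blocks and s_zmap_blocks
# from the bytes holding the ext s_blocks_count field.
split16() {
    printf '%d %d\n' $(( $1 & 65535 )) $(( ($1 >> 16) & 65535 ))
}

split16 $(( 1026 * 256 ))   # 1026 MiB image: prints "512 4"
split16 $(( 1024 * 256 ))   # 1 GiB image:    prints "0 4"
```

Under that block-size assumption, the 1 GiB image from the first attempt yields a zero low half, so the probe rejects it before any of the later checks run.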

Kees Cook (kees) wrote :

This failure mode depends on the size of the filesystem, the number of reserved blocks, and the inodes used. This series of steps reproduces the problem for me:

rm -f /dev/shm/test.ext4
dd if=/dev/zero of=/dev/shm/test.ext4 bs=1 count=1 seek=1023M
mkfs.ext4 -F /dev/shm/test.ext4
tune2fs -r 0 /dev/shm/test.ext4
mkdir -p /mnt/test
mount -o loop /dev/shm/test.ext4 /mnt/test
echo $(seq $(( $(hexdump -s 0x410 -n 2 -e '"%d\n"' /dev/shm/test.ext4) - 5007 )) ) | (cd /mnt/test; xargs touch)
umount /mnt/test
blkid -p /dev/shm/test.ext4

/dev/shm/test.ext4: ambivalent result (probably more filesystems on the device, use wipefs(8) to see more details)
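
To see how filesystem size, reserved blocks and free inodes interact, the probe_minix checks quoted earlier can be modelled on the ext superblock fields. This is only a sketch under several assumptions that the excerpt does not confirm: MINIX_BLOCK_SIZE is taken to be 1024, "zones" is taken to be an unsigned 32-bit value (so a negative difference wraps around), and only the two version 1 magics are modelled because version 2 reads s_zones from a different location. The function name and the sample numbers are made up for illustration.

```shell
# Sketch only: would the quoted probe_minix checks accept an ext superblock?
# Arguments: s_inodes_count s_blocks_count s_r_blocks_count s_free_inodes_count
# Assumptions: MINIX_BLOCK_SIZE = 1024, "zones" is unsigned 32-bit,
# version 1 magics only.
minix_v1_checks_pass() {
    s_ninodes=$(( $1 & 65535 ))
    s_nzones=$(( ($1 >> 16) & 65535 ))
    s_imap_blocks=$(( $2 & 65535 ))
    s_zmap_blocks=$(( ($2 >> 16) & 65535 ))
    s_firstdatazone=$(( $3 & 65535 ))
    s_magic=$(( $4 & 65535 ))

    case $s_magic in
        4991|5007) ;;            # 0x137f, 0x138f
        *) return 1 ;;
    esac
    [ "$s_imap_blocks" -ne 0 ] || return 1
    [ "$s_zmap_blocks" -ne 0 ] || return 1
    # 8192 = assumed MINIX_BLOCK_SIZE (1024) * 8 bits per byte
    [ $(( s_imap_blocks * 8192 >= s_ninodes + 1 )) -eq 1 ] || return 1
    # Emulate unsigned 32-bit wrap-around of zones - s_firstdatazone + 1.
    need=$(( (s_nzones - s_firstdatazone + 1) & 4294967295 ))
    [ $(( s_zmap_blocks * 8192 >= need )) -eq 1 ] || return 1
    return 0
}

# Made-up example: 65536 inodes, 261888 blocks, no reserved blocks,
# 5007 free inodes.
minix_v1_checks_pass 65536 261888 0 5007 && echo "would also look like Minix"
```

With these sample numbers, changing the reserved block count to a non-zero value such as 13094 makes the last check fail in this model, which would be consistent with the "tune2fs -r 0" step in the reproduction above.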

Kees Cook (kees) wrote :

This script may help identify "at risk" filesystems. For my system, my /boot partition is vulnerable, but I'd need to create 20000 more inodes to hit the glitch.

Changed in util-linux (Ubuntu Lucid):
status: New → Confirmed
Karel Zak (kzak) wrote :

Upstream patch (fixed 14 days ago...)
http://git.kernel.org/?p=utils/util-linux-ng/util-linux-ng.git;a=commit;h=74b1659ddaac4aa409b56d1eaa07d87b5b11b98e

The patch is also included in the bugfix release 2.17.2.

Kees Cook (kees) wrote :

util-linux (2.17.2-0ubuntu1) now in Lucid.

Changed in util-linux (Ubuntu Lucid):
status: Confirmed → Fix Released
assignee: nobody → Kees Cook (kees)
Dmitry Diskin (diskin) wrote :

Just fixed a 9.10 system using the cure suggested in the bug description. Will the fix be applied to 9.10 as well?

Paulus (donmatteo) wrote :

Kudos to the original reporter and inventor of the ingenious solution. I was bit by this bug on a regular laptop and dumbfounded as my Lucid system's "blkid" saw the file system, while the Karmic system didn't.

It would be a very good idea indeed to backport this patch to karmic's util-linux, as it takes some determination to find the source of the weird behavior caused by it (in my case just a blank screen on Karmic's bootup).

Changed in util-linux (Ubuntu):
status: Fix Released → Confirmed
Pete Graner (pgraner) wrote :

@haskinsa99 Please don't change the status out of Fix Released. That is the final status for the bug; Confirmed is not a valid state once a bug has been marked Fix Released. Thanks.

Changed in util-linux (Ubuntu):
status: Confirmed → Fix Released
Daniel Drake (dsdrake) wrote :

A decade later, found the same bug in grub.

It might not affect "standard" grub usage (e.g. Ubuntu) where it may not need to probe and guess filesystem type just to boot the system, but thank you for the excellent diagnosis nevertheless!

https://lists.gnu.org/archive/html/grub-devel/2020-05/msg00205.html
