fsck dies on boot with USB drives

Bug #276045 reported by Richard Eames
This bug affects 1 person
Affects             Status   Importance  Assigned to  Milestone
sysvinit            New      Undecided   Unassigned
e2fsprogs (Ubuntu)  Invalid  Undecided   Unassigned
xfsprogs (Ubuntu)   Invalid  Undecided   Unassigned

Bug Description

I have a 750GB USB drive formatted with XFS that is plugged in at boot so that it can be mounted at /media/usbd. The last few kernels seem to have an issue with this at boot when the drives are checked: I get dropped to a shell asking me to run fsck manually, or press CTRL+D to continue.

I'll attach my /var/log/fsck, dmesg, lsusb, and /etc/fstab

Richard Eames (naddiseo) wrote :

Could be a dupe of #97206 ?

Richard Eames (naddiseo) wrote :

Forgot to mention: This is with an Alpha 6 install running 2.6.27-3-generic (nvidia isn't working on -4 yet)

Theodore Ts'o (tytso) wrote :

This is an xfsprogs problem, not an e2fsprogs problem.

Changed in e2fsprogs:
status: New → Invalid
Theodore Ts'o (tytso) wrote :

Ah, OK, so the problem seems to be one of UUID identification. So this could be a blkid problem --- or the filesystem could just be corrupted enough that the UUID isn't being identified. We need to see the output of "sudo /sbin/blkid" and "ls -l /dev/disk/by-uuid", please.

Changed in e2fsprogs:
status: Invalid → New
Changed in xfsprogs:
status: New → Invalid
Richard Eames (naddiseo) wrote :

I did run an xfs_check and xfs_repair on it and still had the problem.

Richard Eames (naddiseo) wrote :
Theodore Ts'o (tytso) wrote :

OK, so blkid is correctly identifying the filesystem. How about the output of this command:

"sudo /sbin/fsck -C3 -R -A -N -V -a"

The -N means to not actually do anything, and -V will print the commands that fsck would have executed. So you should see something like this

# fsck -C3 -R -A -N -V -a
fsck 1.41.1 (01-Sep-2008)
Checking all file systems.
[/sbin/fsck.ext3 (1) -- /boot] fsck.ext3 -a -C3 /dev/sda1
[/sbin/fsck.xfs (1) -- /media/usbd] fsck.xfs -a /dev/mapper/thunk-testext4

Does that work for you? What do you get when you execute that command?

Richard Eames (naddiseo) wrote :

$ sudo /sbin/fsck -C3 -R -A -N -V -a
fsck 1.41.0 (10-Jul-2008)
Checking all file systems.
[/sbin/fsck.ext2 (1) -- /boot] fsck.ext2 -a -C3 /dev/sda6
[/sbin/fsck.xfs (2) -- /media/usbd] fsck.xfs -a /dev/sdb1
[/sbin/fsck.ext3 (1) -- /media/music] fsck.ext3 -a -C3 /dev/sda7

Theodore Ts'o (tytso) wrote :

OK, so that means that blkid is able to detect the device correctly. I'm going to guess that the USB device wasn't visible at the time that /etc/rcS.d/S30checkfs.sh is run. If you can, add the following to /etc/init.d/checkfs.sh:

logsave -a /var/log/checkfs.debug blkid
logsave -a /var/log/checkfs.debug lsusb
logsave -a /var/log/checkfs.debug cat /proc/partitions

at the beginning of the do_start() bash function.

That will show what devices are available at the time that checkfs is run. If /dev/sdb1 isn't available at that time, then that's not an e2fsprogs bug, but an issue with when the various init.d scripts are getting run in intrepid.
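
A minimal sketch of how the /proc/partitions listing captured by that debug output could be checked for one partition. The function name partition_listed is hypothetical and not part of checkfs.sh; the optional second argument is an assumption added so that a saved copy of the listing can be inspected instead of the live file.

```shell
#!/bin/sh
# Return 0 if a partition named $1 (e.g. "sdb1") appears in a
# /proc/partitions-style listing. $2 is the file to read and
# defaults to /proc/partitions.
partition_listed() {
    name=$1
    table=${2:-/proc/partitions}
    # Columns are: major minor #blocks name. Only the fourth field
    # is compared, so the header and blank line never match a real name.
    awk -v n="$name" '$4 == n { found = 1 } END { exit !found }' "$table"
}
```

Calling "partition_listed sdb1 || echo 'sdb1 not visible yet'" at the start of do_start() would give a one-line answer to the same question the full dump answers.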

Richard Eames (naddiseo) wrote :

Well, it still happened this morning; at least it didn't freeze this time, which it appears to do from time to time when asking me to press CTRL+D. I'm not sure where /dev/sdc[12] is coming from; I have nothing other than my USB HDD and my internal HDD attached.

Attached checkfs.debug.

Theodore Ts'o (tytso) wrote :

The blkid library keeps old entries in /etc/blkid.tab even if the devices disappear, on the off-chance that they might be useful later. So the entries for /dev/sdc[12] are no big deal. The big indicator is the log from /proc/partitions; that indicates that /dev/sdb1 simply doesn't exist at the time that /etc/init.d/checkfs was run. Why that's the case, I'm not sure. You could try adding a sleep 15 into checkfs to see if it's timing related; maybe it's just taking udev a long time to find the hard drive, since obviously it's there by the time you finish the boot sequence.

In any case, at this point it's not looking like an e2fsprogs bug. Maybe a kernel or initscripts bug; I'm not sure.

Richard Eames (naddiseo) wrote :

I added the sleep 15 like you suggested and it booted up fine, though long boot times annoy me. I'll reattach my checkfs.debug.

Theodore Ts'o (tytso) wrote :

OK, so if the sleep 15 works, then the problem is that the system isn't waiting until all of the USB buses and devices are enumerated before continuing. This is nominally the job of /etc/init.d/udev, which is run out of /etc/rcS.d as /etc/rcS.d/S10udev. This init script runs "/sbin/udevadm trigger" and "/sbin/udevadm settle", which is supposed to wait for all of the device nodes to be created before continuing --- and I've wasted many an hour trying to debug udev problems on laptops where it hangs forever on "/sbin/udevadm settle", to the point that I have a very _low_ opinion of udev, its utility, and certainly its robustness. In your case, the problem is that "/sbin/udevadm settle" isn't waiting long enough for your USB disk to be created. The root filesystem is then checked out of /etc/rcS.d/S20checkroot.sh, and the rest of the filesystems are checked out of /etc/rcS.d/S30checkfs.sh --- and so the "sleep 15" in checkfs.sh works around the problem. (Note: I am giving you the sequence of events out of Ubuntu Hardy; I have not tried messing with Ubuntu Intrepid, and I don't know if any changes in Upstart vs. sysvinit may have changed the order in which things run.)

However, to be fair --- this is a hard problem for udev to solve, since USB by definition is designed to be hot-pluggable, and so USB devices can appear at any time. Hence, there is no guarantee when the USB hub is done enumerating, and so there isn't a good external event that would tell "/sbin/udevadm settle" that it's OK to continue. I'm not sure, but I'm guessing that something added a delay to the USB hub initialization sequence (maybe to work around some bug where a hardware device needed some settling time after the hub is powered up), and this caused the USB enumeration code to not notice your USB disk until some number of seconds later. This is only a theory, and I don't know whether the delay was added in user space or the kernel --- but one thing is clear: this isn't an e2fsprogs bug, but rather a bug in either the kernel or in udev, or one of udev's supporting scripts.

Richard Eames (naddiseo) wrote :

I'm not sure how much has changed in the past week, but I decided to remove that sleep 15 last night, and this morning it booted just fine. So, I guess it's been "fixed" somewhere, though I'm betting it'll crop up again in the future. Thanks for your help again.

barberio (barberio) wrote :

Also seeing this problem.

Let's assume that the root issue is that USB isn't done by the time /etc/rcS.d/S30checkfs.sh runs.

I think the solution there is to put in a check that the file systems identified as needing boot time checks in fstab exist, looping on that check for a sensible timeout period. This also fixes any problems with any future block devices which are expected by the system to be there at boot, but are slow to initialise. (for instance, crappy eSATA devices)
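
A rough sketch of that idea, not an actual patch to checkfs.sh. All three function names are hypothetical, the 15-second default simply mirrors the "sleep 15" workaround above, and the mapping of UUID=/LABEL= entries to /dev/disk/by-uuid and /dev/disk/by-label assumes udev maintains those symlinks. It treats any fstab entry with a non-zero sixth field (fs_passno) as needing a boot-time check, and tests with -e rather than -b so it can be tried against ordinary files.

```shell
#!/bin/sh
# Print the device spec (first field) of every entry in an fstab-format
# file whose sixth field (fs_passno) is non-zero. Comments and short
# lines are skipped. $1 defaults to /etc/fstab.
fstab_checked_devices() {
    awk '$1 !~ /^#/ && NF >= 6 && $6 != 0 { print $1 }' "${1:-/etc/fstab}"
}

# Translate UUID=... and LABEL=... specs into the symlink paths that
# udev is assumed to create; plain paths are passed through unchanged.
spec_to_path() {
    case $1 in
        UUID=*)  echo "/dev/disk/by-uuid/${1#UUID=}" ;;
        LABEL=*) echo "/dev/disk/by-label/${1#LABEL=}" ;;
        *)       echo "$1" ;;
    esac
}

# Poll once per second until every checked device in fstab $1 exists,
# giving up after $2 seconds (default 15).
# Returns 0 when all devices are present, 1 on timeout.
wait_for_fstab_devices() {
    fstab=${1:-/etc/fstab}
    timeout=${2:-15}
    waited=0
    while :; do
        missing=0
        for spec in $(fstab_checked_devices "$fstab"); do
            [ -e "$(spec_to_path "$spec")" ] || missing=1
        done
        [ "$missing" -eq 0 ] && return 0
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
}
```

Unlike a fixed sleep, this returns as soon as the devices show up, so a machine without slow devices pays no boot-time penalty; on timeout the caller could still fall through to the existing fsck behaviour.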

barberio (barberio) wrote :

Moving to a sysvinit problem.

Julien Plissonneau Duquene (julien-plissonneau-duquene) wrote :

Marking as invalid in e2fsprogs. As written above, it is a problem in init scripts and/or udev.

Changed in e2fsprogs (Ubuntu):
status: New → Invalid
Micah P. Dombrowski (mpdombrowski) wrote :

I'm having the same problem. Does anyone have a fix along the lines of what barberio suggested?
