Scheduled fsck during boot unresponsive and inactive for a very long time at 90%, making the system appear to hang

Bug #487744 reported by ais523 on 2009-11-24
This bug affects 6 people
Affects: mountall (Ubuntu)
Importance: Low
Assigned to: Unassigned

Bug Description

Binary package hint: e2fsprogs

My ext3 filesystem (originally created by Ubuntu Feisty, with a sequence of upgrades since then) has currently been mounted over 30 times since its last check, thus forcing a fsck on boot. If I skip the fsck with the Esc key, everything continues as normal and the boot completes. However, if I let fsck run, then:

- the fsck continues as normal until it reaches 89%
- the hard drive activity light goes off
- the hard drive makes a sound it normally only makes at system shutdown (although I don't know for certain, it's probably the noise of the hard drive being turned off altogether)
- after several minutes, the fsck progress continues to 90%
- the system then hangs thereafter.

This happens whether I try to boot normally or in recovery mode; in recovery mode, this is what the bottom of the screen looks like when it hangs (retyped by hand from the view on screen, so may possibly contain typos; some of this information is almost certainly unrelated, but I haven't edited the portion of the log in question):

init: bootchart main process (441) terminated with status 1
fsck from util-linux-ng 2.16
udevd[446]: NAME="%k" is superfluous and breaks kernel supplied names, please remove it from /etc/udev/rules.d/60-kqemu.rules:1

WARNING: All config files need .conf: /etc/modprobe.d/kqemu, it will be ignored in a future release
WARNING: All config files need .conf: /etc/modprobe.d/kqemu, it will be ignored in a future release
WARNING: All config files need .conf: /etc/modprobe.d/kqemu, it will be ignored in a future release
WARNING: All config files need .conf: /etc/modprobe.d/kqemu, it will be ignored in a future release
init: bootchart post-stop process (450) terminated with status 1
os_part has been mounted 35 times without being checked, check forced
Filesystem checks are in progress (ESC to cancel):
[######################################################------]

At this point, pressing ESC to cancel the fsck doesn't work; ESC, control-C, and control-backslash do nothing but echo on screen as ^[, ^C, ^\ respectively (they likewise do nothing outside recovery mode, although the echoes aren't visible there). Control-alt-delete does work from this state, and causes the system to do a controlled shutdown and reboot; but another fsck happens on the resulting reboot.

Suspiciously, the reboots aborted due to fsck hanging don't appear in syslog at all (either that, or they do and I can't find them); only the boots where I escaped the automatic fsck are shown.

Expected behaviour is for the fsck to complete and for the boot sequence to continue. This drive has fscked correctly with Karmic before; the only potentially relevant change to packages that I can remember is when ureadahead was installed by the update manager (I run karmic-proposed).

Although this bug has similar symptoms to bug #387692, I think it's a different bug; in that bug, the fsck actually completed and the system hung immediately afterwards, whereas in this one, the fsck itself is hanging.
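
For reference, the "has been mounted 35 times without being checked, check forced" message in the log above is what e2fsck prints when the filesystem's mount count has reached its maximum mount count. Both values appear in the output of `tune2fs -l`. The following is a minimal sketch of that comparison, assuming the `Mount count` and `Maximum mount count` field names from that output; the sample text (including the maximum of 30) and the function name `check_forced` are illustrative assumptions, not data from the reporter's machine.

```python
def check_forced(tune2fs_output: str) -> bool:
    """Return True if the mount count has reached the maximum mount count."""
    fields = {}
    for line in tune2fs_output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    mount_count = int(fields["Mount count"])
    max_mount_count = int(fields["Maximum mount count"])
    # A maximum of 0 or -1 means the mount-count check is disabled.
    return max_mount_count > 0 and mount_count >= max_mount_count


# Hypothetical excerpt in the style of `tune2fs -l` output.
SAMPLE = """\
Filesystem volume name:   os_part
Mount count:              35
Maximum mount count:      30
"""

print(check_forced(SAMPLE))
```

Feeding the function the real output of `tune2fs -l` for the affected partition would show whether the next boot is due a forced check.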

ProblemType: Bug
Architecture: i386
Date: Tue Nov 24 18:02:10 2009
DistroRelease: Ubuntu 9.10
Package: e2fsprogs 1.41.9-1ubuntu3
ProcEnviron:
 SHELL=/bin/bash
 LANG=en_US.UTF-8
ProcVersionSignature: Ubuntu 2.6.31-15.50-generic
SourcePackage: e2fsprogs
Uname: Linux 2.6.31-15-generic i686
XsessionErrors:
 (gnome-settings-daemon:2939): GLib-CRITICAL **: g_propagate_error: assertion `src != NULL' failed
 (gnome-settings-daemon:2939): GLib-CRITICAL **: g_propagate_error: assertion `src != NULL' failed
 (nautilus:3032): Eel-CRITICAL **: eel_preferences_get_boolean: assertion `preferences_is_initialized ()' failed
 (polkit-gnome-authentication-agent-1:3045): GLib-CRITICAL **: g_once_init_leave: assertion `initialization_value != 0' failed

ais523 (ais523) wrote :

I actually don't think you have an fsck at all, which would explain why Escape isn't working

affects: e2fsprogs (Ubuntu) → mountall (Ubuntu)
ais523 (ais523) wrote :

> I actually don't think you have an fsck at all, which would explain why
> Escape isn't working

This can't possibly be the case:
- Escape works just fine before the fsck reaches 89%
- fsck is showing a progress bar; if fsck isn't there, then how would
  the progress bar appear?
- "Filesystem checks are in progress (ESC to cancel):" strongly implies
  that something is trying to run fsck, and if it wasn't there I'd expect it
  to error out immediately
- "which fsck" and "which e2fsck" return /sbin/fsck and /sbin/e2fsck, as
  expected
- the bug reporting tool would have noticed if I tried to report a bug in
  a package I don't have installed

On Wed, 2009-12-02 at 17:02 +0000, ais523 wrote:

> > I actually don't think you have an fsck at all, which would explain why
> > Escape isn't working
>
> This can't possibly be the case:
> - Escape works just fine before the fsck reaches 89%
>
Then stops working. Which is precisely my point.

> - fsck is showing a progress bar; if fsck isn't there, then how would
> the progress bar appear?
>
Actually fsck doesn't show the progress bar, something else is. If fsck
went away, and that something else hadn't noticed, the progress bar
would be still there ... and Escape wouldn't work.

> - "Filesystem checks are in progress (ESC to cancel):" strongly implies
> that something is trying to run fsck, and if it wasn't there I'd expect it
> to error out immediately
>
See above.

> - "which fsck" and "which e2fsck" return /sbin/fsck and /sbin/e2fsck, as
> expected
>
That just means they're installed. I meant that fsck stopped running at
90% (or got to 100% quicker than the progress bar noticed).

Scott
--
Scott James Remnant
<email address hidden>

> > - fsck is showing a progress bar; if fsck isn't there, then how would
> > the progress bar appear?
> >
> Actually fsck doesn't show the progress bar, something else is. If fsck
> went away, and that something else hadn't noticed, the progress bar
> would be still there ... and Escape wouldn't work.

I assumed the progress bar was some variant of fsck -C; in previous versions of Ubuntu, the progress bar showed a lot of detail about what the fsck was doing, although nowadays it's a simple percentage. fsck unexpectedly exiting at 90% would explain about half the symptoms I'm getting, though (although it wouldn't directly explain why the system would lock up rather than continue thereafter).
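
The exchange above turns on who draws the progress bar. As a rough sketch of the `fsck -C` mechanism mentioned here: when e2fsck is given a file descriptor with `-C`, it reports progress on that descriptor as plain-text lines which, to my understanding, have the form `pass current maximum device`, and a separate front end turns them into a bar. The sample line and the function names below are hypothetical, and this is not mountall's actual code.

```python
def parse_progress(line: str):
    """Split one 'pass current maximum device' line into typed fields."""
    pass_no, current, maximum, device = line.split(maxsplit=3)
    return int(pass_no), int(current), int(maximum), device


def pass_fraction(line: str):
    """Return (pass number, fraction of that pass completed)."""
    pass_no, current, maximum, _device = parse_progress(line)
    # Treat an empty pass (maximum of 0) as already complete.
    fraction = current / maximum if maximum else 1.0
    return pass_no, fraction


# Hypothetical progress line; real values depend on the filesystem.
pass_no, fraction = pass_fraction("1 54 60 /dev/sda5")
print(f"pass {pass_no}: {fraction:.0%}")
```

This only computes progress within a single pass; how a front end weights the passes into one overall percentage is left out. If the process feeding such lines exits without the front end noticing, the last value simply stays on screen, which matches the symptom being discussed.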

ais523 (ais523) wrote :

This bug seems to have stopped occurring now (at least, I can no longer reproduce it). Marking as invalid for the time being; I'll reopen the bug if it happens again. (It was happening repeatably earlier, though, so it seems not to be intermittent; probably it's triggered by some unknown cause.)

Changed in mountall (Ubuntu):
status: New → Invalid
ais523 (ais523) wrote :

Finally figured out what's going on here! I'm now on a different computer, and getting the same bug again; but it behaves slightly differently here. This computer has an ext4 filesystem originally created by Ubuntu Karmic, and has a rather smaller main partition (relevant to how I noticed what was going on); but the same thing's happening. (And yes, I'm now suspicious that the blame is mostly mountall's.)

What happens here if I don't press ESC to abort a scheduled fsck is exactly the same; it appears to proceed as normal until 89% (which goes very quickly on this computer, thus making it easier to test), the hard drive activity light goes off, the fsck progress goes to 90% after several minutes, and the system apparently completely hangs thereafter. However, after waiting for another half-hour or so, during which the system is apparently completely hung, the progress bar goes to 91%, and picks up thereafter, with the boot finally completed.

Pressing ESC works differently on this machine, though; instead of aborting the fsck, it switches to tty1, gives an error message ("General error mounting filesystems."), and drops me to a root prompt, with the advice that control-D should retry. After pressing control-D, the fsck resumes not at 0%, but at whatever percentage it was at when I pressed ESC; it's as if the fsck was not stopped at all, but only suspended (which, if true, would at least explain why the filesystems failed to mount).

As a test, I tried interrupting a scheduled fsck with ESC, restarting it with control-D, interrupting the same fsck again, restarting it again with control-D, and finally waiting until the fsck completed (including the huge hang at 89-91% for no apparent reason and with no hard drive activity). This is a dump of tty1 obtained after the boot sequence ended (obtained via /dev/vcs1):
{{{
General error mounting filesystems.
A maintenance shell will now be started.
CONTROL-D will terminate this shell and re-try.
root@desert:~# exit
mountall start/starting
fsck from util-linux-ng 2.16
swapon: /dev/disk/by-uuid/93f82bd9-13ba-48d0-b6e0-d56326ae15ea: swapon failed: Device or resource busy
mountall: swapon /dev/disk/by-uuid/93f82bd9-13ba-48d0-b6e0-d56326ae15ea [1003] terminated with status 255
mountall: Problem activating swap: /dev/disk/by-uuid/93f82bd9-13ba-48d0-b6e0-d56326ae15ea
/dev/sda5 has been mounted 25 times without being checked, check forced.
Filesystem checks are in progress (ESC to cancel):
[#######-----------------------------------------------------]
mountall: Cancelled
/dev/sda5: e2fsck canceled.
fsck.ext4: Inode bitmap not loaded while setting block group checksum info
mountall: fsck / [1001] terminated with status 8
mountall: General fsck error
init: mountall main process (1000) terminated with status 1
General error mounting filesystems.
A maintenance shell will now be started.
CONTROL-D will terminate this shell and re-try.
root@desert:~# exit
mountall start/starting
fsck from util-linux-ng 2.16
swapon: /dev/disk/by-uuid/93f82bd9-13ba-48d0-b6e0-d56326ae15ea: swapon failed: Device or resource busy
mountall: swapon /dev/disk/by-uuid/93f82bd9-13ba-48d0-b6e0-d56326ae15ea [1042]...


Changed in mountall (Ubuntu):
status: Invalid → New
summary: - Scheduled fsck during boot hangs at 90%, preventing boot sequence
- completing
+ Scheduled fsck during boot unresponsive and inactive for a very long
+ time at 90%, making the system appear to hang

The mountall bug here is that it didn't clear the message when the fsck finished (and didn't show another)

Fix pending

Changed in mountall (Ubuntu):
status: New → Fix Committed
importance: Undecided → Low
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mountall - 2.10

---------------
mountall (2.10) lucid; urgency=low

  * Rework the Plymouth connection logic; one needs to attach the client to
    the event loop *after* connection otherwise you don't get disconnection
    notification, and one needs to actually actively disconnect in the
    disconnection handler.
  * For safety and sanity reasons it becomes much simpler to create the
    ply_boot_client when we connect, and free it on disconnection. Thus the
    presence or not of this struct tells us whether we're connected or not.
    LP: #524708.
  * Flush the plymouth connection before closing it and exiting, otherwise
    updates may be pending and the screen have messages that confuse people
    while X is starting (like fsck at 90%). LP: #487744.

  * Replace the modal plymouth prompt for error conditions with code that
    continues working in the background while prompting. This most benefits
    the old "Waiting for" message, which can now allow you to continue to
    wait and it can solve itself. LP: #527666, #545435.
  * Integrate fsck progress updates into the same mechanism.
  * Allow fsck messages to be translated. LP: #390740.
  * Change fsck message to be a little less alarming. LP: #545267.
  * Add hard dependency on Plymouth; without it running, mountall will
    ignore any filesystem which doesn't show up within a few seconds or that
    fails to fsck or mount. If you don't want graphical splash, you simply
    need not install themes.

  * Improve set of messages seen with --verbose, and ensure all visible
    messages are marked for translation. LP: #446592.
  * Reduce priority of failed to mount error for remote filesystems since
    we try again, and this just spams the console. LP: #504224.

  * Keep hold of the dev_t when parsing /proc/self/mountinfo, then after
    mounting /dev (or seeing that it's mounted) create a quick udev rules
    file that adds the /dev/root symlink to this device. LP: #527216.
  * Do not try and update /etc/mtab when it's a symbolic link. LP: #529993.
  * Remove odd -a option from mount calls, probably a C&P error from the
    fsck code long ago. LP: #537135.
  * Wait for Upstart to acknowledge receipt of events, even if we don't
    hang around for them to be handled.
  * Always run through try_mounts() at least once. LP: #537136.
  * Don't keep mountall running if the only remaining unmounted filesystems
  *
 -- Scott James Remnant <email address hidden> Wed, 31 Mar 2010 19:37:31 +0100
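
The changelog entry for LP: #487744 describes updates that were still pending when the connection was closed, leaving a stale message such as "fsck at 90%" on screen. The following toy model illustrates that general failure mode only. The class `BufferedDisplayClient` and the function `final_screen` are hypothetical and do not correspond to the Plymouth client API or to mountall's code.

```python
class BufferedDisplayClient:
    """Hypothetical stand-in for a boot-splash client that queues updates."""

    def __init__(self):
        self.pending = []      # updates queued but not yet delivered
        self.on_screen = None  # last message the display actually received

    def update(self, message):
        self.pending.append(message)

    def flush(self):
        # Deliver every queued update in order.
        for message in self.pending:
            self.on_screen = message
        self.pending.clear()

    def close(self, flush_first):
        if flush_first:
            self.flush()
        self.pending.clear()   # anything still queued is lost
        return self.on_screen


def final_screen(flush_first):
    client = BufferedDisplayClient()
    client.update("fsck 90%")
    client.flush()              # this update reaches the display
    client.update("fsck 100%")  # queued only
    client.update("")           # queued request to clear the message
    return client.close(flush_first)


print(repr(final_screen(flush_first=False)))
print(repr(final_screen(flush_first=True)))
```

Without the flush, the display keeps the last delivered message even though the check has finished; with it, the clearing update arrives before the client goes away.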

Changed in mountall (Ubuntu):
status: Fix Committed → Fix Released
D J Eddyshaw (david-eddyshaw) wrote :

Sometime over the past few days a problem very like this has newly arisen on my system

https://bugs.launchpad.net/ubuntu/+source/sysvinit/+bug/554079

Could this be a further problem with mountall?

Pjotr12345 (computertip) wrote :

The bug is apparently not fixed. On a fully updated Lucid, my computer hangs during booting because of fsck, which stops at 71%. Only a hard reboot helps.

Changed in mountall (Ubuntu):
status: Fix Released → Confirmed
Steve Langasek (vorlon) wrote :

This bug was fixed, but there are other bugs that will be tracked in other bug reports.

Changed in mountall (Ubuntu):
status: Confirmed → Fix Released
Paul Pascal (ppascal) wrote :

I was upgrading 9.10 to 10.04 on my Dell Ubuntu XPS 410s and am having a similar issue. It appears that the upgrade was either interrupted or hung somewhere in between. The screen was black and not responding to anything when I came back from work. A hard reboot got me to the following:

"mount: mounting none on /dev failed: No such device"
"chroot: cannot execute /etc/apparmor/initramfs: No such file or directory"

Then, after a couple of minutes:

a bunch of udevd[2815]: SYSFS{} messages saying: "will be removed in a future udev version, please use ATTR{} to match the event device, or ATTRS{}= to match a parent device, in /etc/udev/rules.d/65-libmtp.rules:87"

Then, this intimidating message: "udevd[2815]: specified user 'usbmux' unknown"

And this final line: "os_part has been mounted 27 times without being checked, check forced."

Whatever that process is, it hangs interminably.

Please help with a simple, step-by-step solution.

Thanks so much!

chris_c (c-camacho) wrote :

I do not believe this bug to be fixed as I am also still seeing exactly this behaviour

Changed in mountall (Ubuntu):
status: Fix Released → New
Michael (michaeljt) wrote :

I just saw something similar - a scheduled fsck which proceeded nice and fast up to 70% and then very slowly, with no disk activity and apparently a lot of CPU activity (fan going, laptop warming up). I rebooted at 83% having got tired of waiting, and no fsck was scheduled on the next boot. Pressing "C to cancel" did not work. This is a machine that was originally (re-)installed as Karmic with ext4 and upgraded to Lucid one day before the first beta.

Not surprisingly there was no information about that particular boot in the syslog. Please let me know if I can provide any useful information though.

Patrick Den (pat31) wrote :

I have exactly the same as Michael.
This happens on three completely different computers, with different graphics cards (nvidia, intel, ati).
It does not happen a single time, but every time a scheduled check is run.
But I let it run for more than half an hour, until it finally reaches 100% and continues normally.

Takkat (takkat-nebuk) wrote :

Identical issue as Michael and Patrick on 2 different machines here: a netbook running a fresh install of Lucid UNR, and a desktop system with upgrade to Lucid LTS from Karmic.

Takkat (takkat-nebuk) wrote :

Ah, I see: this is most likely Bug #571707

Michael (michaeljt) wrote :

chris_c: does bug #571707 look like what you are seeing? (And does https://launchpad.net/ubuntu/lucid/+source/mountall/2.15 fix the problem for you?) If so, you might want to set the status of this bug back to "Fix released".

Patrick Den (pat31) wrote :

It's worse. Now it says: "severe error", but there is nothing wrong with my partition or hard disk.
Synaptic shows 'mountall' at version 2.15, so I assume this new version was installed a few days ago.

Martin Erik Werner (arand) wrote :

@Patrick Den:
If the "severe error" message is only seen when you cancel the disk check, this is most likely Bug #582035

Patrick Den (pat31) wrote :

@arand, you are right. The scheduled fsck is now working just fine.

Anders (eddiedog988) on 2014-03-13
Changed in mountall (Ubuntu):
status: New → Confirmed