Need newbie-friendly alternative to maintenance shell when mount fails

Bug #489474 reported by Daniel Richard G.
This bug affects 2 people
Affects: mountall (Ubuntu)
Status: Fix Released
Importance: Wishlist
Assigned to: Scott James Remnant (Canonical)

Bug Description

Binary package hint: mountall

A month or so ago, I installed Ubuntu Karmic on my father's laptop in Florida. I then flew back home to Boston.

He was running Windows in VirtualBox, and closed the laptop lid (i.e. ACPI suspend). When he opened it, the system did not come up properly. When he rebooted, the root filesystem failed to mount due to "unexpected inconsistencies," and the boot process halted with the following:

    Mount of filesystem failed.
    A maintenance shell will now be started.
    CONTROL-D will terminate this shell and re-try.
    root@hostname:~#

I spent the better part of a year talking him into trying Ubuntu on his laptop. Even when the original Windows XP system took fifteen minutes to boot up because of all the {spy,mal}ware, he hesitated. VirtualBox was what finally sealed the deal. The above root prompt, staring him in the face while I was some 1500 miles away, nearly got him looking at Windows 7 prices.

I had him type in "fsck -v -y /dev/sda2" yadda yadda. Corrupted inodes were fixed, and in short order, the system was booting again normally. Which left me asking, why on Earth did the system not do this for us? Why did it give a computer-phobic user an unsolicited root prompt, when "fsck -y" is what most experienced users would do at this point anyway? I mean sure, let's not prevent the user from running debugfs(8) and examining inodes if s/he wants, but can we do better for the kind of user to whom I have to explain that the "root@hostname" bit is an input prompt?

Here is an example of something that would be just a little more newbie-friendly:

    Mount of filesystem failed.
    Press F1 to attempt repair, F5 to start a maintenance shell (advanced users only):

where F1 runs "fsck [-v] -y" on the filesystem. Advanced users can still do whatever they want, and my dad gets more than a snowball's chance in Hell of fixing the problem on his own.

(By the way, it would be helpful if the error message indicated _which_ filesystem could not be mounted. I hadn't been entirely clear at first on whether it was the root or /home partition that was acting up.)
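As a rough illustration of the prompt proposed above, here is a minimal Python sketch. The function names (`build_prompt`, `command_for_key`), the key-to-action mapping, and the `/bin/sh` stand-in for the maintenance shell are all hypothetical and not anything mountall provides; the only command taken from this report is `fsck -v -y`. The message also names the failing filesystem, as the parenthetical note suggests.

```python
from typing import List, Optional


def build_prompt(device: str, mountpoint: str) -> str:
    """Return the two-line message, naming the filesystem that failed."""
    return (
        f"Mount of filesystem {device} on {mountpoint} failed.\n"
        "Press F1 to attempt repair, "
        "F5 to start a maintenance shell (advanced users only):"
    )


def command_for_key(key: str, device: str) -> Optional[List[str]]:
    """Map a key press to the command it would run (hypothetical mapping)."""
    if key == "F1":
        # Same command that was typed by hand in this report.
        return ["fsck", "-v", "-y", device]
    if key == "F5":
        # Assumption: stand-in for the existing maintenance shell.
        return ["/bin/sh"]
    return None  # Unknown key: the caller would ask again.


print(build_prompt("/dev/sda2", "/"))
print(command_for_key("F1", "/dev/sda2"))
```

Returning the command as a list rather than running it keeps the sketch free of side effects; a real implementation would also have to decide what to do when the repair itself fails.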

Daniel Richard G. (skunk) wrote :

Just for giggles, this is a screenshot taken by my father (with a camera, natch) of the output of the fsck(8) command he typed in. Embarrassing, isn't it?

jtniehof (jtniehof) wrote :

Thank you for taking the time to make Ubuntu better. Since you are suggesting a significant design effort, you are invited to post your idea in Ubuntu Brainstorm at http://brainstorm.ubuntu.com/ where it can be discussed, voted on by the community and reviewed by developers. Thanks for taking the time to share your opinion!

Changed in mountall (Ubuntu):
status: New → Invalid
Daniel Richard G. (skunk) wrote :

jtniehof, I am not suggesting a "significant design effort" (please re-read the report), and this bug/wishlist item concerns an existing, basic usability failure rather than positing a new "would be nice to have" idea. If this should be filed against another package, then say so. Otherwise, please allow this issue to be addressed via the normal channels. Thank you.

Changed in mountall (Ubuntu):
status: Invalid → New
Michael Rooney (mrooney) wrote :

This sounds like a pretty great and reasonable idea based on the information you've provided, but I'm no expert. I will mark it as Triaged and Wishlist as per https://wiki.ubuntu.com/Bugs/Importance . keybuk is the maintainer of this package; let's see what he thinks.

Changed in mountall (Ubuntu):
importance: Undecided → Wishlist
status: New → Triaged
Jonathan Marsden (jmarsden) wrote :

mrooney:

I'm not convinced that this is ready for implementation:

I think the unanswered questions I can see here include:

(a) How does the system determine whether the issue really was caused simply by a hard shutdown or power loss (and so decide that fsck -y is likely to be an appropriate thing to offer the user as an "automated" fix), vs something more serious such as a failing disk drive or intermittently faulty disk controller (or any other circumstances in which an fsck -y could do more harm than good)? While a human user often "knows" whether or not the system was "just" not shut down correctly last time, the OS itself has no apparent way to determine that while booting, that I know of. This bug report does not seem to propose a way to determine this.

(b) Are you sure most knowledgeable people really do fsck -y in these circumstances? Is running fsck -y guaranteed to be safe for your data, on all filesystem types on all classes of machine? Is there evidence to support this claim? Even in the report's specific circumstances, I'd at least have made an image copy of the virtual disk file first, so that if e2fsck -y broke it further, I'd have a way to start the repair over, using a more conventional and more careful approach than the one you are advocating here. Maybe I am just slightly more paranoid than some about my data. So, anecdotally at least, it is unclear that "most experienced users" handle this kind of situation exactly as you do.

(c) Have you considered the implications for filesystems whose fsck does not support a -y flag? Note that the fsck man page specifically states that "Options to different filesystem-specific fsck's are not standardized", and goes on to list -y as one that not all filesystem-specific checkers support. I have not checked all fs-specific checkers in Ubuntu to confirm this. Perhaps one solution would be to offer the proposed "pretty" UI only if the fs concerned is ext2/ext3/ext4?
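The restriction suggested in (c) could be sketched roughly as follows. This is only an illustration in Python: the `AUTO_REPAIR_FSTYPES` set and the `offer_auto_repair` function are hypothetical names, and the set holds only the three types named above. Whether any other checker accepts -y would have to be verified separately.

```python
# Filesystem types for which the "attempt repair" choice would be offered.
# Assumption: only the types named in point (c); this is not a statement
# about which filesystem-specific checkers actually accept -y.
AUTO_REPAIR_FSTYPES = frozenset({"ext2", "ext3", "ext4"})


def offer_auto_repair(fstype: str) -> bool:
    """Return True if the friendly repair option should be shown."""
    return fstype.strip().lower() in AUTO_REPAIR_FSTYPES


for fstype in ("ext3", "ext4", "vfat", "xfs"):
    choice = "repair or shell" if offer_auto_repair(fstype) else "shell only"
    print(f"{fstype}: {choice}")
```

An allowlist errs on the side of the existing behaviour: any filesystem type not explicitly listed still gets the maintenance shell.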

Michael Rooney (mrooney) wrote : Re: [Bug 489474] Re: Need newbie-friendly alternative to maintenance shell when mount fails

On Wed, Dec 2, 2009 at 9:52 AM, Jonathan Marsden <email address hidden> wrote:
> I'm not convinced that this is ready for implementation:

It may not be; I marked it as Triaged because it is at least ready for
a developer's review/opinion/thoughts, like you gave :)

Daniel Richard G. (skunk) wrote :

Jonathan, here's my take on these questions:

(a) I wasn't thinking along the lines of the system being "aware" of whether it was properly shut down or not, although I believe the infrastructure is already in place for this to be possible (i.e. "Ubuntu recorded a successful boot" on startup, and how the GRUB menu now pops up automatically on boot if the system was not properly shut down beforehand). I'd rather not have a "repair" option be limited to these circumstances, however, at least not without carefully thinking through when else it might be needed. (Ideally, a root prompt would only come up if the system is hosed in a non-trivial way.)

My thought was more along the lines of being aware why the mount failed. Was the filesystem/journal corrupted? Did the fsck auto-preening fail? Is the kernel giving I/O errors? Was the device just not there? You do things differently, depending on what the failure mode is. (I believe mountall is doing all the fsck'ing in addition to the mounting, so it would be aware of what's going on.)
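One signal that could help tell these cases apart is the exit status of fsck, which the fsck(8) man page documents as a set of bit values that may be combined. The Python sketch below decodes that status; the `suggest_action` policy layered on top is a hypothetical illustration of "doing things differently depending on the failure mode", not a description of what mountall does.

```python
from typing import List

# Exit status bits as documented in fsck(8).
FSCK_EXIT_BITS = {
    1: "errors corrected",
    2: "system should be rebooted",
    4: "errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def decode_fsck_status(status: int) -> List[str]:
    """Return the conditions encoded in an fsck exit status."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_EXIT_BITS.items() if status & bit]


def suggest_action(status: int) -> str:
    """Hypothetical policy: choose what to offer the user next."""
    if status & 8:
        # Assumption: treat an operational error as a device-level problem.
        return "report device problem"
    if status & 4:
        # The automatic check gave up; an explicit repair could be offered.
        return "offer repair"
    if status & 2:
        return "reboot"
    return "continue boot"


print(decode_fsck_status(4), suggest_action(4))
print(decode_fsck_status(3), suggest_action(3))
```

The exit status alone cannot distinguish every case listed above (kernel I/O errors, for instance, would have to come from elsewhere), so it is at best one input among several.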

The real wildcard would be the bad-controller case, where the filesystem on disk is sound but it gets garbled when read, and sensible writes to the disk turn into destructive ones. But is this really a factor? For one, it's not a terribly common failure mode, as far as I'm aware. Two, if fsck would damage the disk, then presumably, so would normal system activity. Three... even if fsck does completely hose the disk in this rare scenario, should that mean that we really can't do any better than drop the user into a root shell on mount failure? It comes down to risk management---getting in a car entails a risk of a fatal traffic accident, but for most people, the benefits outweigh the risks. I think a similar logic would apply here.

(b) There's no guarantee that "fsck -y" is always going to be safe. There's no guarantee that it'll get things back into a usable state; heck, it might make the corruption worse. But of all the more involved filesystem-recovery tools and techniques that you're aware of, how feasible is it for a computer-phobic user to resort to any of them? How feasible is it for said user to take the system to a big-name electronics store, whose techs only know Windows, and have them do it? Or post on ubuntuforums.org, and be walked through the whole process, hands-on? (Remember, my dad didn't even know that root@hostname:~# was a prompt.) Heck, if "fsck -y" doesn't fix the problem, *I* wouldn't know what else to do, and I work professionally with Linux. (For my part, I would just consider the filesystem a total loss, and blow it away. I'd only figure out the more advanced recovery stuff if there were something _really_ important on it.)

I say "fsck -y" instead of just "fsck", too, because... unless I'm _really_ intimate with my filesystems, am I really going to know whether I should fix the block count for group #37, but not #52? Very, very few people are ever going to have a reason to say "n" to any of fsck's prompts.

(By the way, just to correct what seems like a misperception... the damaged filesystem in this case was not one on a VirtualBox virtual disk file, but the Linux root filesystem on the lap...


danfish (dan-fishms) wrote : Re: [Bug 489474] Re: Need newbie-friendly alternative to maintenance shell when mount fails

Sorry,

I'm very new to bug triaging, so I'm not sure of the etiquette or whether
this is the place to reply, but I've experienced the same problem with a
relative of mine, and it makes debugging/support a problem. As an aside,
I wonder if there is a place on Launchpad or the forums for 'supporting
family members' etc.

Dan Fish
> Jonathan, here's my take on these questions:
>
> (a) I wasn't thinking along the lines of the system being "aware" of
> whether it was properly shut down or not, although I believe the
> infrastructure is already in place for this to be possible (i.e. "Ubuntu
> recorded a successful boot" on startup, and how the GRUB menu now pops
> up automatically on boot if the system was not properly shut down
> beforehand). I'd rather not have a "repair" option be limited to these
> circumstances, however, at least not without carefully thinking through
> when else it might be needed. (Ideally, a root prompt would only come up
> if the system is hosed in a non-trivial way.)
>
> My thought was more along the lines of being aware why the mount failed.
> Was the filesystem/journal corrupted? Did the fsck auto-preening fail?
> Is the kernel giving I/O errors? Was the device just not there? You do
> things differently, depending on what the failure mode is. (I believe
> mountall is doing all the fsck'ing in addition to the mounting, so it
> would be aware of what's going on.)
>
> The real wildcard would be the bad-controller case, where the filesystem
> on disk is sound but it gets garbled when read, and sensible writes to
> the disk turn into destructive ones. But is this really a factor? For
> one, it's not a terribly common failure mode, as far as I'm aware. Two,
> if fsck would damage the disk, then presumably, so would normal system
> activity. Three... even if fsck does completely hose the disk in this
> rare scenario, should that mean that we really can't do any better than
> drop the user into a root shell on mount failure? It comes down to risk
> management---getting in a car entails a risk of a fatal traffic
> accident, but for most people, the benefits outweigh the risks. I think
> a similar logic would apply here.
>
> (b) There's no guarantee that "fsck -y" is always going to be safe.
> There's no guarantee that it'll get things back into a usable state;
> heck, it might make the corruption worse. But of all the more involved
> filesystem-recovery tools and techniques that you're aware of, how
> feasible is it for a computer-phobic user to resort to any of them? How
> feasible is it for said user to take the system to a big-name
> electronics store, whose techs only know Windows, and have them do it?
> Or post on ubuntuforums.org, and be walked through the whole process,
> hands-on? (Remember, my dad didn't even know that root@hostname:~# was a
> prompt.) Heck, if "fsck -y" doesn't fix the problem, *I* wouldn't know
> what else to do, and I work professionally with Linux. (For my part, I
> would just consider the filesystem a total loss, and blow it away. I'd
> only figure out the more advanced recovery stuff if there were something
> _really_ important on it.)
>
> I say "fsck -y" instead of just "fsck", too, because... unless I'm
> ...


Scott James Remnant (Canonical) (canonical-scott) wrote :

We actually have a spec for this: https://blueprints.launchpad.net/ubuntu/+spec/foundations-lucid-boot-recovery

It's relatively simple, but if anyone else wanted to help make it better, that'd be welcome

Michael Rooney (mrooney) wrote :

Thanks Scott, I've linked the blueprint!

Daniel Richard G. (skunk) wrote :

The blueprint looks good, pretty ambitious. Integrating with Plymouth will make it about as slick as it can get.

I would tweak the spec by noting that the root filesystem is a special case (can't ignore that one, not least since the usual Ubuntu setup puts everything on one partition), so it has to be handled differently. Maybe add some text blurbs for different kinds of partitions (e.g. if it's /home that fails to mount, tell the user "your personal files may not be available," etc.).

Changed in mountall (Ubuntu):
status: Triaged → Fix Committed
assignee: nobody → Scott James Remnant (scott)
milestone: none → lucid-alpha-2
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mountall - 2.0

---------------
mountall (2.0) lucid; urgency=low

  [ Scott James Remnant ]
  * "mount" event changed to "mounting", to make it clear it happens
    before the filesystem is mounted. Added "mounted" event which
    happens afterwards.
  * Dropped the internal hooks, these are now better handled by Upstart
    jobs on the "mounted" event.
  * Dropped the call to restorecon for tmpfs filesystems, this can also be
    handled by an Upstart job supplied by SELinux now.
    - mounted-dev.conf replaces /dev hook, uses MAKEDEV to make devices.
    - mounted-varrun.conf replaces /var/run hook
    - mounted-tmp.conf replaces /tmp hook.
      + Hook will be run for any /tmp mountpoint. LP: #478392.
      + Switching back to using "find" fixes $TMPTIME to be in days again,
        rather than hours. LP: #482602
  * Try and make mountpoints, though we only care about failure if the
    mountpoint is marked "optional" since otherwise the filesystem might
    make the mountpoint or something.
  * Rather than hiding the built-in mountpoints inside the code, put them
    in a new /lib/init/fstab file; that way users can copy the lines into
    /etc/fstab if they wish to override them in some interesting way.
  * Now supports multiple filesystem types listed in fstab, the whole
    comma-separated list is passed to mount and then /proc/self/mountinfo
    is reparsed to find out what mount actually did.
    * /dev will be mounted as a devtmpfs filesystem if supported by the
      kernel (which then does not need to run the /dev hook script).
  * Filesystem checks may be forced by adding force-fsck to the kernel
    command-line.
  * Exit gracefully with an error on failed system calls, don't infinite
    loop over them. LP: #469985.
  * Use plymouth for all user communication, replacing existing usplash and
    console code;
    * When plymouth is running, rather than exiting on failures, prompt the
      user as to whether to fix the problem (if possible), ignore the problem,
      ignore the mountpoint or drop to a maintenance shell. LP: #489474.
    * If plymouth is not running for whatever reason, the fallback action
      is always to start the recovery shell.
  * Adjust the set of filesystems that we wait for by default: LP: #484234.
    * Wait for all local filesystems, except those marked with the
      "nobootwait" option.
    * Wait for remote filesystems mounted as, or under, /usr or /var, and
      those marked with the "bootwait" option.
  * Always try network mount points, since we allow them to fail silently;
    SIGUSR1 now simply retries them once more. LP: #470776.
  * Don't retry devices repeatedly. LP: #480564.
  * Added manual pages for the events emitted by this tool.

  [ Johan Kiviniemi ]
  * Start all fsck instances in parallel, but set their priorities so that
    thrashing is avoided. LP: #491389.
 -- Scott James Remnant <email address hidden> Mon, 21 Dec 2009 23:09:23 +0000
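The default waiting rule described in the changelog (LP: #484234) can be restated as a small predicate. The Python sketch below follows only the wording of the changelog entry; the function names are hypothetical, it is not derived from the mountall source, and cases the changelog does not mention (such as a remote /usr marked "nobootwait") are left as the sketch happens to handle them.

```python
from typing import Iterable


def _is_under(mountpoint: str, parent: str) -> bool:
    """True if mountpoint is parent itself or lies below it."""
    return mountpoint == parent or mountpoint.startswith(parent + "/")


def waits_at_boot(mountpoint: str, remote: bool, options: Iterable[str]) -> bool:
    """Apply the default waiting rule as worded in the changelog entry."""
    opts = set(options)
    if not remote:
        # Local filesystems: wait unless explicitly marked nobootwait.
        return "nobootwait" not in opts
    # Remote filesystems: wait only for /usr, /var (or below), or bootwait.
    return (
        "bootwait" in opts
        or _is_under(mountpoint, "/usr")
        or _is_under(mountpoint, "/var")
    )


print(waits_at_boot("/home", False, ["defaults"]))
print(waits_at_boot("/media/backup", False, ["defaults", "nobootwait"]))
print(waits_at_boot("/var/lib", True, ["defaults"]))
print(waits_at_boot("/srv/share", True, ["defaults"]))
```

In practice this means an optional local filesystem, such as an external backup disk, can be kept from blocking the boot by adding "nobootwait" to its options in /etc/fstab.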

Changed in mountall (Ubuntu):
status: Fix Committed → Fix Released
Daniel Richard G. (skunk) wrote :

Just as a postscript, an arguable "workaround" to this issue in Karmic, and a likely helpful tweak for anyone who can't do much with a system if sshd isn't running: set FSCKFIX=yes in /etc/default/rcS.

From the rcS(5) man page:

"""FSCKFIX - When the root and all other file systems are checked, fsck is invoked with the -a option which means "autorepair". If there are major inconsistencies then the fsck process will bail out. The system will print a message asking the administrator to repair the file system manually and will present a root shell prompt (actually a sulogin prompt) on the console. Setting this option to yes causes the fsck commands to be run with the -y option instead of the -a option. This will tell fsck always to repair the file systems without asking for permission."""

mountall(8) will pick up on this---see /etc/init/mountall.conf. Useful if you'd rather have no user intervention (whether through a friendly UI or otherwise) when filesystem corruption issues crop up. Perfect for my dad's laptop!

(If anyone sees inaccuracies in the above, please let me know.)
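To make the effect of the setting concrete, here is a small Python sketch of the choice the quoted man page text describes: -y when FSCKFIX is yes, -a otherwise. The function name and the simplified line-by-line parsing are assumptions of this sketch; the real file is a shell fragment, and this is not how mountall or the init scripts read it.

```python
def fsck_repair_option(rcs_text: str) -> str:
    """Return "-y" if FSCKFIX=yes is set in rcS-style text, else "-a"."""
    value = "no"
    for line in rcs_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if line.startswith("FSCKFIX="):
            # Simplified parsing: strip optional quotes around the value.
            value = line.split("=", 1)[1].strip().strip("\"'").lower()
    return "-y" if value == "yes" else "-a"


default_config = "# /etc/default/rcS\nFSCKFIX=no\n"
tweaked_config = "# /etc/default/rcS\nFSCKFIX=yes\n"
print(fsck_repair_option(default_config))
print(fsck_repair_option(tweaked_config))
```

With the example inputs, the first call selects -a and the second selects -y, matching the behaviour described in the man page excerpt.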
