Bug #489474 “Need newbie-friendly alternative to maintenance she...” : Bugs : mountall package : Ubuntu

Daniel Richard G. (skunk) wrote on 2009-11-28:

#1

The horror! The horror! (105.2 KiB, image/jpeg)

Just for giggles, this is a screenshot taken by my father (with a camera, natch) of the output of the fsck(8) command he typed in. Embarrassing, isn't it?

jtniehof (jtniehof) wrote on 2009-11-28:

#2

Thank you for taking the time to make Ubuntu better. Since you are suggesting a significant design effort, you are invited to post your idea in Ubuntu Brainstorm at http://brainstorm.ubuntu.com/ where it can be discussed, voted on by the community and reviewed by developers. Thanks for taking the time to share your opinion!

Changed in mountall (Ubuntu):
status:	New → Invalid

Daniel Richard G. (skunk) wrote on 2009-11-28:

#3

jtniehof, I am not suggesting a "significant design effort" (please re-read the report), and this bug/wishlist item concerns an existing, basic usability failure rather than positing a new "would be nice to have" idea. If this should be filed against another package, then say so. Otherwise, please allow this issue to be addressed via the normal channels. Thank you.

Changed in mountall (Ubuntu):
status:	Invalid → New

Michael Rooney (mrooney) wrote on 2009-12-02:

#4

This sounds like a pretty great and reasonable idea based on the information you've provided, but I'm no expert. I will mark it as Triaged and Wishlist as per https://wiki.ubuntu.com/Bugs/Importance . keybuk is the maintainer of this package; let's see what he thinks.

Changed in mountall (Ubuntu):
importance:	Undecided → Wishlist
status:	New → Triaged

Jonathan Marsden (jmarsden) wrote on 2009-12-02:

#5

mrooney:

I'm not convinced that this is ready for implementation:

I think the unanswered questions I can see here include:

(a) How does the system determine whether the issue really was caused simply by a hard shutdown or power loss (and so decide that fsck -y is likely to be an appropriate thing to offer the user as an "automated" fix), vs. something more serious such as a failing disk drive or intermittently faulty disk controller (or any other circumstances in which an fsck -y could do more harm than good)? While a human user often "knows" whether or not the system was "just" not shut down correctly last time, the OS itself has no way to determine that while booting, as far as I know. This bug report does not seem to propose a way to determine this.

(b) Are you sure most knowledgeable people really do fsck -y in these circumstances? Is running fsck -y guaranteed to be safe for your data, on all filesystem types on all classes of machine? Is there evidence to support this claim? Even in the report's specific circumstances, I'd at least have made an image copy of the virtual disk file first, so that if e2fsck -y broke it further, I'd have a way to start the repair over, using a more conventional and more careful approach than the one you are advocating here. Maybe I am just slightly more paranoid than some about my data. So, anecdotally at least, it is unclear that "most experienced users" handle this kind of situation exactly as you do.

(c) Have you considered the implications for filesystems whose fsck does not support a -y flag? Note that the fsck man page specifically states that "Options to different filesystem-specific fsck's are not standardized", and goes on to list -y as one that not all filesystem-specific checkers support. I have not checked all fs-specific checkers in Ubuntu to confirm this. One solution could be to only offer the proposed "pretty" UI if the fs concerned is ext2/ext3/ext4?
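
A minimal sketch of the restriction suggested in (c), assuming an allow-list of ext2/ext3/ext4 and a hypothetical helper name, auto_repair_command. It only builds the command line; it does not run anything, and it is not how mountall itself is written. -t and -y are real fsck options (-t picks the checker, -y is handed through to it).

```python
from typing import List, Optional

# Assumed allow-list, following the ext2/ext3/ext4 suggestion in (c).
# Other checkers may accept -y too; they are simply not offered here.
AUTO_REPAIR_FSTYPES = frozenset({"ext2", "ext3", "ext4"})


def auto_repair_command(device: str, fstype: str) -> Optional[List[str]]:
    """Return an fsck argv for unattended repair, or None if not offered."""
    if fstype not in AUTO_REPAIR_FSTYPES:
        # Unknown checker: keep today's behaviour (maintenance shell).
        return None
    # -t selects the filesystem-specific checker; -y is passed through to
    # it and answers "yes" to every repair question.
    return ["fsck", "-t", fstype, "-y", device]
```

For example, auto_repair_command("/dev/sda1", "ext4") gives ["fsck", "-t", "ext4", "-y", "/dev/sda1"], while a MINIX filesystem gets None and falls back to the shell.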

Michael Rooney (mrooney) wrote on 2009-12-02: Re: [Bug 489474] Re: Need newbie-friendly alternative to maintenance shell when mount fails

#6

On Wed, Dec 2, 2009 at 9:52 AM, Jonathan Marsden <email address hidden> wrote:
> I'm not convinced that this is ready for implementation:

It may not be; I marked it as Triaged because it is at least ready for
a developer's review/opinion/thoughts, like you gave :)

Daniel Richard G. (skunk) wrote on 2009-12-02:

#7

Jonathan, here's my take on these questions:

(a) I wasn't thinking along the lines of the system being "aware" of whether it was properly shut down or not, although I believe the infrastructure is already in place for this to be possible (i.e. "Ubuntu recorded a successful boot" on startup, and how the GRUB menu now pops up automatically on boot if the system was not properly shut down beforehand). I'd rather not have a "repair" option be limited to these circumstances, however, at least not without carefully thinking through when else it might be needed. (Ideally, a root prompt would only come up if the system is hosed in a non-trivial way.)

My thought was more along the lines of being aware of why the mount failed. Was the filesystem/journal corrupted? Did the fsck auto-preening fail? Is the kernel giving I/O errors? Was the device just not there? You do things differently, depending on what the failure mode is. (I believe mountall is doing all the fsck'ing in addition to the mounting, so it would be aware of what's going on.)
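
To illustrate the idea of reacting to the failure mode, here is a minimal Python sketch that maps each of the failure modes listed above to the choices a friendly prompt might offer. The enum, the function name and the mapping are all hypothetical assumptions for illustration; they are not taken from mountall's code.

```python
from enum import Enum, auto
from typing import List


class MountFailure(Enum):
    """Failure modes from the comment above; the names are hypothetical."""
    FS_CORRUPTED = auto()    # filesystem or journal is corrupted
    PREEN_FAILED = auto()    # fsck's automatic preening gave up
    IO_ERROR = auto()        # the kernel is reporting I/O errors
    DEVICE_MISSING = auto()  # the device never showed up


def suggested_choices(failure: MountFailure) -> List[str]:
    """Options a friendly prompt might offer, most sensible first.

    The mapping below is an illustrative assumption, not mountall's logic.
    """
    if failure in (MountFailure.FS_CORRUPTED, MountFailure.PREEN_FAILED):
        # Plausibly fixable by a full repair pass such as fsck -y.
        return ["repair", "skip this mount", "root shell"]
    if failure is MountFailure.IO_ERROR:
        # Writing repairs through failing hardware may make things worse,
        # so automatic repair is deliberately not offered.
        return ["skip this mount", "root shell"]
    # DEVICE_MISSING: nothing to repair; the disk may just be unplugged.
    return ["wait for device", "skip this mount", "root shell"]
```

Keeping "repair" out of the I/O-error case is one way to address the failing-hardware concern raised in (a) of the previous comment.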

The real wildcard would be the bad-controller case, where the filesystem on disk is sound but it gets garbled when read, and sensible writes to the disk turn into destructive ones. But is this really a factor? For one, it's not a terribly common failure mode, as far as I'm aware. Two, if fsck would damage the disk, then presumably, so would normal system activity. Three... even if fsck does completely hose the disk in this rare scenario, should that mean that we really can't do any better than drop the user into a root shell on mount failure? It comes down to risk management---getting in a car entails a risk of a fatal traffic accident, but for most people, the benefits outweigh the risks. I think a similar logic would apply here.

(b) There's no guarantee that "fsck -y" is always going to be safe. There's no guarantee that it'll get things back into a usable state; heck, it might make the corruption worse. But of all the more involved filesystem-recovery tools and techniques that you're aware of, how feasible is it for a computer-phobic user to resort to any of them? How feasible is it for said user to take the system to a big-name electronics store, whose techs only know Windows, and have them do it? Or post on ubuntuforums.org, and be walked through the whole process, hands-on? (Remember, my dad didn't even know that root@hostname:~# was a prompt.) Heck, if "fsck -y" doesn't fix the problem, *I* wouldn't know what else to do, and I work professionally with Linux. (For my part, I would just consider the filesystem a total loss, and blow it away. I'd only figure out the more advanced recovery stuff if there were something _really_ important on it.)

I say "fsck -y" instead of just "fsck", too, because... unless I'm _really_ intimate with my filesystems, am I really going to know whether I should fix the block count for group #37, but not #52? Very, very few people are ever going to have a reason to say "n" to any of fsck's prompts.

(By the way, just to correct what seems like a misperception... the damaged filesystem in this case was not one on a VirtualBox virtual disk file, but the Linux root filesystem on the laptop's physical hard drive. I don't disagree with the wisdom of making an image-copy of the filesystem before attempting repair, and then using a more conventional approach if that fails---the problem is, in this kind of situation, that's not feasible. In practical terms, it's no better than declaring the filesystem a total loss.)

(c) I wasn't aware that -y wasn't universally supported. But yeah, an auto-repair option can be limited to certain filesystem types. If someone installs Ubuntu on a MINIX filesystem, then, well, they're on their own :-3

danfish (dan-fishms) wrote on 2009-12-02: Re: [Bug 489474] Re: Need newbie-friendly alternative to maintenance shell when mount fails

#8

Sorry,

I'm very new to bug triaging so I'm not sure of the etiquette or whether
this is the place to reply, but I've experienced the same problem with a
relative of mine and it makes debugging/support a problem. As an aside,
I wonder if there is a place on Launchpad/the forums for 'supporting
family members' etc.

Dan Fish

Scott James Remnant (Canonical) (canonical-scott) wrote on 2009-12-04:

#9

We actually have a spec for this: https://blueprints.launchpad.net/ubuntu/+spec/foundations-lucid-boot-recovery

It's relatively simple, but if anyone else wants to help make it better, that'd be welcome.

Michael Rooney (mrooney) wrote on 2009-12-04:

#10

Thanks Scott, I've linked the blueprint!

Daniel Richard G. (skunk) wrote on 2009-12-04:

#11

The blueprint looks good, pretty ambitious. Integrating with Plymouth will make it about as slick as it can get.

I would tweak the spec by noting that the root filesystem is a special case (can't ignore that one, not least since the usual Ubuntu setup puts everything on one partition), so it has to be handled differently. Maybe add some text blurbs for different kinds of partitions (e.g. if it's /home that fails to mount, tell the user "your personal files may not be available," etc.).
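
A minimal sketch of that tweak, assuming a hypothetical helper mount_failure_blurb that returns whether the mount may be skipped plus a message. Only the /home wording comes from the suggestion above; the root and fallback messages are placeholder text.

```python
from typing import Dict, Tuple

# Hypothetical per-mountpoint messages; only the /home wording comes from
# the suggestion above, the rest is placeholder text.
BLURBS: Dict[str, str] = {
    "/home": "Your personal files may not be available.",
}


def mount_failure_blurb(mountpoint: str) -> Tuple[bool, str]:
    """Return (may_skip, message) for a filesystem that failed to mount."""
    if mountpoint == "/":
        # The root filesystem cannot be ignored: on a default one-partition
        # install it holds everything, so booting cannot continue without it.
        return False, "The system disk could not be mounted."
    message = BLURBS.get(
        mountpoint, f"Files under {mountpoint} may not be available."
    )
    return True, message
```

Treating "/" as a separate branch, rather than as one more dictionary entry, keeps the "cannot be skipped" rule explicit.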

Scott James Remnant (Canonical) (canonical-scott) on 2009-12-21

Changed in mountall (Ubuntu):
status:	Triaged → Fix Committed
assignee:	nobody → Scott James Remnant (scott)
milestone:	none → lucid-alpha-2

Launchpad Janitor (janitor) wrote on 2009-12-21:

#12

This bug was fixed in the package mountall - 2.0

---------------
mountall (2.0) lucid; urgency=low

  [ Scott James Remnant ]
  * "mount" event changed to "mounting", to make it clear it happens
    before the filesystem is mounted.  Added "mounted" event which
    happens afterwards.
  * Dropped the internal hooks, these are now better handled by Upstart
    jobs on the "mounted" event.
  * Dropped the call to restorecon for tmpfs filesystems, this can also be
    handled by an Upstart job supplied by SELinux now.
    - mounted-dev.conf replaces /dev hook, uses MAKEDEV to make devices.
    - mounted-varrun.conf replaces /var/run hook
    - mounted-tmp.conf replaces /tmp hook.
      + Hook will be run for any /tmp mountpoint.  LP: #478392.
      + Switching back to using "find" fixes $TMPTIME to be in days again,
        rather than hours.  LP: #482602
  * Try and make mountpoints, though we only care about failure if the
    mountpoint is marked "optional" since otherwise the filesystem might
    make the mountpoint or something.
  * Rather than hiding the built-in mountpoints inside the code, put them
    in a new /lib/init/fstab file; that way users can copy the lines into
    /etc/fstab if they wish to override them in some interesting way.
  * Now supports multiple filesystem types listed in fstab, the whole
    comma-separated list is passed to mount and then /proc/self/mountinfo
    is reparsed to find out what mount actually did.
    * /dev will be mounted as a devtmpfs filesystem if supported by the
      kernel (which then does not need to run the /dev hook script).
  * Filesystem checks may be forced by adding force-fsck to the kernel
    command-line.
  * Exit gracefully with an error on failed system calls, don't infinite
    loop over them.  LP: #469985.
  * Use plymouth for all user communication, replacing existing usplash and
    console code;
    * When plymouth is running, rather than exiting on failures, prompt the
      user as to whether to fix the problem (if possible), ignore the problem,
      ignore the mountpoint or drop to a maintenance shell.  LP: #489474.
    * If plymouth is not running for whatever reason, the fallback action
      is always to start the recovery shell.
  * Adjust the set of filesystems that we wait for by default: LP: #484234.
    * Wait for all local filesystems, except those marked with the
      "nobootwait" option.
    * Wait for remote filesystems mounted as, or under, /usr or /var, and
      those marked with the "bootwait" option.
  * Always try network mount points, since we allow them to fail silently;
    SIGUSR1 now simply retries them once more.  LP: #470776.
  * Don't retry devices repeatedly.  LP: #480564.
  * Added manual pages for the events emitted by this tool.

  [ Johan Kiviniemi ]
  * Start all fsck instances in parallel, but set their priorities so that
    thrashing is avoided.  LP: #491389.
 -- Scott James Remnant <scott@ubuntu.com>   Mon, 21 Dec 2009 23:09:23 +0000
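
The "filesystems that we wait for by default" rule in the changelog can be sketched in Python as below. This is only an illustration of the stated rule, not mountall's C code; the FstabEntry type and the set of remote filesystem types are assumptions, and real mountall may classify remote filesystems differently.

```python
from typing import Iterable, List, NamedTuple


class FstabEntry(NamedTuple):
    mountpoint: str
    fstype: str
    options: str  # comma-separated, as in the fourth fstab field


# Assumed set of network filesystem types; mountall's own list may differ.
REMOTE_FSTYPES = frozenset({"nfs", "nfs4", "cifs", "smbfs", "ncpfs"})


def is_at_or_under(path: str, prefix: str) -> bool:
    """True if path is prefix itself or lies beneath it ("/usrlocal" is not)."""
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")


def waits_for(entry: FstabEntry) -> bool:
    opts = entry.options.split(",")
    if entry.fstype not in REMOTE_FSTYPES:
        # Local filesystem: wait unless marked "nobootwait".
        return "nobootwait" not in opts
    # Remote: wait if mounted as, or under, /usr or /var, or marked "bootwait".
    return (
        is_at_or_under(entry.mountpoint, "/usr")
        or is_at_or_under(entry.mountpoint, "/var")
        or "bootwait" in opts
    )


def boot_wait_set(entries: Iterable[FstabEntry]) -> List[str]:
    """Mountpoints the boot would block on, in fstab order."""
    return [e.mountpoint for e in entries if waits_for(e)]
```

With a local root, a local /data marked nobootwait, an NFS /var/mail, an NFS /srv/media and a CIFS /srv/backup marked bootwait, only "/", "/var/mail" and "/srv/backup" end up in the wait set.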

Changed in mountall (Ubuntu):
status:	Fix Committed → Fix Released

Daniel Richard G. (skunk) wrote on 2010-01-10:

#13

Just as a postscript: an arguable "workaround" for this issue in Karmic, and a likely helpful tweak for anyone who can't do much with a system if sshd isn't running, is to set FSCKFIX=yes in /etc/default/rcS.

From the rcS(5) man page:

"""FSCKFIX - When the root and all other file systems are checked, fsck is invoked with the -a option which means "autorepair". If there are major inconsistencies then the fsck process will bail out. The system will print a message asking the administrator to repair the file system manually and will present a root shell prompt (actually a sulogin prompt) on the console. Setting this option to yes causes the fsck commands to be run with the -y option instead of the -a option. This will tell fsck always to repair the file systems without asking for permission."""

mountall(8) will pick up on this---see /etc/init/mountall.conf. Useful if you'd rather have no user intervention (whether through a friendly UI or otherwise) when filesystem corruption issues crop up. Perfect for my dad's laptop!
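
The effect described in the man page excerpt (FSCKFIX=yes means fsck gets -y instead of -a) can be sketched as below. This is an assumption-laden illustration: /etc/default/rcS is really a shell fragment sourced by the init scripts, while this hypothetical parser only understands simple KEY=value lines, and it says nothing about how /etc/init/mountall.conf actually passes the setting on.

```python
import shlex
from typing import Dict


def parse_rcs(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, ignoring comments and blank lines."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex strips optional quoting (FSCKFIX="yes") and trailing comments.
        parts = shlex.split(raw, comments=True)
        values[key.strip()] = parts[0] if parts else ""
    return values


def fsck_fix_flag(rcs_text: str) -> str:
    """-y when FSCKFIX=yes, otherwise the default -a ("autorepair")."""
    return "-y" if parse_rcs(rcs_text).get("FSCKFIX") == "yes" else "-a"
```

Anything other than an exact "yes" (including a missing FSCKFIX line) leaves the cautious -a default in place.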

(If anyone sees inaccuracies in the above, please let me know.)

Ubuntu
mountall package

Need newbie-friendly alternative to maintenance shell when mount fails
