Failure to boot if fstab disk mounts fail

Bug #1463120 reported by Mark Rogers
This bug affects 5 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: High
Assigned to: Unassigned

Bug Description

I found this on my main 15.04 desktop but have reproduced it in a VM:
1. Install Ubuntu 15.04 (I included updates and non-free but I doubt that matters).
2. Add a bogus entry to /etc/fstab, eg:
  /dev/sdd1 /mnt/sdd1 ext4 errors=remount-ro 0 0
Note that /mnt/sdd1 exists but /dev/sdd1 does not.
3. Reboot

Expected: The failure to mount /dev/sdd1 is reported, with an option to continue booting without mounting it.
Actual: The system appears to hang, although it will eventually present a root terminal (with no indication of the cause of the problem). The hang occurs before the switch to graphical mode. As a result, warnings and errors that are printed on every boot, but are normally invisible because the screen clears before they can be read, become visible for the first time and incorrectly suggest that they are the cause of the problem.

Removing (or commenting out) the problematic entries and rebooting allows the system to boot.
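
An entry like the one in step 2 can be caught before rebooting. The following is a minimal sketch and not part of the original report: the function name check_fstab_sources is hypothetical, and it only checks that sources written as absolute paths (device nodes or image files) exist. It does not handle UUID= or LABEL= sources, nor fstab escapes such as \040.

```sh
#!/bin/sh
# check_fstab_sources [FILE]
# Prints one line for every fstab entry whose source (first field) is an
# absolute path that does not exist. Returns 1 if any were found, else 0.
check_fstab_sources() {
    fstab=${1:-/etc/fstab}
    missing=0
    # Only the first field matters here; the rest of the line is ignored.
    # The "|| [ -n ... ]" part handles a last line without a trailing newline.
    while read -r src rest || [ -n "$src" ]; do
        case $src in
            ''|'#'*)
                continue ;;            # blank line or comment
            /*)
                # Absolute path: a device node or a file-backed image.
                if [ ! -e "$src" ]; then
                    echo "missing source: $src"
                    missing=1
                fi ;;
        esac
    done < "$fstab"
    return "$missing"
}
```

Run as `check_fstab_sources /etc/fstab` before rebooting; with the bogus entry above in place and no /dev/sdd1 present, it prints `missing source: /dev/sdd1` and returns a non-zero status.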

HOWEVER: The grub boot menu now appears, and any pre-existing menu timeout and default action seem to have been lost; it's now necessary to select a boot option on each boot. This may be a separate bug (or feature?).

My actual case: I have external drives permanently connected via USB, but the USB card appears to have failed, so the drives are not accessible. With no clues (and being unfamiliar with systemd), working out why the system wouldn't boot was a tough job.
---
ApportVersion: 2.17.2-0ubuntu1.1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: mark 1382 F.... pulseaudio
DistroRelease: Ubuntu 15.04
HibernationDevice: RESUME=UUID=76b7b79b-0cdd-4605-bc58-381fad8fa67f
InstallationDate: Installed on 2015-06-08 (4 days ago)
InstallationMedia: Ubuntu 15.04 "Vivid Vervet" - Release amd64 (20150422)
IwConfig:
 eth0 no wireless extensions.

 lo no wireless extensions.
Lsusb:
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 002 Device 002: ID 80ee:0021 VirtualBox USB Tablet
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: innotek GmbH VirtualBox
Package: linux (not installed)
ProcEnviron:
 LANGUAGE=en_GB:en
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.19.0-20-generic root=UUID=53f2fcea-e84c-4736-ac69-f34305a63432 ro quiet splash
ProcVersionSignature: Ubuntu 3.19.0-20.20-generic 3.19.8
PulseList:
 Error: command ['pacmd', 'list'] failed with exit code 1: Home directory not accessible: Permission denied
 No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-3.19.0-20-generic N/A
 linux-backports-modules-3.19.0-20-generic N/A
 linux-firmware 1.143.1
RfKill:

Tags: vivid
UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
Uname: Linux 3.19.0-20-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

_MarkForUpload: True
dmi.bios.date: 12/01/2006
dmi.bios.vendor: innotek GmbH
dmi.bios.version: VirtualBox
dmi.board.name: VirtualBox
dmi.board.vendor: Oracle Corporation
dmi.board.version: 1.2
dmi.chassis.type: 1
dmi.chassis.vendor: Oracle Corporation
dmi.modalias: dmi:bvninnotekGmbH:bvrVirtualBox:bd12/01/2006:svninnotekGmbH:pnVirtualBox:pvr1.2:rvnOracleCorporation:rnVirtualBox:rvr1.2:cvnOracleCorporation:ct1:cvr:
dmi.product.name: VirtualBox
dmi.product.version: 1.2
dmi.sys.vendor: innotek GmbH

Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1463120/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
Mark Rogers (mark-web) wrote :

As requested, I have now set package to "linux" (ie kernel) due to the point in the boot sequence at which the issue occurs. I'm far from qualified to speculate but I am reasonably sure that this is not a kernel issue but a boot script dependency issue, which was why I didn't specify a package initially. Hopefully someone can point me in the right direction?

affects: ubuntu → linux (Ubuntu)
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1463120

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.1 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.1-rc7-unstable/

tags: added: kernel-da-key
Mark Rogers (mark-web) wrote : apport information

Attached: AlsaInfo.txt, CRDA.txt, CurrentDmesg.txt, JournalErrors.txt, Lspci.txt, ProcCpuinfo.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, WifiSyslog.txt

tags: added: apport-collected vivid
description: updated
Mark Rogers (mark-web) wrote :

Performed the latest set of updates and retested. I couldn't run apport-collect from the terminal immediately after the failure, as it needed a graphical boot in order to authenticate via Firefox, so I ran it after fixing fstab and rebooting. I don't know whether the logs are therefore useful.

I don't recall exactly what sequence of events led me to the initial issue, but it was likely after updates, as I rarely reboot my PC otherwise. However, all data in this bug report comes from a clean install in a VirtualBox VM, with updates installed and fstab then modified to cause boot to fail.

I will now retest with the latest kernel and report back.

Mark Rogers (mark-web) wrote :

Latest kernel (4.10-040100rc2-generic):
Reaches "Ubuntu 15.04" splash screen (with the four dots) which the previous kernel didn't, but still fails to boot, eventually dropping me into emergency mode shell as before.

Removing the offending fstab line and continuing to boot (Ctrl-D from shell) returns me to the splash screen for a while before returning me to emergency shell. The error immediately preceding this (which has been present on previous occasions on other kernels too) is:
Error getting authority: Error initializing authority: Could not connect: No such file or directory (g-io-error-quark, 1)

(Error transcribed from VBox screen not copy+pasted, I've tried to get it 100% correct.)

Mark Rogers (mark-web) wrote :

Forgot to add: Having fixed fstab, rebooting gets me a grub menu (no timeout so have to select boot option manually), and boots fine from there.

Mark Rogers (mark-web)
tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → Medium
Cecil Curry (leycec) wrote :

I can confirm this affects all Ubuntu >= 15.04 installations, including Ubuntu 16.04 (LTS).

> I am reasonably sure that this is not a kernel issue but a boot script dependency issue...

That's absolutely the case. Your intuition has not led you astray.

> Hopefully someone can point me in the right direction?

To no one's surprise, this is a high-level systemd issue rather than a low-level kernel issue. Ergo Ubuntu >= 15.04, the release at which Ubuntu switched from Upstart to systemd.

This issue is a significant show-stopper, particularly for users on newer UEFI-based systems. Triggering this issue is disturbingly trivial on such systems. Explicitly editing the "/etc/fstab" file with superuser permissions is *NOT* required to trigger this issue. Formatting a single device with the builtin GUI-driven disk utility ("Disks") is all it takes. And when it takes, most users will have no recourse but to format their root filesystem and reinstall.

Here's the use case my better half stumbled into this morning. During Ubuntu installation, users may elect to automount devices to arbitrary mountpoints when selecting a custom partition scheme. When this is done on UEFI-based systems, the resulting "/etc/fstab" entries resemble:

UUID=050e1e34-39e6-4072-a03e-ae0bf90ba13a /home/waluigi/wah! ext4 defaults 0 2

If any such device is subsequently reformatted (e.g., via the stock "Disks" utility), that device's UUID will be arbitrarily changed, invalidating the UUID previously recorded for that device in "/etc/fstab". Everything will superficially appear to behave as expected. On the next reboot, however, the end user will be presented with the now-infamous Purple Screen of Death. No error messages (...human-readable or otherwise) will appear, obscuring the underlying issue.
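
A stale UUID of this kind can be detected before the next reboot. What follows is a rough sketch rather than anything proposed in this thread: the function name check_fstab_uuids is hypothetical, and it assumes udev maintains the usual /dev/disk/by-uuid directory of symlinks. It handles only unquoted UUID= sources.

```sh
#!/bin/sh
# check_fstab_uuids [FILE] [BY_UUID_DIR]
# Prints one line for every UUID= source in FILE that has no matching entry
# in BY_UUID_DIR. Returns 1 if any stale UUID was found, else 0.
check_fstab_uuids() {
    fstab=${1:-/etc/fstab}
    by_uuid=${2:-/dev/disk/by-uuid}
    stale=0
    while read -r src rest || [ -n "$src" ]; do
        case $src in
            UUID=*)
                uuid=${src#UUID=}      # strip the "UUID=" prefix
                # A filesystem with this UUID should show up as an entry
                # (normally a symlink to the device node) in BY_UUID_DIR.
                if [ ! -e "$by_uuid/$uuid" ]; then
                    echo "stale UUID: $uuid"
                    stale=1
                fi ;;
        esac
    done < "$fstab"
    return "$stale"
}
```

Running `check_fstab_uuids` after reformatting a device, and before rebooting, would flag any entry whose recorded UUID no longer matches a filesystem on the machine.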

Attempting to login with "safe mode" fails. Attempting to login to a root terminal succeeds, albeit only after a few dispiriting rounds of graphical corruption and terminal cursors disappearing. If the user successfully navigates this shameful gauntlet of pain *AND* is sufficiently familiar with low-level system administration to navigate the crucible by geek fire that is the CLI, there still exists no explicit indication of the core issue. (Command-line-fu: why have you betrayed me?)

No relevant warnings or errors appear in either "dmesg" or "journalctl" output *OR* in "/var/log" logfiles. The only clear indicator that I could grep across was a single line of "systemctl -a" output showing the status of some ignorable system service unit with a non-human-readable name to be "loaded inactive dead", which seemed dimly suspicious. Further hours of cursing and grepping yielded the final culprit. My precious sanity. I have less of it now.

Non-tech-savvy users confronted with this problem will probably just vomit, format, and reinstall. As a Gentoo-hardened Disciple of the Command-line Faith with fifteen caffeine-addled years experience in Silicon Valley startup monoculture, I feel overly confident in betting that even the most battle-weary code warriors will hand in their Richard M. Stallman fan club memberships when face-planting into this epic fail. Consider escalating this issue's im...

Dane Mutters (dmutters) wrote :

I work for a major educational system that utilizes Ubuntu Server (14.04, 16.04 LTS) instances on the Amazon Web Services cloud for critical infrastructure. This bug has come up a few times, where an admin has a typo or other error in /etc/fstab pertaining to a non-OS filesystem (like an EFS data volume), and as a result, the system fails to boot. On a physical machine, this wouldn't be a big deal: press "S" to skip that mountpoint, and keep booting; or if that doesn't work, boot from live media and fix fstab. However, a cloud instance doesn't grant you those options: it hangs forever, and you have no physical keyboard, USB port, or optical drive with which to work around the problem.

I normally wouldn't suggest a certain priority on a bug, but this is a severe liability for anyone running Ubuntu in a cloud computing environment. Therefore, I must respectfully request that this bug be elevated to "Major" or "Critical" status.

Thank you.

Joseph Salisbury (jsalisbury) wrote :

The v4.10-rc3 kernel is now available. Can you give that version a test:
 http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10-rc3

Also, was there a prior kernel version that did not exhibit this bug? If so, we can perform a kernel bisect to find the offending commit.

Changed in linux (Ubuntu):
importance: Medium → High
Dane Mutters (dmutters) wrote :

Thanks for your attention to this bug, Joseph. I'm compiling the new kernel now, and will let you know as soon as I have a test result.

As far as I know, all 16.04 kernel versions exhibit this problem; but I haven't done any testing on Ubuntu 15.X.

Dane Mutters (dmutters) wrote :

(New kernel is still compiling...)

I just did a test with a fully-patched, newly spun-up Ubuntu Server 14.04 AWS instance, and the behavior seems to be present with kernel 3.13.0-107. I can't verify all of the symptoms described by the reporter, because I don't have physical access; but I get "connection refused" when trying to SSH into it, after making an intentionally-bad edit to /etc/fstab.

For completeness, this is the bad edit I made:
/home/ubuntu/test.img /mnt ext4 defaults 0 0

The file referenced was created with:
dd if=/dev/zero of=/home/ubuntu/test.img bs=1M count=20

When attempting to mount this file from the command-line, it produces a "wrong filesystem type" error, as expected.

Dane Mutters (dmutters) wrote :

I've tested this with kernel 4.10-rc3 in a local VirtualBox VM (Ubuntu Server 16.04), with the same fstab error as above, and am able to confirm most of the symptoms described by the reporter.

When attempting to 'mount -a', it presents the expected "wrong filesystem type" error, but doesn't render the system unresponsive. Upon reboot, however, it drops into a maintenance console without presenting an error about a failed mount attempt. Consequently, the server would be unreachable for maintenance via SSH.

'journalctl -xb' provides a log that contains an error about the failed mount attempt, but doesn't point to it as the reason for dropping into a maintenance terminal, and it isn't the last thing logged.

After fixing fstab (which requires physical access) and rebooting ('reboot' from the maintenance console), GRUB does still have a countdown, but it starts around 10 seconds, instead of 3 seconds (~3 seconds is the default for Ubuntu Server). Otherwise, at this point, the system is able to boot up automatically, without issue.

After rebooting a second time (from the normal BASH prompt), the GRUB countdown is back to ~3 seconds, and the system boots normally (automatically), from there.

Thanks, again, for helping to resolve this bug.

Grant Sadler (gsads) wrote :

In the absence of a fix, would someone be gracious enough to let me know of a workaround? I just rebooted a remote server (14.04) and am staring at "ssh... Connection Refused". I suspect it has to do with fstab, since I just edited it to mount a USB drive. I can provide more details (if needed) once I drive over and connect directly *stare*. Thanks!

Hudson Kendall (plaguedoctor) wrote :

This is a serious issue when there is no physical access to the machine. I can confirm that this issue happens on 16.04 after adding a new hard drive to the fstab and subsequently reformatting the drive. At reboot, I have no options to resolve the situation. All network requests are ignored, because the boot sequence cannot complete.

Dane Mutters (dmutters) wrote :

Could fstab default mount options be changed so that, unless otherwise specified, failed mount attempts would skip that filesystem and keep booting (logging an error), instead of dropping to an inaccessible recovery console? This would let the fstab be fixed with minimal hassle in a headless environment, so long as the broken mount isn't /boot or /.
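
Until any such default changes, the behaviour asked for here can be approximated per entry with the existing `nofail` mount option, which mount(8) and systemd.mount(5) document as letting boot continue when that mount is missing or fails. Nobody in this thread has tested it against this bug, so treat the following as an untested sketch under that assumption. The helper name add_nofail is hypothetical; it prints a copy of an fstab with `nofail` appended to every entry except / and /boot.

```sh
#!/bin/sh
# add_nofail FILE
# Prints FILE to stdout with ",nofail" appended to the options (4th field)
# of every entry except the / and /boot filesystems. FILE is not modified.
# Note: rewritten lines have their field separators collapsed to one space.
add_nofail() {
    awk '
        # Pass comments and blank lines through untouched.
        /^[ \t]*#/ || /^[ \t]*$/ { print; next }
        # The root and /boot filesystems must mount, so leave them alone.
        $2 == "/" || $2 == "/boot" { print; next }
        # Skip malformed lines and entries that already carry nofail.
        NF < 4 || ("," $4 ",") ~ /,nofail,/ { print; next }
        { $4 = $4 ",nofail"; print }
    ' "$1"
}
```

One cautious way to use it is `add_nofail /etc/fstab > /tmp/fstab.new`, then to review the result by hand before copying it over /etc/fstab. Entries marked this way should no longer block boot when their device is absent, at the cost of the system coming up without that filesystem.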
