ssh server doesn't start when irrelevant filesystems are not available

Bug #583542 reported by Jeffrey Baker
This bug affects 5 people
Affects: openssh (Ubuntu)
Status: Triaged
Importance: Medium
Assigned to: Unassigned
Declined for Lucid by Mathias Gug
Declined for Maverick by Mathias Gug

Bug Description

In Lucid, the SSH daemon won't start at boot unless all filesystems listed in fstab can be mounted. This is annoying to the administrator because some fstab entries are irrelevant and/or could be expected to have transient failures. When SSH doesn't start, it's impossible for the admin to do an in-band fix of these filesystems.

Examples of when filesystems might not mount:

Underlying device not attached
NFS server unavailable
iSCSI target unavailable
RAID without a quorum of member devices
Kernel package upgrade disabled certain filesystem modules

And so forth. The line "start on filesystem" in the ssh Upstart job should probably be changed to something more robust.

Eric Hammond (esh) wrote :

This is especially important for remotely controlled servers which have no console access (e.g., Amazon EC2).

Colin Watson (cjwatson) wrote :

I don't believe mountall emits any event that would be suitable for this. The only other plausible one is local-filesystems, whose manual page notes that it may well not cover /usr, so it's not suitable for use by the ssh job.

'filesystem' is documented as being appropriate for most normal services, so surely many other services have the same problem? Most notably, rc-sysinit starts on filesystem, so you'll never reach runlevel 2 if that event is never emitted. It seems to me that any change I might make in ssh would tend to make matters worse, not better.

Can't you use the nobootwait option in /etc/fstab to avoid holding up boot for filesystems that aren't needed to get up and running? This is documented in fstab(5).
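
One way to find the entries that the nobootwait workaround would apply to is to scan /etc/fstab for them. The Python sketch below is a hypothetical helper (not part of mountall or Ubuntu) that lists entries without nobootwait; treating '/' and '/usr' as essential mounts and skipping swap entries are simplifying assumptions made for illustration.

```python
# Hypothetical helper, not part of mountall or Ubuntu: lists /etc/fstab
# entries that lack the 'nobootwait' option and could therefore hold up boot.

ESSENTIAL = {"/", "/usr"}  # assumption: mounts that must succeed before boot continues


def entries_blocking_boot(fstab_text):
    """Return mount points of non-essential entries without 'nobootwait'."""
    flagged = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 3:
            continue  # malformed: needs at least device, mount point and type
        mount_point, fs_type = fields[1], fields[2]
        # The options field may be omitted; treat that as "defaults".
        options = fields[3].split(",") if len(fields) > 3 else ["defaults"]
        if fs_type == "swap" or mount_point in ESSENTIAL:
            continue  # simplification: swap and essential mounts are not flagged
        if "nobootwait" not in options:
            flagged.append(mount_point)
    return flagged
```

For example, entries_blocking_boot(open("/etc/fstab").read()) returns the mount points worth reviewing before deciding whether nobootwait is appropriate for each one.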

Scott Moser (smoser) wrote :

> 'filesystem' is documented as being appropriate for most normal
> services, so surely many other services have the same problem? Most
> notably, rc-sysinit starts on filesystem, so you'll never reach runlevel
> 2 if that event is never emitted. It seems to me that any change I
> might make in ssh would tend to make matters worse, not better.

I agree that this is likely to affect other services or jobs also. I'm
not aware of any event that would be better.

That said, this is a real issue. The 'nobootwait' option may be a
suitable workaround for Lucid, but there needs to be some reliable way of
starting services. All sorts of things could result in an /etc/fstab
that isn't perfect (a failed disk, a '/dev/sdXX' entry rather than UUID=
combined with a changed kernel, ...). Having ssh not start means a
physical touch to the machine or an out-of-band interface has to be used
to service it. In EC2/UEC, there *is* no out-of-band interface or
physical touch.

Jeffrey Baker (jwbaker) wrote :

This may be out of scope for a bug report, but why not change the way an upstart job describes its start conditions? ssh, for example, could supply a script which checks if /usr is mounted. The script(s) can be run after every upstart job completes, and when all conditions are met the new jobs are started.

In the meantime, I'll check out the nobootwait workaround.
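
The proposal above, where each job supplies its own readiness check and the checks are re-run after every job completes, can be sketched in Python. Job, Scheduler and on_job_completed are hypothetical names; this is not how Upstart actually works, and the checks test a set of available paths rather than the real mount table.

```python
# Sketch of the per-job readiness-check proposal. Job, Scheduler and
# on_job_completed are hypothetical names; this is not Upstart's API.
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Job:
    name: str
    # Readiness check supplied by the job, evaluated against available paths.
    ready: Callable[[Set[str]], bool]


@dataclass
class Scheduler:
    pending: List[Job]
    started: List[str] = field(default_factory=list)

    def on_job_completed(self, available: Set[str]) -> None:
        """Re-run every pending job's check and start those that now pass."""
        still_pending = []
        for job in self.pending:
            if job.ready(available):
                self.started.append(job.name)  # stand-in for really starting it
            else:
                still_pending.append(job)
        self.pending = still_pending


if __name__ == "__main__":
    sched = Scheduler([
        Job("ssh", lambda paths: "/usr" in paths),          # ssh only needs /usr
        Job("backup", lambda paths: "/mnt/nfs" in paths),   # waits for NFS
    ])
    sched.on_job_completed({"/", "/usr"})  # NFS server still unavailable
    print(sched.started)                         # ['ssh']
    print([job.name for job in sched.pending])   # ['backup']
```

The point of the design is that an unavailable NFS mount only delays the jobs whose own checks depend on it, instead of holding back every job that starts on a single global event.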

Scott Moser (smoser) wrote :

Hm... now that I'm reading the man page you pointed me to, the
nobootwait and optional flags do seem to solve this issue.

At the very least, though, there is an educational problem here. I was unaware of these options, as I'm sure other sysadmins and users are.

Scott Moser (smoser)
Changed in openssh (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Tokuko (launchpad-net-tokuko) wrote :

This issue has hit me multiple times now. I usually work on Solaris, HP-UX and AIX. All of these simply print a big, loud warning on the console but try to continue booting, which I guess is what most administrators (myself included) expect.
As the last entry was 3 years ago: has any decision been reached?
