Dependency loops due to ANDed start conditions leave system unbootable

Bug #964207 reported by Nikolaus Rath
This bug affects 6 people
Affects             Status   Importance  Assigned to  Milestone
mountall (Ubuntu)   Invalid  Undecided   Unassigned
upstart (Ubuntu)    Invalid  Undecided   Unassigned

Bug Description

If /home is mounted on a separate partition, and the gdm start condition in /etc/init/gdm.conf is modified to include "mounted MOUNTPOINT=/home" as follows:

start on (filesystem
          and mounted MOUNTPOINT=/home
          and started dbus
          and (drm-device-added card0 PRIMARY_DEVICE_FOR_DISPLAY=1
               or stopped udevtrigger))

then a Lucid system will no longer boot. It seems that in this case the mountall process is waiting for input from upstart, but upstart is not sending anything. Thus, the required mountall events are not emitted and the system refuses to boot.

The problem can be worked around by manually starting another mountall instance while the first instance is hanging.

I have attached the --verbose output of the first and second mountall runs,
as well as strace output.

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: mountall 2.15.3
ProcVersionSignature: Ubuntu 3.0.0-17.30~lucid1-server 3.0.22
Uname: Linux 3.0.0-17-server x86_64
Architecture: amd64
Date: Sat Mar 24 18:45:46 2012
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: mountall

Nikolaus Rath (nikratio) wrote :

I also added an strace -f to mountall.conf; this is the output.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in mountall (Ubuntu):
status: New → Confirmed
Nikolaus Rath (nikratio) wrote :

Since this is triggered by changing an upstart job unrelated to mountall, I'm reassigning this to upstart.

description: updated
affects: mountall (Ubuntu) → upstart (Ubuntu)
summary: - Hanging mountall stops boot process
+ Waiting for mounted MOUNTPOINT=/home in gdm.conf breaks system boot
summary: - Waiting for mounted MOUNTPOINT=/home in gdm.conf breaks system boot
+ Waiting for "mounted MOUNTPOINT=/home in gdm.conf" breaks system boot
summary: - Waiting for "mounted MOUNTPOINT=/home in gdm.conf" breaks system boot
+ Waiting for "mounted MOUNTPOINT=/home" in gdm.conf breaks system boot
Nikolaus Rath (nikratio) wrote : Re: Waiting for "mounted MOUNTPOINT=/home" in gdm.conf breaks system boot

Ok, here's my theory of what happens:

I noticed that if you have a very simple upstart job:

# cat /etc/init/test.conf
start on niko-a and niko-b
script
    sleep 10
end script

and then emit niko-a without --no-wait:

# initctl emit niko-a

then this call blocks until niko-b is emitted as well.
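To make this observation concrete, here is a toy Python model of an ANDed start condition. It assumes (this is not upstart's actual implementation) that a delivered event stays pending while an unstarted job is holding it for its start condition, and that a blocking emitter (no --no-wait) returns only once its event is no longer pending. The class names AndJob and EventBus are hypothetical.

```python
class AndJob:
    """Hypothetical job whose start condition is `e1 and e2 and ...`."""

    def __init__(self, name, required):
        self.name = name
        self.required = frozenset(required)
        self.seen = set()
        self.started = False


class EventBus:
    """Toy event delivery: an event stays pending while an unstarted job holds it."""

    def __init__(self, jobs):
        self.jobs = jobs
        self.pending = []  # events whose blocking emitters have not returned yet

    def emit(self, event):
        """Deliver `event`; return the events whose emitters are still blocked."""
        for job in self.jobs:
            if not job.started and event in job.required:
                job.seen.add(event)
                if job.seen == job.required:
                    job.started = True  # every ANDed event has arrived
        self.pending.append(event)
        # An emitter may return once no unstarted job still holds its event.
        self.pending = [
            e for e in self.pending
            if any(e in j.seen and not j.started for j in self.jobs)
        ]
        return list(self.pending)


bus = EventBus([AndJob("test", ["niko-a", "niko-b"])])
print(bus.emit("niko-a"))  # ['niko-a']: `initctl emit niko-a` would still block
print(bus.emit("niko-b"))  # []: the job starts and both emitters return
```

In this model, niko-a is released only by the arrival of niko-b, which matches the behavior observed with the test job.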

Now, if mountall also emits the mounted event without --no-wait, that call blocks, because GDM is still waiting for other events before it can start (e.g. started dbus or drm-device-added). However, those events cannot be emitted until mountall has finished, and mountall cannot finish because it is waiting for GDM to start.

I don't know whether this is a bug in upstart (which should somehow accommodate such chains caused by ANDed start conditions) or in mountall (which should not wait for the events to be processed).

(Tested on Ubuntu 10.04)
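The loop can be checked mechanically by writing the waits down as a graph and searching for a cycle. The edges below are an assumption transcribed from the theory in this comment: mountall's blocking emit waits on gdm, gdm's ANDed condition waits on filesystem and started dbus, and those in turn wait on mountall. The node names and the find_cycle helper are illustrative, not part of upstart or mountall.

```python
# Hypothetical wait-for graph: an edge A -> B means "A cannot finish until B happens".
WAITS_FOR = {
    "mountall": ["gdm"],                    # blocking emit of mounted MOUNTPOINT=/home
    "gdm": ["filesystem", "started dbus"],  # ANDed start condition in gdm.conf
    "filesystem": ["mountall"],             # emitted by mountall only after it moves on
    "started dbus": ["mountall"],           # assumed, per the theory above
}


def find_cycle(graph):
    """Return one cycle as a node list (first node repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {node: WHITE for node in graph}
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GREY:  # back edge: dep is already on the stack
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


print(" -> ".join(find_cycle(WAITS_FOR)))
# mountall -> gdm -> filesystem -> mountall
```

In this model, removing the mountall -> gdm edge (gdm no longer waiting on the individual mounted event) makes find_cycle return None.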

summary: - Waiting for "mounted MOUNTPOINT=/home" in gdm.conf breaks system boot
+ Dependency loops due to ANDed start conditions leave system unbootable
Changed in mountall (Ubuntu):
status: New → Confirmed
Nikolaus Rath (nikratio) wrote :

I think this bug can be considered triaged. It would be great if someone with the necessary permissions could update the status.

Steve Langasek (vorlon) wrote :

start on (filesystem
          and mounted MOUNTPOINT=/home
          and started dbus
          and (drm-device-added card0 PRIMARY_DEVICE_FOR_DISPLAY=1
               or stopped udevtrigger))

Um, don't do that.

mountall is working as designed. The error is in adding 'mounted MOUNTPOINT=/home', which is *already implied* by the 'filesystem' event.

Changed in mountall (Ubuntu):
status: Confirmed → Invalid
Changed in upstart (Ubuntu):
status: Confirmed → Invalid
Steve Langasek (vorlon) wrote :

(Yes, it's possible to construct upstart jobs that will hang the system, because some events, including some filesystem-related events, will block. But again - don't do that. The jobs shipped in Ubuntu do not have this problem, because they've been carefully constructed within the known constraints of the system. That you can't create arbitrary start conditions for your custom upstart jobs without risk of hanging the boot is not a bug in either upstart or mountall.)

Nikolaus Rath (nikratio) wrote :

Things aren't quite that simple. /home is an NFS mount transported over VPN, and mountall generally doesn't mount it (presumably because the VPN isn't up quickly enough). It nevertheless happily emits the "filesystem" event. We tried solving the problem by adding the explicit "mounted" event and starting a third mountall run. So is the real problem here that mountall emits "filesystem" even if it didn't mount all filesystems?

Steve Langasek (vorlon) wrote : Re: [Bug 964207] Re: Dependency loops due to ANDed start conditions leave system unbootable

On Wed, Aug 08, 2012 at 12:44:21PM -0000, Nikolaus Rath wrote:
> Things aren't quite that simple. /home is an NFS mount transported over
> VPN, and mountall generally doesn't mount it (presumably because the VPN
> isn't up quickly enough).

Please show the /etc/fstab entry for this filesystem. The default behavior
is that mountall will wait around until all network filesystems can be
mounted, and it will retry mounting them each time it sees a network
connection come up. And the 'filesystem' event is not emitted until all
local and remote filesystems have been mounted - unless particular mount
options have been specified that would cause that filesystem to be skipped.

> It nevertheless happily emits the "filesystem" event. We tried solving
> the problem by adding the explicit "mounted" event and starting a third
> mountall run. So is the real problem here that mountall emits
> "filesystem" even if it didn't mount all of them?

Yes. It's possible this is a bug in mountall, but I suspect you may have a
buggy mount option configured in /etc/fstab.
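The rule described in this reply can be sketched as a predicate over /etc/fstab. This is a simplification under stated assumptions: it treats every non-swap entry as something mountall waits for, and it assumes that noauto and nobootwait are the options that make mountall skip an entry. The helper names and the sample fstab (server:/home, server:/scratch) are hypothetical.

```python
# Assumed skip options; mountall's real behavior is authoritative, not this sketch.
SKIP_OPTIONS = {"noauto", "nobootwait"}


def parse_fstab(text):
    """Return (spec, mount point, fstype, options) tuples, ignoring comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed line: too few fields
        spec, mountpoint, fstype, options = fields[:4]
        entries.append((spec, mountpoint, fstype, options.split(",")))
    return entries


def waited_for(entries):
    """Mount points that must be mounted before 'filesystem' under the assumed rule."""
    return [
        mountpoint
        for _spec, mountpoint, fstype, opts in entries
        if fstype != "swap" and not SKIP_OPTIONS.intersection(opts)
    ]


def filesystem_ready(entries, mounted):
    """True once every waited-for mount point is in the `mounted` set."""
    return all(mp in mounted for mp in waited_for(entries))


SAMPLE = """\
proc /proc proc nodev,noexec,nosuid 0 0
/dev/mapper/vg0-swap none swap sw 0 0
server:/home /home nfs4 auto 0 0
server:/scratch /scratch nfs4 noauto 0 0
"""

entries = parse_fstab(SAMPLE)
print(waited_for(entries))                            # ['/proc', '/home']
print(filesystem_ready(entries, {"/proc"}))           # False
print(filesystem_ready(entries, {"/proc", "/home"}))  # True
```

Under this rule, an NFS entry that mountall never manages to mount should hold back "filesystem" indefinitely unless one of its options exempts it, which is why the fstab options matter here.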

Nikolaus Rath (nikratio) wrote :

fstab looks like this:

proc /proc proc nodev,noexec,nosuid 0 0
/dev/mapper/vg0-fat_client / ext4 relatime,errors=remount-ro 0 1
/dev/mapper/vg0-swap none swap sw 0 0
spitzer:/opt /opt nfs4 auto 0 0
spitzer:/home /home nfs4 auto 0 0

With this setup, /opt and /home are mounted by mountall; however, they are mounted with the wrong clientaddr:

$ mount | grep nfs
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
spitzer:/home on /home type nfs4 (rw,clientaddr=0.0.0.0,addr=192.168.1.2)
spitzer:/opt on /opt type nfs4 (rw,clientaddr=0.0.0.0,addr=192.168.1.2)

Since this happens on all clients (they all get the same clientaddr), this results in a frozen mount (cf. http://thread.gmane.org/gmane.linux.nfs/47780).

If I remount manually, the clientaddr is correct, so I believe mountall is attempting to mount this too early.

Should I report this as a separate bug?
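Because the symptom is visible in the mount output, a small check like the following could flag affected clients after boot. It is a sketch that assumes the "SOURCE on TARGET type FSTYPE (OPTIONS)" line format shown above. The function name is hypothetical, and treating clientaddr=0.0.0.0 as the marker of a bad mount comes from this report rather than from NFS documentation.

```python
import re

# "SOURCE on TARGET type FSTYPE (OPTIONS)", the format of the output above.
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \((.*)\)$")


def suspicious_nfs_mounts(mount_output):
    """Return mount points of NFS mounts whose clientaddr option is 0.0.0.0."""
    bad = []
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if not match:
            continue
        _source, target, fstype, opts = match.groups()
        if not fstype.startswith("nfs"):
            continue  # e.g. rpc_pipefs
        options = dict(
            opt.split("=", 1) if "=" in opt else (opt, "")
            for opt in opts.split(",")
        )
        if options.get("clientaddr") == "0.0.0.0":
            bad.append(target)
    return bad


MOUNT_OUTPUT = """\
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
spitzer:/home on /home type nfs4 (rw,clientaddr=0.0.0.0,addr=192.168.1.2)
spitzer:/opt on /opt type nfs4 (rw,clientaddr=0.0.0.0,addr=192.168.1.2)
"""

print(suspicious_nfs_mounts(MOUNT_OUTPUT))  # ['/home', '/opt']
```

The input could come from running mount on each client. /proc/mounts uses a different, whitespace-separated layout that this regex does not handle.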

Nikolaus Rath (nikratio) wrote :

Update: sometimes, /opt and /home are also not mounted at all with the above configuration.

Steve Langasek (vorlon) wrote :

On Tue, Aug 14, 2012 at 02:15:41PM -0000, Nikolaus Rath wrote:
> With this setup, /opt and /home are mounted by mountall; however, they
> are mounted with the wrong clientaddr:

> $ mount | grep nfs
> rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
> spitzer:/home on /home type nfs4 (rw,clientaddr=0.0.0.0,addr=192.168.1.2)
> spitzer:/opt on /opt type nfs4 (rw,clientaddr=0.0.0.0,addr=192.168.1.2)

Oh, interesting. I've never seen such behavior here, and I do use NFS
extensively.

> Since this happens on all clients (they all get the same clientaddr), this
> results in a frozen mount (cf.
> http://thread.gmane.org/gmane.linux.nfs/47780)

> If I remount manually, the clientaddr is correct, so I believe mountall
> is attempting to mount this too early.

> Should I report this as a separate bug?

Sounds like it should be.
