Ubuntu
upstart package

Dependency loops due to ANDed start conditions leave system unbootable

Bug #964207 reported by Nikolaus Rath on 2012-03-24

This bug affects 6 people

Affects		Status	Importance	Assigned to	Milestone
	mountall (Ubuntu)	Invalid	Undecided	Unassigned
	upstart (Ubuntu)	Invalid	Undecided	Unassigned

Bug Description

If /home is mounted on a separate partition, and the gdm start condition in /etc/init/gdm.conf is modified to include "mounted MOUNTPOINT=/home" as follows:

start on (filesystem
          and mounted MOUNTPOINT=/home
          and started dbus
          and (drm-device-added card0 PRIMARY_DEVICE_FOR_DISPLAY=1
               or stopped udevtrigger))

then a Lucid system no longer boots. In this case the mountall process appears to be waiting for input from upstart, but upstart never sends anything. As a result, the required mountall events are not emitted and the system does not finish booting.

The problem can be worked around by manually starting another mountall instance while the first instance is hanging.

I have attached the --verbose output of the first and second mountall instances, as well as strace output.

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: mountall 2.15.3
ProcVersionSignature: Ubuntu 3.0.0-17.30~lucid1-server 3.0.22
Uname: Linux 3.0.0-17-server x86_64
Architecture: amd64
Date: Sat Mar 24 18:45:46 2012
ProcEnviron:
PATH=(custom, user)
LANG=en_US.UTF-8
SHELL=/bin/bash
SourcePackage: mountall

Revision history for this message

Nikolaus Rath (nikratio) wrote on 2012-03-24:

initctl list output for hanging system (2.9 KiB, text/plain)
Dependencies.txt (1.6 KiB, text/plain; charset="utf-8")

Revision history for this message

Nikolaus Rath (nikratio) wrote on 2012-03-24:

Output of upstart-started mountall (5.6 KiB, text/plain)

Revision history for this message

Nikolaus Rath (nikratio) wrote on 2012-03-24:

Output of second, manually started mountall (3.9 KiB, text/plain)

Revision history for this message

Nikolaus Rath (nikratio) wrote on 2012-03-24:

strace -f output for upstart-started mountall (898.0 KiB, text/plain)

I also added an strace -f call to mountall.conf; this is its output.

Revision history for this message

Launchpad Janitor (janitor) wrote on 2012-03-25:

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in mountall (Ubuntu):
status:	New → Confirmed

Revision history for this message

Nikolaus Rath (nikratio) wrote on 2012-03-31:

Since this is triggered by changing an upstart job unrelated to mountall, I'm reassigning this to upstart.

description:	updated
affects:	mountall (Ubuntu) → upstart (Ubuntu)
summary:	Hanging mountall stops boot process → Waiting for mounted MOUNTPOINT=/home in gdm.conf breaks system boot
summary:	Waiting for mounted MOUNTPOINT=/home in gdm.conf breaks system boot → Waiting for "mounted MOUNTPOINT=/home in gdm.conf" breaks system boot
summary:	Waiting for "mounted MOUNTPOINT=/home in gdm.conf" breaks system boot → Waiting for "mounted MOUNTPOINT=/home" in gdm.conf breaks system boot

Revision history for this message

Nikolaus Rath (nikratio) wrote on 2012-08-07: Re: Waiting for "mounted MOUNTPOINT=/home" in gdm.conf breaks system boot

Ok, here's my theory of what happens:

I noticed that if you have a very simple upstart job:

# cat /etc/init/test.conf
start on niko-a and niko-b
script
sleep 10
end script

and then emit niko-a without --no-wait:

# initctl emit niko-a

then this call blocks until niko-b is emitted as well.
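The blocking behavior can be illustrated with a toy model. The sketch below is a hypothetical Python simplification, not upstart's actual code: it assumes that an event matching only part of a job's `start on` expression is held (so a blocking `initctl emit` does not return) until the whole expression becomes true and the job starts. The classes `Match`, `And`, `Or` and `Job` are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Match:
    """Leaf of a start-on expression: true once the named event has been seen."""
    name: str
    seen: bool = False

    def handle(self, event: str) -> bool:
        # Returns True if this event matched (part of) the expression.
        if event == self.name:
            self.seen = True
            return True
        return False

    def value(self) -> bool:
        return self.seen


@dataclass
class And:
    left: "Expr"
    right: "Expr"

    def handle(self, event: str) -> bool:
        hit_left = self.left.handle(event)
        hit_right = self.right.handle(event)
        return hit_left or hit_right

    def value(self) -> bool:
        return self.left.value() and self.right.value()


@dataclass
class Or:
    left: "Expr"
    right: "Expr"

    def handle(self, event: str) -> bool:
        hit_left = self.left.handle(event)
        hit_right = self.right.handle(event)
        return hit_left or hit_right

    def value(self) -> bool:
        return self.left.value() or self.right.value()


Expr = Union[Match, And, Or]


@dataclass
class Job:
    name: str
    start_on: Expr
    held: List[str] = field(default_factory=list)  # matched events not yet released
    started: bool = False

    def emit(self, event: str) -> bool:
        """Return True if a blocking emit of `event` would return immediately."""
        if not self.start_on.handle(event):
            return True  # event does not concern this job, so it is not held
        self.held.append(event)
        if self.start_on.value():
            # Whole condition satisfied: the job starts and all held events are released.
            self.started = True
            self.held.clear()
            return True
        return False  # only part of an AND matched: the emitter stays blocked


if __name__ == "__main__":
    test = Job("test", And(Match("niko-a"), Match("niko-b")))
    print(test.emit("niko-a"))  # False: niko-a is held, so `initctl emit niko-a` blocks
    print(test.emit("niko-b"))  # True: condition complete, job starts, niko-a is released
```

The model never resets the `seen` flags after a job starts, which real upstart does; it only aims to show why the first emit cannot return on its own.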

Now, if mountall also emits the mounted event without --no-wait, that call blocks, because GDM is still waiting for other events (e.g. dbus or drm-device-added) before it can start. However, those events cannot be emitted until mountall has finished, and mountall cannot finish because it is waiting for GDM to start.
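The circular wait described above can be checked mechanically by treating each job as a node in a wait-for graph and looking for a cycle with a depth-first search. The following is a hypothetical Python sketch; the edges are assumptions read off this report rather than extracted from upstart or mountall (mountall's blocking emit of `mounted MOUNTPOINT=/home` is held until gdm starts, and gdm also needs the `filesystem` event, which is assumed to come from mountall only after its `mounted` events).

```python
from typing import Dict, List, Optional


def find_cycle(waits_for: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in a wait-for graph as a list of nodes, or None.

    Depth-first search with three colors: a GREY node is on the current
    path, so reaching a GREY node again closes a cycle.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for nxt in waits_for.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in waits_for:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Assumed edges (hypothetical, based on the report above):
boot_with_home_clause = {
    "mountall": ["gdm"],          # blocking emit of 'mounted MOUNTPOINT=/home' held until gdm starts
    "gdm": ["mountall", "dbus"],  # gdm also needs 'filesystem' (from mountall) and 'started dbus'
    "dbus": [],
}

# Without the extra clause, the 'mounted' event no longer partially matches gdm's condition.
boot_without_home_clause = {
    "mountall": [],
    "gdm": ["mountall", "dbus"],
    "dbus": [],
}

if __name__ == "__main__":
    print(find_cycle(boot_with_home_clause))     # ['mountall', 'gdm', 'mountall']
    print(find_cycle(boot_without_home_clause))  # None
```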

I don't know whether this is a bug in upstart (which should somehow accommodate such chains caused by ANDed start conditions) or in mountall (which should not wait for its events to be processed).

(Tested on Ubuntu 10.04)

summary:	Waiting for "mounted MOUNTPOINT=/home" in gdm.conf breaks system boot → Dependency loops due to ANDed start conditions leave system unbootable
Changed in mountall (Ubuntu):
status:	New → Confirmed

Revision history for this message

Nikolaus Rath (nikratio) wrote on 2012-08-07:

I think this bug can be considered triaged. It would be great if someone with the necessary permissions could update the status.

Revision history for this message

Steve Langasek (vorlon) wrote on 2012-08-07:

start on (filesystem
          and mounted MOUNTPOINT=/home
          and started dbus
          and (drm-device-added card0 PRIMARY_DEVICE_FOR_DISPLAY=1
               or stopped udevtrigger))

Um, don't do that.

mountall is working as designed. The error is in adding 'mounted MOUNTPOINT=/home', which is *already implied* by the 'filesystem' event.

Changed in mountall (Ubuntu):
status:	Confirmed → Invalid
Changed in upstart (Ubuntu):
status:	Confirmed → Invalid

Revision history for this message

Steve Langasek (vorlon) wrote on 2012-08-07:

#10

(Yes, it's possible to construct upstart jobs that will hang the system, because some events, including some filesystem-related events, will block. But again - don't do that. The jobs shipped in Ubuntu do not have this problem, because they've been carefully constructed within the known constraints of the system. That you can't create arbitrary start conditions for your custom upstart jobs without risk of hanging the boot is not a bug in either upstart or mountall.)

Revision history for this message

Nikolaus Rath (nikratio) wrote on 2012-08-08:

#11

Things aren't quite that simple. /home is an NFS mount transported over a VPN, and mountall generally doesn't mount it (presumably because the VPN isn't up quickly enough). It nevertheless happily emits the "filesystem" event. We tried to solve the problem by adding the explicit "mounted" event and starting a third mountall run. So is the real problem here that mountall emits "filesystem" even when it has not mounted all filesystems?

Revision history for this message

Steve Langasek (vorlon) wrote on 2012-08-08: Re: [Bug 964207] Re: Dependency loops due to ANDed start conditions leave system unbootable

#12

On Wed, Aug 08, 2012 at 12:44:21PM -0000, Nikolaus Rath wrote:
> Things aren't quite that simple. /home is an NFS mount transported over
> VPN, and mountall generally doesn't mount it (presumably because the VPN
> isn't up quickly enough).

Please show the /etc/fstab entry for this filesystem. The default behavior
is that mountall will wait around until all network filesystems can be
mounted, and it will retry mounting them each time it sees a network
connection come up. And the 'filesystem' event is not emitted until all
local and remote filesystems have been mounted - unless particular mount
options have been specified that would cause that filesystem to be skipped.

> It nevertheless happily emits the "filesystem" event. We tried solving
> the problem by adding the explicit "mounted" event and starting a third
> mountall run. So is the real problem here that mountall emits
> "filesystem" even if it didn't mount all of them?

Yes. It's possible this is a bug in mountall, but I suspect you may have a
buggy mount option configured in /etc/fstab.

Revision history for this message

Nikolaus Rath (nikratio) wrote on 2012-08-14:

#13

fstab looks like this:

proc /proc proc nodev,noexec,nosuid 0 0
/dev/mapper/vg0-fat_client / ext4 relatime,errors=remount-ro 0 1
/dev/mapper/vg0-swap none swap sw 0 0
spitzer:/opt /opt nfs4 auto 0 0
spitzer:/home /home nfs4 auto 0 0
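Steve Langasek's description of when mountall emits `filesystem` can be applied to this fstab with a small Python sketch. It is not mountall's logic: it assumes that every `nfs`, `nfs4` or `cifs` entry gates the `filesystem` event unless its options contain `noauto` or `nobootwait`, which is an assumption about what "particular mount options" refers to. The function name is hypothetical.

```python
from typing import List

# Assumptions: these fstypes count as network filesystems, and these options
# are assumed to exempt an entry from gating the 'filesystem' event.
NETWORK_FSTYPES = {"nfs", "nfs4", "cifs"}
SKIP_OPTIONS = {"noauto", "nobootwait"}


def network_mounts_gating_filesystem(fstab_text: str) -> List[str]:
    """Return mount points of network filesystems that would delay 'filesystem'."""
    gating = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed entry: not enough columns
        _spec, mountpoint, fstype, options = fields[:4]
        if fstype in NETWORK_FSTYPES and not (set(options.split(",")) & SKIP_OPTIONS):
            gating.append(mountpoint)
    return gating


FSTAB = """\
proc /proc proc nodev,noexec,nosuid 0 0
/dev/mapper/vg0-fat_client / ext4 relatime,errors=remount-ro 0 1
/dev/mapper/vg0-swap none swap sw 0 0
spitzer:/opt /opt nfs4 auto 0 0
spitzer:/home /home nfs4 auto 0 0
"""

if __name__ == "__main__":
    print(network_mounts_gating_filesystem(FSTAB))  # ['/opt', '/home']
```

Under these assumptions neither NFS entry carries an option that would exempt it, so both should hold back `filesystem`.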

With this setup, mountall does mount /opt and /home; however, they are mounted with the wrong clientaddr:

$ mount | grep nfs
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
spitzer:/home on /home type nfs4 (rw,clientaddr=0.0.0.0,addr=192.168.1.2)
spitzer:/opt on /opt type nfs4 (rw,clientaddr=0.0.0.0,addr=192.168.1.2)

Since this happens on all clients (they all get the same clientaddr), the result is a frozen mount (cf. http://thread.gmane.org/gmane.linux.nfs/47780).

If I remount manually, the clientaddr is correct, so I believe mountall is attempting to mount this too early.
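One way to spot the same symptom on other clients is to scan `mount` output for NFS entries whose clientaddr is 0.0.0.0. The Python sketch below assumes the `SOURCE on TARGET type FSTYPE (OPTIONS)` line format shown above; the function names are hypothetical.

```python
import re
from typing import Dict, List

# Matches lines such as: spitzer:/home on /home type nfs4 (rw,clientaddr=0.0.0.0,addr=192.168.1.2)
MOUNT_LINE = re.compile(
    r"^(?P<source>\S+) on (?P<target>\S+) type (?P<fstype>\S+) \((?P<options>[^)]*)\)$"
)


def parse_options(options: str) -> Dict[str, str]:
    """Split 'rw,clientaddr=0.0.0.0' into {'rw': '', 'clientaddr': '0.0.0.0'}."""
    parsed = {}
    for item in options.split(","):
        key, _sep, value = item.partition("=")
        parsed[key] = value
    return parsed


def nfs_mounts_with_zero_clientaddr(mount_output: str) -> List[str]:
    """Return targets of NFS mounts whose clientaddr option is 0.0.0.0."""
    suspicious = []
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if not match or not match.group("fstype").startswith("nfs"):
            continue  # not an NFS mount line (e.g. rpc_pipefs)
        if parse_options(match.group("options")).get("clientaddr") == "0.0.0.0":
            suspicious.append(match.group("target"))
    return suspicious


MOUNT_OUTPUT = """\
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
spitzer:/home on /home type nfs4 (rw,clientaddr=0.0.0.0,addr=192.168.1.2)
spitzer:/opt on /opt type nfs4 (rw,clientaddr=0.0.0.0,addr=192.168.1.2)
"""

if __name__ == "__main__":
    print(nfs_mounts_with_zero_clientaddr(MOUNT_OUTPUT))  # ['/home', '/opt']
```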

Should I report this as a separate bug?

Revision history for this message

Nikolaus Rath (nikratio) wrote on 2012-08-14:

#14

Update: sometimes, /opt and /home are also not mounted at all with the above configuration.

Revision history for this message

Steve Langasek (vorlon) wrote on 2012-08-15:

#15

On Tue, Aug 14, 2012 at 02:15:41PM -0000, Nikolaus Rath wrote:
> With this setup, /opt and /home are mounted by mountall, however, they
> are mounted with the wrong clientaddr:

> $ mount | grep nfs
> rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
> spitzer:/home on /home type nfs4 (rw,clientaddr=0.0.0.0,addr=192.168.1.2)
> spitzer:/opt on /opt type nfs4 (rw,clientaddr=0.0.0.0,addr=192.168.1.2)

Oh, interesting. I've never seen such behavior here, and I do use NFS
extensively.

> Since this happens on all clients (they all get the same clientaddr), this
> results in a frozen mount (cf
> http://thread.gmane.org/gmane.linux.nfs/47780)

> If I remount manually, the clientaddr is correct, so I believe mountall
> is attempting to mount this too early.

> Should I report this as a separate bug?

Sounds like it should be.

Duplicates of this bug

Bug #642119
