Ubuntu
nfs-utils package

statd job starts before service is listening

Bug #610863 reported by Andrew Edmunds on 2010-07-28

This bug affects 4 people

Affects		Status	Importance	Assigned to	Milestone
	nfs-utils (Ubuntu)	Confirmed	Undecided	Unassigned

Bug Description

Binary package hint: nfs-common
nfs-common version: 1:1.2.0-4ubuntu4

On my Lucid system with latest updates, NFS filesystems sometimes fail to mount successfully at boot.

Steps to reproduce:
1. In /etc/fstab, add an entry of the form:
<nfs-server>:<share> <mountpoint> nfs _netdev 0 0
2. Reboot a number of times

Expected results:
The NFS filesystem should be mounted on every boot

Actual results:
Occasionally the NFS mount fails and messages of the following form are logged in /var/log/boot.log:
mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
mountall: mount <mountpoint> [<pid>] terminated with status 32

After a boot where the NFS mounts fail, the mountall process is still running on login. Running:
kill -USR1 $(pidof mountall)
will then mount the missing filesystems sucessfully.

It is also possible to observe the problem without rebooting as follows.

Steps to reproduce:
Run the following shell loop, after setting $nfs_server, $share and $mntpt to suitable values

i=0
while (( i < 100 ))
do
    stop statd
    start statd
    mount.nfs $nfs_server:$share $mntpt
    umount $mntpt
    (( i++ ))
done

Expected results:
The NFS filesystem should be mounted on every iteration of the loop

Actual results:
Occasionally the NFS mount fails and the following messages are output:

mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
mount.nfs: an incorrect mount option was specified
umount: <mountpoint> not mounted

The symptoms are very similar to this bug back in Karmic:
https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/445181

---
Architecture: i386
DistroRelease: Ubuntu 10.04
InstallationMedia: Mythbuntu 10.04 LTS "Lucid Lynx" - Release i386 (20100427.1)
NonfreeKernelModules: nvidia
Package: nfs-common 1:1.2.0-4ubuntu4
PackageArchitecture: i386
ProcEnviron:
SHELL=/bin/bash
LANG=en_AU.UTF-8
ProcVersionSignature: Ubuntu 2.6.32-23.37-generic-pae 2.6.32.15+drm33.5
Tags: lucid
Uname: Linux 2.6.32-23-generic-pae i686
UserGroups:

See original description

Tags:

Revision history for this message

Andrew Edmunds (andrew-edmunds) wrote on 2010-07-28:

/etc/fstab Edit (1.0 KiB, text/plain)

Revision history for this message

Andrew Edmunds (andrew-edmunds) wrote on 2010-07-28:

/var/log/boot.log with default log level - NFS mounts failed Edit (1018 bytes, text/plain)

Revision history for this message

Andrew Edmunds (andrew-edmunds) wrote on 2010-07-28:

/var/log/syslog boot messages, default log level, NFS mounts failed Edit (72.0 KiB, text/plain)

Revision history for this message

Andrew Edmunds (andrew-edmunds) wrote on 2010-07-28:

I did the following to get more verbose logging:

--- etc/default/grub.20100728 2010-07-28 18:15:41.536842847 +1000
+++ etc/default/grub 2010-07-28 21:57:45.144891535 +1000
@@ -6,7 +6,7 @@
GRUB_HIDDEN_TIMEOUT_QUIET=true
GRUB_TIMEOUT=10
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
-GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
+GRUB_CMDLINE_LINUX_DEFAULT="--verbose"
GRUB_CMDLINE_LINUX=""

# Uncomment to disable graphical terminal (grub-pc only)

... and ...

--- etc/init/mountall.conf.20100728 2010-04-26 15:46:36.000000000 +1000
+++ etc/init/mountall.conf 2010-07-28 21:58:14.917391034 +1000
@@ -28,7 +28,7 @@
[ -f /forcefsck ] && force_fsck="--force-fsck"
[ "$FSCKFIX" = "yes" ] && fsck_fix="--fsck-fix"

- exec mountall --daemon $force_fsck $fsck_fix
+ exec mountall --daemon --verbose $force_fsck $fsck_fix
end script

post-stop script

Revision history for this message

Andrew Edmunds (andrew-edmunds) wrote on 2010-07-28:

/var/log/boot.log with verbose logging - NFS mounts failed Edit (34.3 KiB, text/plain)

Revision history for this message

Andrew Edmunds (andrew-edmunds) wrote on 2010-07-28:

/var/log/syslog boot messages, verbose logging - NFS mounts failed Edit (101.4 KiB, text/plain)

Revision history for this message

Andrew Edmunds (andrew-edmunds) wrote on 2010-07-29:

Theory A:
A "mounting TYPE=nfs" event while statd is stopped will cause statd to be started and mountall will block untill statd is running. However a "mounting TYPE=nfs" event while statd is in (say) pre-start or spawned state is ignored and does not block mountall. Therefore it is not guaranteed that statd is in running state when mountall calls mount.nfs.

Theory B:
statd is not really ready for business when upstart marks it as "runnning". It needs a post-start script that waits while it does initialization tasks such as registering with portmap.

I have not so far dug into the source sufficiently to prove or rule out either of these.

The log that I posted seems to support theory B but I have other examples that seem more consistent with theory A.

Revision history for this message

Andrew Edmunds (andrew-edmunds) wrote on 2010-07-29:

I scanned quickly through the source of relevant packages. The following lines caught my eye (upstart-0.6.5/init/job_process.c around line 1394):

        nih_info (_("%s %s process (%d) became new process (%d)"),
                  job_name (job), process_name (process),
                  job->pid[process], (pid_t)data);

        /* We no longer care about this process, it's the child that we're
         * interested in from now on, so detach it and allow it to go about
         * its business unhindered.
         */

The comment here is revealing. statd takes care not to let its parent process exit until the child is ready to service client requests. I think the sysvinit startup scripts depended on this. If upstart completely ignores the parent as soon as it has forked, this would introduce a race as per theory B above.

Revision history for this message

Andrew Edmunds (andrew-edmunds) wrote on 2010-07-31:

Proof that statd startup has a race condition Edit (5.8 KiB, text/plain)

Revision history for this message

Andrew Edmunds (andrew-edmunds) wrote on 2010-07-31:

#10

In the above output there are 5 examples where upstart has marked statd as running but the parent process still exists, which implies that statd is not ready to service client requests.

Andrew Edmunds (andrew-edmunds) on 2010-08-05

affects:	mountall (Ubuntu) → nfs-utils (Ubuntu)
summary:	- mountall races with statd startup + Race condition in statd.conf

Revision history for this message

Andrew Edmunds (andrew-edmunds) wrote on 2010-08-05: Re: Race condition in statd.conf

#11

Patch for nfs-common.statd.upstart Edit (423 bytes, text/plain)

Revision history for this message

Andrew Edmunds (andrew-edmunds) wrote on 2010-08-05:

#12

The above patch adds a post-start script that loops until the statd parent process has exited. This is the best workaround that I can think of. It doesn't seem possible to just wait() for the process with the current implementation of upstart's "expect fork" feature.

Suggestion for a better approach: The start script should be allowed to tell upstart the PID of the process that upstart should supervise. The information could be sent (for example) via a pipe, or the traditional /var/run/<job>.pid file. Then the knowledge of how to correctly start a particular daemon could be entirely in the start script where it belongs, rather than part of it being coded into upstart. This would allow the "expect fork" and "expect daemon" features to go away.

Brian Murray (brian-murray) on 2010-08-05

tags:

added: patch

Revision history for this message

Andrew Edmunds (andrew-edmunds) wrote on 2010-08-16: Dependencies.txt

#13

Dependencies.txt Edit (2.3 KiB, text/plain)

apport information

tags:	added: apport-collected
description:	updated

Andrew Edmunds (andrew-edmunds) on 2010-08-16

description:

updated

Revision history for this message

Andrew Edmunds (andrew-edmunds) wrote on 2010-08-18: Re: Race condition in statd.conf

#14

nfs-utils_1.2.2-1ubuntu2.debdiff Edit (1.2 KiB, text/plain)

Tested on Maverick alpha 3 and confirmed that this bug is still present. Debdiff for Maverick attached.

Revision history for this message

Andrew Edmunds (andrew-edmunds) wrote on 2010-08-27:

#15

I just noticed that the underlying upstart issue mentioned in #8 is already described by https://bugs.launchpad.net/upstart/+bug/530779. I won't mark this as a duplicate though - the workaround described here may be useful until the upstart bug is fixed.

Steve Langasek (vorlon) on 2010-12-23

summary:

- Race condition in statd.conf
+ statd job starts before service is listening

Revision history for this message

Bryce Harrington (bryce) wrote on 2011-08-08:

#16

Hi Andrew,

I notice there have been a number of changes since 1.2.0 which appear to improve behavior with upstart; would you mind confirming that the issue still exists with nfs-utils 1.2.4 (e.g. snag an oneiric livecd and reproduce there).

Revision history for this message

Steve Langasek (vorlon) wrote on 2011-08-10:

#17

If the issue is still present, we should properly address it by fixing the statd startup code, not by adding a post-start script.

tags:

removed: patch

Launchpad Janitor (janitor) on 2011-08-30

Changed in nfs-utils (Ubuntu):
status:	New → Confirmed

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Bug attachments

Add attachment

Remote bug watches

Bug watches keep track of this bug in other bug trackers.

Ubuntunfs-utils package

statd job starts before service is listening

Bug Description

Other bug subscribers

Bug attachments

Remote bug watches

Ubuntu
nfs-utils package