mountall races with statd startup

Bug #613825 reported by Andrew Edmunds on 2010-08-05
This bug affects 5 people
Affects: mountall (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: mountall
mountall version: 2.15

On my Lucid system with the latest updates, NFS filesystems sometimes fail to mount at boot. After a boot where the NFS mounts fail, the mountall process is still running when I log in. Running "kill -USR1 $(pidof mountall)" will then mount the missing filesystems successfully. The symptoms are very similar to this bug back in Karmic:
https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/445181

This problem still occurs after applying the patch proposed in
https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/610863/comments/11
---
Architecture: i386
DistroRelease: Ubuntu 10.04
InstallationMedia: Mythbuntu 10.04 LTS "Lucid Lynx" - Release i386 (20100427.1)
NonfreeKernelModules: nvidia
Package: mountall 2.15
PackageArchitecture: i386
ProcEnviron:
 LANG=en_AU.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-23.37-generic-pae 2.6.32.15+drm33.5
Tags: lucid
Uname: Linux 2.6.32-23-generic-pae i686
UserGroups:

Andrew Edmunds (andrew-edmunds) wrote :

statd's start condition is:
start on (started portmap or mounting TYPE=nfs)

A "mounting TYPE=nfs" event while statd is stopped will cause statd to be started, and mountall will block until statd is running. However, if statd is triggered by portmap and then a "mounting TYPE=nfs" event occurs while statd is in (say) the pre-start or spawned state, the event is ignored and does not block mountall. Therefore it is not guaranteed that statd is in the running state when mountall calls mount.nfs.

This seems to come down to the semantics of the "mounting" event. mounting(7) says:
"mountall(8) will wait for all services started by this event to be running, all tasks started by this event to have finished and all jobs stopped by this event to be stopped before proceeding with mounting the filesystem."

Waiting for "all services started by this event" is not good enough to guarantee that services the mount depends on will actually be running after the event completes. What is required is to wait for all services that *would be* started by this event to be running, even if they were actually started by something else.

If I've understood the code right, the implementation of this is in upstart-0.6.5/init/event.c, function event_pending_handle_jobs() :

                        nih_debug ("New instance %s", job_name (job));

                        /* Start the job with the environment we want */
                        if (job->goal != JOB_START) {
                                if (job->start_env)
                                        nih_unref (job->start_env, job);

                                job->start_env = env;
                                nih_ref (job->start_env, job);

                                job_finished (job, FALSE);

                                event_operator_events (job->class->start_on,
                                                       job, &job->blocking);

                                job_change_goal (job, JOB_START);
                        }

where event_operator_events() does the actual blocking and it only gets called in the case that the job's goal is changed to JOB_START.
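
The effect of that goal check can be modelled in a few lines of Python. This is only a toy sketch, not upstart code: the Job class, the handle_event() function and its wait_for_matching flag are hypothetical names made up for illustration, and the state names are simplified. It assumes the blocking rule is exactly the one quoted above, i.e. a job only blocks the event if the event changes its goal to start.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Toy stand-in for an upstart job instance (hypothetical, not upstart's struct)."""
    name: str
    start_on: frozenset      # event names that match the job's "start on" condition
    goal: str = "stop"       # "stop" or "start"
    state: str = "waiting"   # e.g. "waiting", "starting", "pre-start", "running"


def handle_event(event, jobs, wait_for_matching=False):
    """Return the names of the jobs that the emitter of `event` blocks on.

    wait_for_matching=False mimics the goal check quoted above: a job is
    added to the blocking list only when this event changes its goal to start.
    wait_for_matching=True is the behaviour asked for in this report: any
    matching job that is not yet running also blocks the event.
    """
    blocking = []
    for job in jobs:
        if event not in job.start_on:
            continue
        if job.goal != "start":
            job.goal = "start"
            job.state = "starting"
            blocking.append(job.name)
        elif wait_for_matching and job.state != "running":
            blocking.append(job.name)
    return blocking


def statd(goal="stop", state="waiting"):
    """Build a toy statd job with the start condition from this report."""
    return Job("statd", frozenset({"started portmap", "mounting TYPE=nfs"}), goal, state)


# Case 1: statd is stopped, so the mounting event starts it and mountall waits.
print(handle_event("mounting TYPE=nfs", [statd()]))
# Case 2: portmap already triggered statd, which is still in pre-start.
print(handle_event("mounting TYPE=nfs", [statd("start", "pre-start")]))
# Case 3: the same situation under the "would be started" semantics.
print(handle_event("mounting TYPE=nfs", [statd("start", "pre-start")],
                   wait_for_matching=True))
```

Case 2 is the race: the blocking list comes back empty, so nothing holds mountall back while statd is still on its way up.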

Judging by their man pages, the "starting" and "stopping" events may have similar issues.

Andrew Edmunds (andrew-edmunds) wrote :
Andrew Edmunds (andrew-edmunds) wrote :

The above patch forces another retry of NFS mounts after statd is running. This is in no way a proper fix for the upstart issues described in #3 but it does (at last) allow my system to mount its filesystems reliably on boot.
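
For anyone who wants the same effect without patching mountall, the idea can be sketched as a small script: wait until statd is running, then send mountall the SIGUSR1 that makes it retry, as in the manual workaround from the description. This is not the attached patch, just a rough illustration. The function names are hypothetical, and the statd_running() helper assumes that "status statd" prints a line containing "start/running" once the job is up.

```python
import os
import signal
import subprocess
import time


def retry_nfs_mounts(statd_running, signal_mountall, timeout=30.0, interval=0.5,
                     sleep=time.sleep, clock=time.monotonic):
    """Wait for statd, then ask mountall to retry its pending mounts.

    statd_running:   callable returning True once statd is in the running state
    signal_mountall: callable that delivers SIGUSR1 to mountall
    Returns True if the signal was sent, False if statd never came up in time.
    """
    deadline = clock() + timeout
    while not statd_running():
        if clock() >= deadline:
            return False
        sleep(interval)
    signal_mountall()
    return True


def statd_running():
    """Assumption: `status statd` reports "start/running" for a running job."""
    result = subprocess.run(["status", "statd"], capture_output=True, text=True)
    return "start/running" in result.stdout


def signal_mountall():
    """Equivalent of: kill -USR1 $(pidof mountall)"""
    result = subprocess.run(["pidof", "mountall"], capture_output=True, text=True)
    for pid in result.stdout.split():
        os.kill(int(pid), signal.SIGUSR1)
```

Calling retry_nfs_mounts(statd_running, signal_mountall) as root after boot would do the retry. The two helpers are passed in as arguments so the waiting logic can be tried out with fakes on a machine that has no mountall process.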

tags: added: patch

apport information

tags: added: apport-collected
description: updated
Andrew Edmunds (andrew-edmunds) wrote :

Apparently the upstart behaviour described here is as designed. See discussion below from the upstart-devel mailing list. That being the case, I think the fix I have already posted may be the best one available.

Scott James Remnant scott at netsplit.com
Wed Sep 1 13:31:33 BST 2010

On Tue, 2010-08-31 at 13:34 +0100, Scott James Remnant wrote:

> On Fri, 2010-08-27 at 23:30 +1000, Andrew Edmunds wrote:
>
> > Suppose I have a job B which includes the following stanza:
> >
> > start on (started A or mounting MOUNTPOINT=/m)
> >
Ah, sorry, for some reason I read "and" here - using "or" like this, the
following would be an expected behaviour.

> > It seems that the following sequence of events can occur:
> > 1. "started A" event is emitted
> > 2. Job B starts
> > 3. "mounting MOUNTPOINT=/m" event is emitted
> > 4. "mounting MOUNTPOINT=/m" event completes
> > 5. mountall attempts to mount /m
> > 6. Job B's main process is started
> > 7. Job B is marked as "running"

Scott
--
Have you ever, ever felt like this?
Had strange things happen? Are you going round the twist?

----------

Andrew Edmunds Andrew.Edmunds at yahoo.com.au
Wed Sep 1 12:21:31 BST 2010

Scott,

The job I'm interested in is Ubuntu's statd. The real start condition is:
start on (started portmap or mounting TYPE=nfs)

For an example where this seems to be misbehaving see boot.log and
syslog attached to:
https://bugs.launchpad.net/ubuntu/+source/mountall/+bug/613825

Here is an edited version with line numbers added to show what I mean.

[boot.log]
159 init: portmap state changed from post-start to running
167 init: statd goal changed from stop to start
179 init: statd state changed from starting to pre-start
180 init: statd pre-start process (686)
598 init: event_new: Pending mounting event
602 init: Handling mounting event
603 init: event_pending_handle_jobs: New instance statd
604 init: event_finished: Finished mounting event
605 mounting /multimedia
628 spawn: mount -t nfs -o _netdev diskbox.local:/share/multimedia /multimedia
642 spawn: mount /multimedia [742]
706 mount.nfs: rpc.statd is not running but is required for remote locking.
707 mount.nfs: Either use '-o nolock' to keep locks local, or start statd.

[syslog]
906 Aug 5 20:33:55 tvbox rpc.statd[773]: Version 1.1.6 Starting
907 Aug 5 20:33:55 tvbox rpc.statd[773]: Flags:
995 Aug 5 20:33:55 tvbox init: statd state changed from post-start to running

It seems pretty clear that the mounting events here do not block and
wait for statd to reach the running state. statd is finally running at
syslog line 995 but this is long after the last attempt by mountall to
mount the NFS filesystem.

----------

Scott James Remnant scott at netsplit.com
Tue Aug 31 13:34:09 BST 2010

On Fri, 2010-08-27 at 23:30 +1000, Andrew Edmunds wrote:

> Suppose I have a job B which includes the following stanza:
>
> start on (started A or mounting MOUNTPOINT=/m)
>
> It seems that the following sequence of events can occur:
> 1. "started A" event is emitted
> 2. Job B starts
> 3. "mounting MOUNTPOINT=/...


Christian Reis (kiko) wrote :

Andrew, over in bug 525154 I've been looking at this and have come up with a solution which works reliably for me. Feel free to comment and tell me what you think of it!

Forest (foresto) wrote :

On Natty, nfs-common includes an upstart job that could help here, though it doesn't seem to be used:

/etc/init/statd-mounting.conf

When I change this line in mountall-net.conf:

start on net-device-up

To this:

start on net-device-up or stopped statd-mounting

mountall gets called again after statd has actually started, and my NFS shares get mounted at startup. I'm attaching a patch for mountall.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in mountall (Ubuntu):
status: New → Confirmed