Comment 3 for bug 613825

Andrew Edmunds (andrew-edmunds) wrote :

statd's start condition is:
start on (started portmap or mounting TYPE=nfs)

A "mounting TYPE=nfs" event that arrives while statd is stopped causes statd to be started, and mountall blocks until statd is running. However, if statd has already been triggered by portmap and a "mounting TYPE=nfs" event occurs while statd is still in (say) the pre-start or spawned state, the event is ignored and does not block mountall. It is therefore not guaranteed that statd is in the running state when mountall calls mount.nfs.

This seems to come down to the semantics of the "mounting" event. mounting(7) says:
"mountall(8) will wait for all services started by this event to be running, all tasks started by this event to have finished and all jobs stopped by this event to be stopped before proceeding with mounting the filesystem."

Waiting for "all services started by this event" is not good enough to guarantee that services the mount depends on will actually be running after the event completes. What is required is to wait for all services that *would be* started by this event to be running, even if they were actually started by something else.

If I've understood the code correctly, this is implemented in upstart-0.6.5/init/event.c, in the function event_pending_handle_jobs():

                        nih_debug ("New instance %s", job_name (job));

                        /* Start the job with the environment we want */
                        if (job->goal != JOB_START) {
                                if (job->start_env)
                                        nih_unref (job->start_env, job);

                                job->start_env = env;
                                nih_ref (job->start_env, job);

                                job_finished (job, FALSE);

                                event_operator_events (job->class->start_on,
                                                       job, &job->blocking);

                                job_change_goal (job, JOB_START);
                        }

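Here event_operator_events() does the actual blocking, and it is only called when the job's goal is changed to JOB_START. If the goal is already JOB_START (the job is starting but not yet running), the event is never added to job->blocking, so whoever emitted it is not held up.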
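
To make the difference concrete, here is a toy model of the two behaviours. This is not upstart code: ToyJob, handle_event_current() and handle_event_proposed() are names I made up for illustration, and the real job state machine has far more states. It only sketches the idea that an event could also block on a job that is already starting but not yet running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical types, much simpler than upstart's own. */
typedef enum { GOAL_STOP, GOAL_START } Goal;
typedef enum { STATE_WAITING, STATE_PRE_START, STATE_SPAWNED, STATE_RUNNING } State;

typedef struct {
    Goal  goal;
    State state;
    int   blocking;  /* count of events held open until the job is running */
} ToyJob;

/* Mirrors the quoted logic: the event blocks only if the goal changes.
 * Returns true if the emitter of the event would be blocked. */
bool handle_event_current(ToyJob *job)
{
    assert(job != NULL);
    if (job->goal != GOAL_START) {
        job->blocking++;              /* stands in for event_operator_events() */
        job->goal  = GOAL_START;      /* stands in for job_change_goal() */
        job->state = STATE_PRE_START;
        return true;
    }
    return false;  /* already starting: event ignored, emitter not blocked */
}

/* Hypothetical alternative: also block when the job is already on its way
 * up but has not reached the running state yet. */
bool handle_event_proposed(ToyJob *job)
{
    assert(job != NULL);
    if (job->goal != GOAL_START) {
        job->blocking++;
        job->goal  = GOAL_START;
        job->state = STATE_PRE_START;
        return true;
    }
    if (job->state != STATE_RUNNING) {
        job->blocking++;              /* wait for the in-progress start */
        return true;
    }
    return false;  /* already running: nothing to wait for */
}
```

With a job already in pre-start (as statd would be after portmap triggered it), handle_event_current() returns false and leaves blocking at 0, which is the race described above; handle_event_proposed() returns true and records the event as blocked.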

Judging by their man pages, the "starting" and "stopping" events may have similar issues.