mountall races with statd startup
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
mountall (Ubuntu) | Confirmed | Undecided | Unassigned | | |
Bug Description
Binary package hint: mountall
mountall version: 2.15
On my Lucid system with the latest updates, NFS filesystems sometimes fail to mount at boot. After a boot where the NFS mounts fail, the mountall process is still running when I log in. Running "kill -USR1 $(pidof mountall)" then mounts the missing filesystems successfully. The symptoms are very similar to this bug back in Karmic:
https:/
This problem still occurs after applying the patch proposed in
https:/
---
Architecture: i386
DistroRelease: Ubuntu 10.04
InstallationMedia: Mythbuntu 10.04 LTS "Lucid Lynx" - Release i386 (20100427.1)
NonfreeKernelMo
Package: mountall 2.15
PackageArchitec
ProcEnviron:
LANG=en_AU.UTF-8
SHELL=/bin/bash
ProcVersionSign
Tags: lucid
Uname: Linux 2.6.32-
UserGroups:
tags: | added: patch |
statd's start condition is:
start on (started portmap or mounting TYPE=nfs)
A "mounting TYPE=nfs" event that arrives while statd is stopped causes statd to be started, and mountall blocks until statd is running. However, if statd is triggered by portmap and a "mounting TYPE=nfs" event then occurs while statd is in (say) the pre-start or spawned state, the event is ignored and does not block mountall. It is therefore not guaranteed that statd is in the running state when mountall calls mount.nfs.
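To make the two cases concrete, here is a toy Python model of the behaviour described above. It is only a sketch of my reading, not Upstart code: the Job class, the emit() function and the string-valued goal/state fields are all made up for illustration, and the only rule modelled is "an event blocks its emitter only if it is the thing that flips the job's goal to start".

```python
from dataclasses import dataclass, field

# Event names taken from statd's "start on" stanza quoted above.
STATD_START_ON = frozenset({"started portmap", "mounting TYPE=nfs"})


@dataclass
class Job:
    """Toy stand-in for an Upstart job; NOT Upstart's real data structure."""
    name: str
    start_on: frozenset
    goal: str = "stop"       # "stop" or "start"
    state: str = "waiting"   # "waiting", "pre-start", "spawned", "running"
    blocking: list = field(default_factory=list)  # events waiting on this job


def emit(event: str, job: Job) -> bool:
    """Deliver an event to a job; return True if the emitter ends up blocked."""
    if event not in job.start_on:
        return False
    if job.goal == "start":
        # Job is already on its way up: the event is ignored, nothing blocks.
        return False
    job.goal = "start"
    job.state = "pre-start"
    job.blocking.append(event)
    return True


def mount_while_stopped() -> bool:
    """Case 1: the mounting event is what starts statd."""
    statd = Job("statd", STATD_START_ON)
    return emit("mounting TYPE=nfs", statd)


def mount_while_starting() -> bool:
    """Case 2: portmap started statd first; statd is still in pre-start."""
    statd = Job("statd", STATD_START_ON)
    emit("started portmap", statd)
    return emit("mounting TYPE=nfs", statd)


if __name__ == "__main__":
    print("blocked when statd was stopped: ", mount_while_stopped())
    print("blocked when statd was starting:", mount_while_starting())
```

In this model the first case blocks mountall and the second does not, even though statd has not reached the running state in either case. That is the window in which mount.nfs can be called too early.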
This seems to come down to the semantics of the "mounting" event. mounting(7) says:
"mountall(8) will wait for all services started by this event to be running, all tasks started by this event to have finished and all jobs stopped by this event to be stopped before proceeding with mounting the filesystem."
Waiting for "all services started by this event" is not good enough to guarantee that services the mount depends on will actually be running after the event completes. What is required is to wait for all services that *would be* started by this event to be running, even if they were actually started by something else.
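A rough Python sketch of the semantics I am asking for, under the same caveat: this is a toy model and not a patch against Upstart, and Job, emit_proposed() and reach_running() are hypothetical names. The difference from the current behaviour is that a matching event blocks on any job that has not yet reached running, regardless of which event changed its goal.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Toy stand-in for an Upstart job; NOT Upstart's real data structure."""
    name: str
    start_on: frozenset
    goal: str = "stop"       # "stop" or "start"
    state: str = "waiting"   # "waiting", "pre-start", "spawned", "running"
    blocking: list = field(default_factory=list)  # events waiting on this job


def emit_proposed(event: str, job: Job) -> bool:
    """Return True if the emitter must wait for the job to be running."""
    if event not in job.start_on:
        return False
    if job.goal != "start":
        # The event itself starts the job, as it does today.
        job.goal = "start"
        job.state = "pre-start"
    if job.state != "running":
        # Proposed change: block even if something else started the job.
        job.blocking.append(event)
        return True
    return False


def reach_running(job: Job) -> list:
    """Mark the job running and return the events that are now unblocked."""
    job.state = "running"
    released, job.blocking = job.blocking, []
    return released


if __name__ == "__main__":
    statd = Job("statd", frozenset({"started portmap", "mounting TYPE=nfs"}))
    emit_proposed("started portmap", statd)           # portmap wins the race
    print(emit_proposed("mounting TYPE=nfs", statd))  # mountall still waits
    print(reach_running(statd))                       # both events released
```

With this rule, a "mounting TYPE=nfs" event that arrives while statd is in pre-start or spawned still holds mountall back until statd is running, and an event that arrives after statd is running does not block at all.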
If I've understood the code right, the implementation of this is in upstart-0.6.5/init/event.c, function event_pending_handle_jobs(), where event_operator_events() does the actual blocking and it only gets called in the case that the job's goal is changed to JOB_START.
Judging by their man pages, the "starting" and "stopping" events may have similar issues.