idmapd does not start working after system reboot
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
mountall (Ubuntu) | Fix Released | High | Steve Langasek |
mountall (Ubuntu Lucid) | Won't Fix | High | Unassigned |
mountall (Ubuntu Precise) | Fix Released | High | Steve Langasek |
nfs-utils (Ubuntu) | Fix Released | High | Steve Langasek |
nfs-utils (Ubuntu Lucid) | Won't Fix | High | Unassigned |
nfs-utils (Ubuntu Precise) | Won't Fix | High | Steve Langasek |
Bug Description
[Impact]
Certain systems which make extensive use of NFS mounts have not been able to boot reliably since the migration to mountall in Ubuntu 9.10. This is because of race conditions in the handling of mounts vs. the startup of NFS client daemons. A proper fix of the startup of the NFS client daemons would in turn cause a deadlock of mountall, which processes mounts serially. We should fix mountall to parallelize its handling of mounts, so that the NFS daemons can be fixed properly.
This issue also impacts the cloud-init package, which relies on being able to do 'start on mounted MOUNTPOINT=/' and the like.
[Test case]
1. Have Scott Moser confirm that the package in -proposed resolves the 120-second boot-time delay in cloud images when using cloud-init.
[Regression potential]
Difficult to quantify. This code was included in the 12.10 release; however, a regression was found in that code (bug #1059471), which is now being fixed there in an SRU. There may be other latent regressions that have not yet been identified. Furthermore, introducing additional asynchronous handling here has the potential to expose race conditions (bugs) in other code that currently works by accident due to the serial handling of mounts.
Binary package hint: nfs-common
I have a server which runs Kerberos+
1. Edit /etc/rc.local and add the following commands:
sleep 5
service autofs stop
service idmapd restart
service autofs start
2. Add rc.local to system startup:
update-rc.d rc.local enable
I hope this workaround helps to find the source of the problem. Something is clearly being launched in the wrong way or in the wrong order, but what is it?
Thanks!
Changed in nfs-utils (Ubuntu): | |
status: | New → Triaged |
importance: | Undecided → High |
tags: | added: patch |
tags: | added: glucid lucid |
Changed in mountall (Ubuntu Lucid): | |
status: | New → Triaged |
importance: | Undecided → High |
Changed in nfs-utils (Ubuntu Lucid): | |
importance: | Undecided → High |
status: | New → Triaged |
Changed in mountall (Ubuntu): | |
assignee: | James Hunt (jamesodhunt) → Steve Langasek (vorlon) |
Changed in mountall (Ubuntu Precise): | |
status: | New → Triaged |
importance: | Undecided → High |
Changed in nfs-utils (Ubuntu Precise): | |
status: | New → Triaged |
importance: | Undecided → High |
Changed in nfs-utils (Ubuntu Lucid): | |
status: | Triaged → Won't Fix |
Changed in mountall (Ubuntu Lucid): | |
status: | Triaged → Won't Fix |
Changed in mountall (Ubuntu Precise): | |
assignee: | nobody → Steve Langasek (vorlon) |
Changed in nfs-utils (Ubuntu Precise): | |
assignee: | nobody → Steve Langasek (vorlon) |
description: | updated |
description: | updated |
tags: | added: maverick removed: precise |
tags: | added: precise |
tags: | added: lucid removed: glucid |
Building on Clint Byrum's work on bug #525154, I'm much closer now to understanding a possible solution for this issue, but it's going to require some coordination. Details:
- the current idmapd job starts on 'local-filesystems or mounting TYPE=nfs4' because it needs to start whenever an nfs4 filesystem is mounted and it also needs to wait until /usr and /var/lib are available before starting up (/usr because idmapd is located in /usr/sbin; /var/lib because it uses /var/lib/nfs/rpc_pipefs). The only way to wait for /usr and /var/lib is by waiting for 'local-filesystems'; it's *possible* that one or both of these filesystems is not local, but that's a local configuration error anyway.
- the start condition used here is buggy. If local-filesystems is emitted first, idmapd will proceed to start up without blocking any further 'mounting' hooks. If 'mounting TYPE=nfs4' is emitted first, there is no way to make the job wait for the local-filesystems signal to be received, which can cause the job to try to start before the filesystem is usable and wind up in an inconsistent state when idmapd aborts.
- using jobs in the style of portmap-wait and statd-mounting, it is possible to construct a set of jobs that will only start idmapd on local-filesystems, and *also* block any nfs4 mounts until idmapd is started.
- unfortunately, it appears that mountall itself blocks on the result of the 'mounting' hook before doing any further processing of *any* mount points, with the result that, if 'local-filesystems' has not already been emitted at the time it tries to mount the first nfs4 filesystem, we end up in a deadlock: the 'mounting' hook is waiting for idmapd to start; idmapd is waiting for local-filesystems to be emitted; and mountall is waiting for the 'mounting' hook to return before going on to do any other mounts.
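The circular wait described in the last point can be sketched as a wait-for graph, where an edge from A to B means "A cannot proceed until B has happened". This is only an illustrative model: the node labels and the `find_cycle` helper are made up for this sketch and are not part of mountall or Upstart. The second graph is a counterfactual in which mountall does not block on the hook.

```python
# Wait-for graph model of the deadlock. An edge A -> B means
# "A cannot proceed until B has happened". All names below are
# illustrative labels, not real Upstart or mountall identifiers.

def find_cycle(waits_for):
    """Return a list of nodes forming a wait cycle, or None if there is none."""
    visiting = []   # current depth-first path
    done = set()    # nodes already proven cycle-free

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            # Found a back edge: report the loop, closing it on the same node.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for dep in waits_for.get(node, ()):
            cycle = visit(dep)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in waits_for:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


# Behaviour described above: mountall waits for the hook to return.
serial_mountall = {
    "mounting hook (nfs4)": ["idmapd started"],
    "idmapd started": ["local-filesystems emitted"],
    "local-filesystems emitted": ["mountall continues"],
    "mountall continues": ["mounting hook (nfs4)"],
}

# Counterfactual (assumption): mountall keeps processing other mounts
# while the hook is still pending, so it waits on nothing.
non_blocking_mountall = {**serial_mountall, "mountall continues": []}

print(find_cycle(serial_mountall))        # a cycle through all four nodes
print(find_cycle(non_blocking_mountall))  # None
```

In the first graph every node sits on the cycle, so nothing can make progress. Removing the single edge from mountall to the hook is enough to break it.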
I see three possible solutions here.
1. Change mountall to be able to do other work while waiting for the 'mounting' hook to return. Conceptually I don't see any reason this isn't possible, so it should just be a matter of code reordering.
2. Change mountall to special case nfs4 mounts so that they are never handled until after local-filesystems is emitted. Yuck for the special-casing, though conceptually not actually different from what we're trying to achieve through the nfs-common upstart jobs.
3. Move idmapd and its dependencies (libevent; libnfsidmap, /usr/lib/libnfsidmap/) to the root filesystem (/sbin, /lib) and move /var/lib/nfs/rpc_pipefs to /var/run/nfs/rpc_pipefs. The latter may be correct in its own right (I'm pretty sure there's nothing on this in-kernel mount point that would count as 'persistent state'); the former doesn't even cover all cases unless we also move the kerberos+ldap stack to /lib, due to /usr/lib/libnfsidmap/umich_ldap.so.
I believe option 1 is the most straightforward to SRU and is correct per se, although parts of 3 are probably worth pursuing in their own right as part of an overall effort to improve FHS compliance.