NFS mounts in /etc/fstab and cloud-init may cause boot hang
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
cloud-init | Expired | Medium | Unassigned |
Bug Description
Environment: Azure, with RHEL 7.8 and 7.9 and OEL 7.8 and 7.9.
On OEL 7.8 cloud-init is cloud-init-
On both OEL and RHEL 7.* (certainly 7.8 and 7.9), if there is an NFS mount in /etc/fstab (it is unknown whether this also applies to NFSv4), boot may not complete. The system hangs and is inaccessible via SSH or serial console login.
All points to a deadlock between the starting of the rpc.statd and rpc.statd-notify services and the cloud-init.service.
This happens because rpc.statd and rpc.statd-notify have the following dependencies declared:
# rpc-statd.service
[Unit]
Description=NFS status monitor for NFSv2/3 locking.
DefaultDependen
Conflicts=
Requires=
Wants=network-
After=network-
PartOf=
Wants=nfs-
After=nfs-
[Service]
Environment=
EnvironmentFile
Type=forking
PIDFile=
ExecStart=
# rpc-statd-
[Unit]
Description=Notify NFS peers of a restart
DefaultDependen
Wants=network-
After=local-
# Do not start up in HA environments
ConditionPathEx
# if we run an nfs server, it needs to be running before we
# tell clients that it has restarted.
After=nfs-
PartOf=
Wants=nfs-
After=nfs-
[Service]
EnvironmentFile
Type=forking
ExecStart=
while cloud-init.service is:
[Unit]
Description=Initial cloud-init job (metadata service crawler)
Wants=cloud-
Wants=sshd-
Wants=sshd.service
After=cloud-
After=NetworkMa
Before=
Before=
Before=sshd.service
Before=
ConditionPathEx
ConditionKernel
[Service]
Type=oneshot
ExecStart=
RemainAfterExit=yes
TimeoutSec=0
# Output needs to appear in instance console output
StandardOutput=
[Install]
WantedBy=
So cloud-init is to be started before network-
CX has demonstrated this to my satisfaction.
I see a few possible paths here:
1. CX has to change the (rpc-statd|
Before=
#Wants=
#After network-
2. CX has to change cloud-init.service so that it now states:
Wants=network-
After=network-
#Before=
3. CX removes the NFS mount from /etc/fstab and adds it as a systemd .mount unit instead.
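For reference, option 3 might look roughly like the sketch below. This is only an illustration and has not been tried on the affected images: the server name, export path, mount point, and unit file name are all hypothetical. The ordering lines are an assumption about what the mount would need, not a confirmed fix for the hang.

```ini
# /etc/systemd/system/mnt-data.mount
# Hypothetical example. The unit file name must match the escaped
# mount path, so /mnt/data becomes mnt-data.mount.
[Unit]
Description=Example NFS mount (hypothetical server and export)
# Assumption: wait for the network to be fully up before mounting.
Wants=network-online.target
After=network-online.target

[Mount]
# Hypothetical NFS server and export.
What=nfs.example.com:/export/data
# Must agree with the unit file name above.
Where=/mnt/data
Type=nfs
# Marks the mount as network-dependent.
Options=_netdev

[Install]
WantedBy=remote-fs.target
```

With a unit like this in place, the matching line would be removed from /etc/fstab and the unit enabled with `systemctl enable mnt-data.mount`.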
CX opted for change #1 above, and now sees no boot issues.
There is a Red Hat bug about that: https:/
Changed in cloud-init: status: Incomplete → New
Thanks for using cloud-init and taking the time to file a bug report!
> All points to a deadlock between the starting of the rpc.statd and rpc.statd-notify services and the cloud-init.service.
I don't see any evidence of a deadlock presented here. The NFS units presumably should run after networking is available (that's what the N stands for, after all), and cloud-init.service is the first opportunity to run the user's configuration, so having it run at a predictable point in boot, before "most" things, is desirable.
I don't doubt that you're hitting an issue, but we don't have enough information about it. Can you explain in a little more detail what the exact issue you're seeing is? If possible, please also include the output of `cloud-init collect-logs` from an affected instance, then move this back to New.
Thanks again!
Dan