cloud-init dhcp_discovery() crashes on preprovisioned RHEL 7.6 VM in Azure

Bug #1794399 reported by Jason Zions on 2018-09-26
This bug affects 2 people
Affects: cloud-init
Importance: Undecided
Assigned to: Unassigned

Bug Description

Environment: Azure, creating a RHEL 7.6 VM from a pool of preprovisioned VMs.

In /usr/lib/python2.7/site-packages/cloudinit/net/dhcp.py, dhcp_discovery() starts dhclient specifically so it will capture the DHCP leases in dhcp.leases. The function copies the dhclient binary and starts it with options naming unique lease and pid files. The function then waits for both the lease and pid files to appear before using the contents of the pid file to kill the dhclient instance.
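The wait-for-files step described above can be sketched roughly as follows. This is an illustration only, not the cloud-init source: the helper names wait_for_files and read_pid, the timeout values, and the use of Python 3 are all assumptions made for the example.

```python
import os
import time


def wait_for_files(paths, timeout=5.0, interval=0.05):
    """Poll until every path exists or the timeout expires.

    Returns the list of paths that are still missing (empty on success).
    Hypothetical helper for illustration; not the cloud-init API.
    """
    deadline = time.monotonic() + timeout
    while True:
        missing = [p for p in paths if not os.path.exists(p)]
        if not missing or time.monotonic() >= deadline:
            return missing
        time.sleep(interval)


def read_pid(pid_file):
    """Return the integer pid stored in pid_file."""
    with open(pid_file) as f:
        return int(f.read().strip())
```

Note that the sketch only checks that the files exist. It says nothing about whether the pid in the pid file still refers to a live process, which is the gap this bug exposes.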

There’s a behavior difference between the Ubuntu and RHEL versions of dhclient:
• On Ubuntu, dhclient writes the DHCP lease response, forks/daemonizes, then writes the pid file with the daemonized process ID.
• On RHEL, dhclient writes a pid file with the pre-daemon pid, writes the DHCP lease response, forks/daemonizes, then overwrites the pid file with the new (daemonized) pid.

On RHEL, there’s a race between dhcp_discovery() and dhclient:
1. dhclient writes the pid file and lease file
2. dhclient forks; the parent process exits
3. dhcp_discovery() sees that the pid file and lease file exist
4. dhcp_discovery() tries to kill the process named in the pid file, but it already exited in step 2
5. dhclient child starts, daemonizes, and writes its pid in the pid file

When cloud-init runs on a preprovisioned RHEL 7.6 VM in Azure, dhcp.py dhcp_discovery() throws an error when it tries to send SIGKILL to a process that does not exist.
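The failure mode can be reproduced in isolation on a POSIX system. The sketch below is an assumption-laden illustration, not the cloud-init code path: kill_exited_child is a hypothetical name, and it sends signal 0 (an existence probe) rather than SIGKILL so that nothing is harmed if the pid happens to be reused. Signalling a pid that no longer exists fails the same way for any signal, with errno ESRCH.

```python
import errno
import os
import subprocess
import sys


def kill_exited_child():
    """Spawn a short-lived child, reap it, then signal its stale pid.

    Returns the errno raised by os.kill, or None if no error occurred.
    """
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    stale_pid = child.pid
    child.wait()  # the child has exited and been reaped; its pid is now stale
    try:
        # Signal 0 only checks that the target exists; SIGKILL would raise
        # the same error for a pid that no longer exists.
        os.kill(stale_pid, 0)
    except OSError as e:
        return e.errno
    return None


if __name__ == "__main__":
    print(kill_exited_child() == errno.ESRCH)
```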

We have a patch that makes dhcp_discovery() wait until the pid in the pid file represents a daemon process (parent pid is 1) before killing the process. With this change, the issue is resolved.
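A minimal sketch of that idea, assuming a Linux /proc filesystem. The function names get_ppid and wait_for_daemon_pid, the timeout values, and the polling loop are hypothetical and are not taken from the committed patch.

```python
import os
import time


def get_ppid(pid):
    """Return the parent pid of `pid` from /proc/<pid>/stat, or None if gone."""
    try:
        with open("/proc/%d/stat" % pid) as f:
            stat = f.read()
    except (IOError, OSError):
        return None
    # The command name is wrapped in parentheses and may contain spaces,
    # so split after the last ")". The fields that follow are: state, ppid, ...
    fields = stat.rsplit(")", 1)[1].split()
    return int(fields[1])


def wait_for_daemon_pid(pid_file, timeout=5.0, interval=0.05):
    """Poll pid_file until it names a process whose parent pid is 1.

    Returns that pid, or None if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(pid_file) as f:
                pid = int(f.read().strip())
        except (IOError, OSError, ValueError):
            pid = None  # file missing, unreadable, or only partially written
        if pid is not None and get_ppid(pid) == 1:
            return pid
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

A caller would send the kill signal only when wait_for_daemon_pid returns a pid, and treat None as "dhclient never daemonized within the timeout".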

This bug is fixed with commit fdadcb5f to cloud-init on branch master.
To view that commit see the following URL:
https://git.launchpad.net/cloud-init/commit/?id=fdadcb5f

Changed in cloud-init:
status: New → Fix Committed

This bug is believed to be fixed in cloud-init version 19.1. If this is still a problem for you, please add a comment and set the state back to New.

Thank you.

Changed in cloud-init:
status: Fix Committed → Fix Released