[1.9] Static/Automatic IP addresses inside the dynamic range conflict with DHCP lease uploads
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
MAAS | Won't Fix | Undecided | Unassigned | |
1.9 | Won't Fix | Wishlist | Unassigned | |
Bug Description
We've been seeing intermittent issues at a customer site: after a period of time (presumably the lease time for the auto-assigned IP address), a rebooted node comes back up with a dynamic IP that doesn't match the one MAAS auto-assigned. After digging for a while, it appears that MAAS is discovering the DHCP IP issued while commissioning the node and isn't properly deleting it from the database or clearing out its leases.
https:/
Is there something we can do to remove those IPs from the node configs? Is there specific information we can provide to help diagnose why this is happening?
tags: | added: canonical-bootstack |
Changed in maas: | |
milestone: | 2.1.1 → none |
First, I assume that the customer is using a separate static and dynamic range on their cluster interfaces?
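To make the range question concrete, here is a minimal Python sketch for checking whether an address falls inside the dynamic range, or whether the static and dynamic ranges overlap at all. The helper names and the example addresses are hypothetical, not anything MAAS provides; it only uses the standard `ipaddress` module.

```python
import ipaddress


def ip_in_range(ip, low, high):
    """Return True if ip lies within the inclusive range [low, high]."""
    addr = ipaddress.ip_address(ip)
    return ipaddress.ip_address(low) <= addr <= ipaddress.ip_address(high)


def ranges_overlap(a_low, a_high, b_low, b_high):
    """Return True if the inclusive ranges [a_low, a_high] and [b_low, b_high] share any address."""
    a_low, a_high, b_low, b_high = (
        ipaddress.ip_address(x) for x in (a_low, a_high, b_low, b_high)
    )
    return a_low <= b_high and b_low <= a_high
```

For example, `ip_in_range(node_ip, dynamic_low, dynamic_high)` returning True for a node's automatic IP would point at the conflict described in the bug title.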
Second, I suspect the discovered IP address is just a symptom of the overall problem; it looks like MAAS /may/ be out of sync with the DHCP server in this case.
Since the node has an automatic IP address, when MAAS deploys the node, MAAS (via curtin) will set up /etc/network/interfaces to use the static IP address (assuming it is not a custom image without this capability). So when this node is deployed, it should properly use the static IP address.
However, when the machine performs a network boot, commissions, or re-deploys, it cannot configure its IP address statically, so we also inform the DHCP server that we have leased out the IP address to that node. If /that/ communication is somehow interrupted, MAAS might get into a bad state, because what MAAS believes is true will be out of sync with what DHCP believes is true.
I would check the DHCP lease database to see if the lease for the node is in the expected state. MAAS will normally use omshell to write the static lease to the DHCP server (using OMAPI). The fact that you see an unexpected discovered IP address there might indicate that the DHCP server has forgotten about the static lease.
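As a rough aid for that check, the following Python sketch lists every `lease` and `host` block in an ISC dhcpd lease file that mentions a given MAC address. It assumes the standard dhcpd.leases layout (each block closed by a `}` at the start of a line, with no nested braces), and the function name is hypothetical. If the node's MAC shows up only in a dynamic `lease` block and not in a `host` block with the expected `fixed-address`, that would be consistent with the DHCP server having lost the static entry.

```python
import re

# Matches top-level "lease <ip> { ... }" and "host <name> { ... }" blocks.
# Assumption: the closing brace sits at the start of its own line.
BLOCK_RE = re.compile(
    r"^(lease|host)\s+(\S+)\s*\{(.*?)^\}", re.MULTILINE | re.DOTALL
)


def find_entries(leases_text, mac):
    """Return the lease/host entries in leases_text that belong to mac."""
    mac = mac.lower()
    entries = []
    for kind, name, body in BLOCK_RE.findall(leases_text):
        hw = re.search(r"hardware ethernet\s+([0-9a-fA-F:]+);", body)
        if hw is None or hw.group(1).lower() != mac:
            continue
        if kind == "lease":
            ip = name  # a lease block is keyed by its IP address
        else:
            fixed = re.search(r"fixed-address\s+([^;\s]+);", body)
            ip = fixed.group(1) if fixed else None
        # Anchored at line start so "next binding state" is not picked up.
        state = re.search(r"^\s*binding state\s+(\w+);", body, re.MULTILINE)
        entries.append(
            {"kind": kind, "ip": ip, "state": state.group(1) if state else None}
        )
    return entries
```

Usage would be along the lines of `find_entries(open(path).read(), "52:54:00:aa:bb:cc")`, where `path` is wherever your cluster controller keeps dhcpd.leases (the location varies by install, so check rather than assume).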
Is there something about the DHCP server that could cause this "forgetfulness"? Maybe it was restored from a backup or snapshot (if it's a VM), or was otherwise offline for a little while? You might try releasing and re-assigning the IP address to see if that resolves the issue.
MAAS 2.x fixes this issue by design, in that we no longer rely on parsing the lease file; we write out a static configuration for the DHCP server. (Though the lingering "discovered" DHCP leases may still be seen at times.)