HA DHCP agents and port mismatch kills DHCP for a tenant network
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Expired | Undecided | Unassigned |
Bug Description
We have a setup with 3 control nodes, and each tenant network gets 2 DHCP agents. Most of the time, this works well.
However, we have seen cases where the DHCP agents assigned to a network are on control nodes 1 and 2, while the port listing for the tenant network shows DHCP ports that point to DHCP agents on control nodes 1 and 3. We are not sure why or when this happens, but it seems to occur over time.
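The mismatch above amounts to comparing two sets of hosts for one network: the hosts of the DHCP agents scheduled to it, and the hosts its DHCP ports (device owner `network:dhcp`) are bound to. The following is a minimal sketch, assuming those host names have already been collected, for example from `openstack network agent list --network <network>` and from the `binding_host_id` field of each port returned by `openstack port list --network <network> --device-owner network:dhcp`. The function, class and host names (`ctl1` and so on) are hypothetical and not part of neutron.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DhcpMismatch:
    """Hosts that appear on only one side of the agent/port comparison."""
    agents_without_port: frozenset  # agent scheduled here, but no DHCP port bound here
    ports_without_agent: frozenset  # DHCP port bound here, but no agent scheduled here

    @property
    def ok(self) -> bool:
        return not self.agents_without_port and not self.ports_without_agent


def check_dhcp_binding(agent_hosts, port_hosts) -> DhcpMismatch:
    """Compare DHCP agent hosts with DHCP port binding hosts for one network.

    agent_hosts: hosts of the DHCP agents scheduled to the network.
    port_hosts: binding host of each DHCP port on the network.
    """
    agents = frozenset(agent_hosts)
    ports = frozenset(port_hosts)
    return DhcpMismatch(
        agents_without_port=agents - ports,
        ports_without_agent=ports - agents,
    )


# The situation from the report (hypothetical host names):
# agents on control nodes 1 and 2, DHCP ports bound to control nodes 1 and 3.
result = check_dhcp_binding(["ctl1", "ctl2"], ["ctl1", "ctl3"])
print(result.ok)                           # False
print(sorted(result.agents_without_port))  # ['ctl2']
print(sorted(result.ports_without_agent))  # ['ctl3']
```

Using sets keeps the check independent of listing order; it deliberately ignores how many ports share a host.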
When this happens, the tenant VMs don't get their DHCP requests answered. Once their leases expire, they lose floating IP (FIP) networking, and the tenants get pretty upset. Even though one of the ports points to a valid agent, DHCP requests go out but never get a reply.
We have found that a workaround is to delete all the DHCP ports in the tenant network, then remove the agents from the network and allow neutron to recreate both. Once this is done, DHCP works again.
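The workaround can be written down as a short sequence of `openstack` CLI calls. The sketch below only builds and prints the commands so they can be reviewed before anything is run; it follows the order given in the report (ports first, then agents) and, like the report, relies on neutron rescheduling the network and recreating its DHCP ports afterwards. The function name and the `NET_ID`, `PORT_*` and `AGENT_*` placeholders are hypothetical.

```python
import shlex


def workaround_commands(network_id, dhcp_port_ids, dhcp_agent_ids):
    """Build (but do not run) the CLI commands for the manual workaround.

    Order follows the report: delete every DHCP port on the network first,
    then remove each DHCP agent from the network.
    """
    net = shlex.quote(network_id)
    cmds = [f"openstack port delete {shlex.quote(p)}" for p in dhcp_port_ids]
    cmds += [
        f"openstack network agent remove network --dhcp {shlex.quote(a)} {net}"
        for a in dhcp_agent_ids
    ]
    return cmds


# Placeholder IDs; substitute the real network, DHCP port and agent IDs.
for cmd in workaround_commands("NET_ID", ["PORT_A", "PORT_B"], ["AGENT_1", "AGENT_2"]):
    print(cmd)
```

Printing instead of executing keeps the destructive steps under an operator's control.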
Without a definite way to reproduce this, it will be difficult to work on. Could you provide more data to help us understand how serious this is? How often have you seen it? Have you talked to anyone else who has seen this?
Could you add some instrumentation to the code to catch this problem when it happens? That could give us a better understanding.
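One form that instrumentation could take, running outside neutron itself, is a periodic audit that logs every network whose DHCP agent hosts and DHCP port binding hosts disagree, and remembers when each mismatch was first seen, which would help answer how often and when it happens. This is a minimal sketch under stated assumptions: the caller supplies a hypothetical snapshot mapping each network ID to its agent hosts and DHCP port binding hosts (collected however is convenient, for example through the CLI or API), and the function, logger and network names are illustrative only.

```python
import logging
import time
from typing import Dict, Iterable, List, Tuple

log = logging.getLogger("dhcp_audit")

# Hypothetical snapshot: network id -> (DHCP agent hosts, DHCP port binding hosts)
Snapshot = Dict[str, Tuple[Iterable[str], Iterable[str]]]


def audit_once(snapshot: Snapshot, first_seen: Dict[str, float], now: float) -> List[str]:
    """Log every network whose DHCP agent hosts differ from its DHCP port hosts.

    first_seen records when each mismatch was first observed, so repeated
    runs show how long it has persisted; an entry is cleared once the
    network is consistent again. Returns the mismatched network ids.
    """
    bad = []
    for net_id, (agent_hosts, port_hosts) in snapshot.items():
        agents, ports = set(agent_hosts), set(port_hosts)
        if agents == ports:
            first_seen.pop(net_id, None)  # consistent again: forget old mismatch
            continue
        bad.append(net_id)
        since = first_seen.setdefault(net_id, now)
        log.warning(
            "network %s: agents on %s, DHCP ports on %s (first seen %.0f s ago)",
            net_id, sorted(agents), sorted(ports), now - since,
        )
    return bad


# Example run with made-up data; a real audit would call this on a timer.
logging.basicConfig(level=logging.WARNING)
seen: Dict[str, float] = {}
snap = {
    "net-a": (["ctl1", "ctl2"], ["ctl1", "ctl3"]),
    "net-b": (["ctl2", "ctl3"], ["ctl3", "ctl2"]),
}
print(audit_once(snap, seen, now=time.time()))  # ['net-a']
```

Keeping the audit outside neutron avoids patching the agents while still producing timestamps that can be lined up against agent restarts or failovers.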