When the DHCP agent is restarted while some DHCP namespaces contain interfaces in the dadfailed state, the agent's NetworkCache is not updated with the subnets of those networks. As a result, the next VM creation or port update in such a network deletes the namespace completely, which causes a connectivity outage for all VMs in that network.
I think the fix should be this: if update_isolated_metadata_proxy() raises an exception inside configure_dhcp_for_network in the DHCP agent, self.cache.put(network) should still be called in every case, so that NetworkCache is updated correctly and the DHCP namespace is not deleted on the next SyncState call.
Here is the code from agent.py that I'm talking about:
def configure_dhcp_for_network(self, network):
    if not network.admin_state_up:
        return

    for subnet in network.subnets:
        if subnet.enable_dhcp:
            if self.call_driver('enable', network):
                self.update_isolated_metadata_proxy(network)
                self.cache.put(network)
                # After enabling dhcp for network, mark all existing
                # ports as ready. So that the status of ports which are
                # created before enabling dhcp can be updated.
                self.dhcp_ready_ports |= {p.id for p in network.ports}
            break
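To make the proposal concrete, here is a minimal sketch of one way it could look. It is not the actual Neutron patch. The stub classes, the make_network() helper and the fail_metadata_proxy flag are all hypothetical and exist only so the example runs on its own. It assumes a try/finally around update_isolated_metadata_proxy(), so the network is cached even when that call raises, while the exception still propagates to the caller.

```python
from types import SimpleNamespace


class NetworkCacheStub:
    """Hypothetical stand-in for the agent's NetworkCache."""

    def __init__(self):
        self.networks = {}

    def put(self, network):
        self.networks[network.id] = network


class DhcpAgentSketch:
    """Hypothetical, heavily reduced model of the DHCP agent."""

    def __init__(self, fail_metadata_proxy=False):
        self.cache = NetworkCacheStub()
        self.dhcp_ready_ports = set()
        # Assumption: simulates the failure seen with dadfailed interfaces.
        self.fail_metadata_proxy = fail_metadata_proxy

    def call_driver(self, action, network):
        # Stub: pretend the driver action always succeeds.
        return True

    def update_isolated_metadata_proxy(self, network):
        if self.fail_metadata_proxy:
            raise RuntimeError("simulated failure (e.g. dadfailed interface)")

    def configure_dhcp_for_network(self, network):
        if not network.admin_state_up:
            return

        for subnet in network.subnets:
            if subnet.enable_dhcp:
                if self.call_driver('enable', network):
                    try:
                        self.update_isolated_metadata_proxy(network)
                    finally:
                        # Proposed change: cache the network even if the
                        # metadata proxy setup raised an exception.
                        self.cache.put(network)
                    # Only reached when no exception was raised above.
                    self.dhcp_ready_ports |= {p.id for p in network.ports}
                break


def make_network():
    """Build a hypothetical network with one DHCP subnet and one port."""
    return SimpleNamespace(
        id="net-1",
        admin_state_up=True,
        subnets=[SimpleNamespace(enable_dhcp=True)],
        ports=[SimpleNamespace(id="port-1")],
    )
```

With fail_metadata_proxy=True the call still raises, but the network ends up in the cache, which is the property the proposal is after. Whether the exception should instead be caught and logged is a separate design decision that this sketch does not settle.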
We experienced the same bug last week, as I described on the mailing list: http://lists.openstack.org/pipermail/openstack-discuss/2022-January/026484.html. This bug has severe consequences when the dadfailed state is not noticed by operators.