Comment 6 for bug 1953165

Kamil Madac (kamil-madac) wrote :

We hit the same bug last week, as I described on the mailing list: http://lists.openstack.org/pipermail/openstack-discuss/2022-January/026484.html. This bug has severe consequences when the dadfailed state is not noticed by operators.

When the dhcp agent is restarted and there are dhcp namespaces with interfaces in the dadfailed state, the NetworkCache in the dhcp agent is not updated with the subnets. As a result, a subsequent creation of a VM, or an update of a VM's port, in such a network deletes the namespace completely, which causes a connectivity outage for all VMs in that network.

I think we should fix it so that, if an exception is raised in the dhcp agent in configure_dhcp_for_network by the update_isolated_metadata_proxy() function, self.cache.put(network) is still called in every case. That would ensure that the NetworkCache is updated correctly and the dhcp namespace won't be deleted in the next SyncState call.

Here is the code from agent.py that I'm talking about:

    def configure_dhcp_for_network(self, network):
        if not network.admin_state_up:
            return

        for subnet in network.subnets:
            if subnet.enable_dhcp:
                if self.call_driver('enable', network):
                    self.update_isolated_metadata_proxy(network)
                    self.cache.put(network)
                    # After enabling dhcp for network, mark all existing
                    # ports as ready. So that the status of ports which are
                    # created before enabling dhcp can be updated.
                    self.dhcp_ready_ports |= {p.id for p in network.ports}
                break
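
To make the proposal concrete, here is a minimal standalone sketch of the idea, not a tested patch against neutron. SketchAgent, FakeCache, make_network and the fail_metadata_proxy flag are hypothetical stand-ins I made up so the snippet runs on its own; only the body of configure_dhcp_for_network mirrors the code above, with the cache update moved into a finally block.

```python
from types import SimpleNamespace


class FakeCache:
    """Hypothetical stand-in for NetworkCache: only remembers networks by id."""

    def __init__(self):
        self.networks = {}

    def put(self, network):
        self.networks[network.id] = network


class SketchAgent:
    """Hypothetical stand-in for the dhcp agent, just enough to run the sketch."""

    def __init__(self, fail_metadata_proxy=False):
        self.cache = FakeCache()
        self.dhcp_ready_ports = set()
        self.fail_metadata_proxy = fail_metadata_proxy

    def call_driver(self, action, network):
        # Assumption: the driver call itself succeeds.
        return True

    def update_isolated_metadata_proxy(self, network):
        # Simulates the failure seen with interfaces in dadfailed state.
        if self.fail_metadata_proxy:
            raise RuntimeError("simulated failure (dadfailed)")

    def configure_dhcp_for_network(self, network):
        if not network.admin_state_up:
            return

        for subnet in network.subnets:
            if subnet.enable_dhcp:
                if self.call_driver('enable', network):
                    try:
                        self.update_isolated_metadata_proxy(network)
                    finally:
                        # Proposed change: cache the network even when the
                        # metadata proxy update raises, so that a later sync
                        # does not treat the network as unknown.
                        self.cache.put(network)
                    self.dhcp_ready_ports |= {p.id for p in network.ports}
                break


def make_network():
    """Builds a hypothetical network with one dhcp-enabled subnet and one port."""
    return SimpleNamespace(
        id="net-1",
        admin_state_up=True,
        subnets=[SimpleNamespace(enable_dhcp=True)],
        ports=[SimpleNamespace(id="port-1")],
    )


if __name__ == "__main__":
    agent = SketchAgent(fail_metadata_proxy=True)
    try:
        agent.configure_dhcp_for_network(make_network())
    except RuntimeError as exc:
        print("proxy update failed:", exc)
    print("network cached:", "net-1" in agent.cache.networks)
```

With try/finally the exception still propagates to the caller, so the ports are not marked as ready in the failing case; only the cache update is guaranteed. Whether the exception should instead be caught and logged is a separate decision that this sketch does not try to settle.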