Comment 6 for bug 1979089

Edward Hope-Morley (hopem) wrote :

Hi @bence-romsics, I believe the logs I pasted and the code referenced show fairly clearly where and why the problem is happening. If the vrrp master transitions during the final stages of deleting a router, the l3-agent keepalived-state-change monitor enqueues an action to update the api, which leads to the sleep() at [1]. If, while that sleep is in progress, the router delete goes on to delete the namespace, and the other thread wakes up before the delete gets as far as removing the metadata-proxy monitor, then the metadata monitor notices that haproxy is gone and tries to spawn a new one. That fails because the namespace is gone, and it keeps happening over and over because the monitor never gives up. When I get some time I will look closer, but I believe the solution is to kill the keepalived-state-change monitor earlier in the router delete sequence so that it won't respond to any vrrp transitions that occur while the router is being deleted. Hope that helps.

[1] https://github.com/openstack/neutron/blob/52bb040e4e21b9db7e9787cec8ac86de5644eadb/neutron/agent/l3/ha.py#L149
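
To make the ordering easier to follow, here is a rough, single-threaded sketch of the sequence described above. It is not neutron code: FakeHaRouter, delete_router and every attribute name are hypothetical stand-ins, the sleep() is modelled as a pending flag, and the respawn loop is capped at max_retries only so the example terminates (the real monitor, as noted, never gives up).

```python
class FakeHaRouter:
    """Hypothetical stand-in for an HA router; none of this is neutron API."""

    def __init__(self):
        self.state_monitor_running = True     # keepalived-state-change monitor
        self.metadata_monitor_running = True  # metadata-proxy monitor
        self.namespace_exists = True
        self.pending_transition = False       # models the handler stuck in sleep()
        self.failed_respawns = 0

    def vrrp_transition(self):
        # A transition is only picked up while the state-change monitor is alive.
        if self.state_monitor_running:
            self.pending_transition = True

    def wake_delayed_handler(self, max_retries=3):
        # Models the other thread waking up after the sleep().
        if not self.pending_transition:
            return
        self.pending_transition = False
        retries = 0
        # The metadata monitor sees haproxy gone and tries to respawn it;
        # with the namespace already deleted, every attempt fails.
        while (self.metadata_monitor_running
               and not self.namespace_exists
               and retries < max_retries):
            self.failed_respawns += 1
            retries += 1


def delete_router(router, kill_state_monitor_first):
    """Simulate a router delete with a vrrp transition arriving mid-delete."""
    if kill_state_monitor_first:
        # Proposed ordering: stop reacting to transitions before anything else.
        router.state_monitor_running = False
    router.vrrp_transition()                  # master transitions during delete
    router.namespace_exists = False           # delete proceeds to the namespace
    router.wake_delayed_handler()             # handler wakes before cleanup ends
    router.state_monitor_running = False
    router.metadata_monitor_running = False   # metadata monitor removed last
    return router.failed_respawns


if __name__ == "__main__":
    print("current ordering, failed respawns:",
          delete_router(FakeHaRouter(), kill_state_monitor_first=False))
    print("proposed ordering, failed respawns:",
          delete_router(FakeHaRouter(), kill_state_monitor_first=True))
```

In this toy model the failed respawns only appear when the state-change monitor is still alive at the moment the transition arrives, which is the reasoning behind killing it earlier in the delete sequence.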