Hi @bence-romsics, I believe the logs I pasted and the code referenced show fairly clearly where and why the problem is happening. If the VRRP master transitions during the final stages of deleting a router, the l3-agent keepalived-state-change monitor enqueues an action to update the API, which leads to the sleep() at [1]. If, while that sleep is in progress, the router delete goes on to delete the namespace, and the other thread wakes up before the delete reaches the metadata-proxy monitor, then the metadata monitor realises that haproxy is gone and tries to spawn a new one. That fails because the namespace is gone, and it keeps happening over and over because the monitor never gives up. When I get some time I will look closer, but I believe the solution is to kill the keepalived-state-change monitor earlier in the router delete sequence so that it won't respond to any VRRP transitions that occur while the router is being deleted. Hope that helps.
[1] https://github.com/openstack/neutron/blob/52bb040e4e21b9db7e9787cec8ac86de5644eadb/neutron/agent/l3/ha.py#L149
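
To make the ordering concrete, here is a toy Python sketch of the sequence I described. It is not neutron code: `ToyRouter`, `delete_router` and every attribute name are made up for illustration, the sleep() is modelled as a queued callback rather than a real thread, and the way the woken thread reaches the metadata monitor is simplified. It only shows why disabling the state-change monitor first should prevent the failed respawn.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyRouter:
    """Hypothetical stand-in for the per-router state held by the agent."""
    namespace_exists: bool = True
    state_monitor_enabled: bool = True
    metadata_monitor_enabled: bool = True
    failed_respawns: int = 0
    pending: List[Callable[[], None]] = field(default_factory=list)

    def on_vrrp_transition(self) -> None:
        # State-change monitor: queue a deferred action. The queue stands in
        # for the thread that is sleeping before it updates the API.
        if self.state_monitor_enabled:
            self.pending.append(self.check_metadata_proxy)

    def check_metadata_proxy(self) -> None:
        # Metadata monitor: haproxy is gone, so it tries to respawn it.
        if not self.metadata_monitor_enabled:
            return
        if not self.namespace_exists:
            # Counted once here; the real monitor would keep retrying.
            self.failed_respawns += 1


def delete_router(kill_state_monitor_first: bool) -> int:
    """Run the toy delete sequence and return the number of failed respawns."""
    router = ToyRouter()
    if kill_state_monitor_first:
        router.state_monitor_enabled = False  # proposed ordering
    router.on_vrrp_transition()               # transition arrives mid-delete
    router.namespace_exists = False           # namespace is deleted
    for action in router.pending:             # sleeping thread wakes up
        action()
    router.state_monitor_enabled = False      # monitors are removed last
    router.metadata_monitor_enabled = False
    return router.failed_respawns
```

With the current ordering, `delete_router(False)` records a failed respawn; with the state-change monitor disabled first, `delete_router(True)` records none, because the transition is never queued.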