Comment 9 for bug 1979089

Maximilian Stinsky (mstinsky) wrote:

Just wanted to note that we had this happen a second time in our production environment.
I took a deeper look into the logs around the exact time the exception started to occur, and I think Edward is right with his assumption: I can see that the keepalived process was promoted to master just before the router got deleted.

What I could see is that the l3 agent got a router delete, and around 44 seconds later it tried to respawn the metadata proxy, which failed because the namespace had already been deleted.
After that, all newly created routers fail to update their state and stay in an unknown state.
As before, a restart of the l3 agent fixes the issue and the agent stops trying to respawn the metadata proxy of the deleted router.
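
To illustrate the kind of guard that would break this loop, here is a minimal Python sketch. This is not Neutron's actual code: namespace_exists, spawn_metadata_proxy and NETNS_DIR are hypothetical stand-ins; only the qrouter- namespace naming matches what the agent really uses. The idea is simply to check that the router's namespace still exists before respawning, so a keepalived master transition that raced with a router delete is dropped instead of retried forever.

import os

NETNS_DIR = "/var/run/netns"  # assumed ip-netns runtime directory

def namespace_exists(ns_name: str) -> bool:
    """Hypothetical stand-in for an `ip netns` lookup."""
    return os.path.exists(os.path.join(NETNS_DIR, ns_name))

def spawn_metadata_proxy(router_id: str, ns_name: str) -> None:
    """Placeholder for the actual metadata-proxy spawn."""
    print(f"respawning metadata-proxy for {router_id} in {ns_name}")

def maybe_respawn_metadata_proxy(router_id: str) -> None:
    ns_name = f"qrouter-{router_id}"
    if not namespace_exists(ns_name):
        # The router was deleted while the keepalived state change was
        # still queued; skip the respawn rather than leaving the agent
        # stuck retrying against a missing namespace.
        print(f"namespace {ns_name} is gone, skipping respawn")
        return
    spawn_metadata_proxy(router_id, ns_name)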

Feb 28, 2023 @ 02:24:04.000
Tue Feb 28 01:24:04 2023: (VR_120) Entering MASTER STATE

Feb 28, 2023 @ 02:24:14.000
Finished a router delete for 0828f373-2d43-4959-ba25-68ac9f178af2, update_id 12f83512-1442-45f6-a6f5-a3eee69fc6ae. Time elapsed: 4.929

Feb 28, 2023 @ 02:24:58.000
Respawning metadata-proxy for uuid 0828f373-2d43-4959-ba25-68ac9f178af2