Comment 6 for bug 1835807

Kevin Smith (kevin.smith.wrs) wrote : Re: neutron-l3-agent and neutron-dhcp-agent never recovered after force reboot on compute

The VIM won't initiate any action to rebalance the routers/networks to a newly available host until all agents are alive, so the focus should be on why the neutron-l3-agent pod did not recover after the force reboot, not on the VIM.

Taking a second look, however, I do see logs from a neutron-l3-agent pod on compute-3 after the reboot, so it appears the pod did come up (my mistake). The logs indicate that the l3 agent was unable to contact the neutron server, which is likely why it showed as dead. The only logs present are like those below. The RPC timeout is increased after each failure until it reaches 10 minutes, and from then on these messages appear every 10 minutes:

{"log":"2019-07-07 05:04:02.143 20 WARNING neutron_lib.rpc [req-fe11a876-7c24-4cc3-a33d-8d41545f1ed0 - - - - -] Increasing timeout for get_host_ha_router_count calls to 240 seconds. Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID 94c490c3248c4457af27c0cf5bc75537\n","stream":"stdout","time":"2019-07-07T05:04:02.14397681Z"}
{"log":"2019-07-07 05:04:03.708 20 WARNING neutron.agent.l3.agent [req-fe11a876-7c24-4cc3-a33d-8d41545f1ed0 - - - - -] l3-agent cannot contact neutron server to retrieve HA router count. Check connectivity to neutron server. Retrying... Detailed message: Timed out waiting for a reply to message ID 94c490c3248c4457af27c0cf5bc75537.: MessagingTimeout: Timed out waiting for a reply to message ID 94c490c3248c4457af27c0cf5bc75537\n","stream":"stdout","time":"2019-07-07T05:04:03.708695183Z"}
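The timeout growth described above can be sketched as follows. This is a rough model that assumes a few things. The RPC client in neutron_lib is assumed to double a method's timeout after each MessagingTimeout. The starting value is assumed to be `rpc_response_timeout` (60 s by default), and the ceiling is assumed to be `rpc_response_max_timeout` (600 s by default). Verify both values against the deployed configuration. The helper names `backoff_schedule` and `parse_timeout_increase` are hypothetical and serve only to illustrate the idea. The parser reads one Docker json-file log record, in the same format as the lines above.

```python
import json
import re

# Assumed defaults (hypothetical here; check the deployed neutron config):
# rpc_response_timeout = 60 s, rpc_response_max_timeout = 600 s.
DEFAULT_TIMEOUT = 60
MAX_TIMEOUT = 600


def backoff_schedule(start=DEFAULT_TIMEOUT, cap=MAX_TIMEOUT, attempts=6):
    """Per-call timeouts across repeated failures, assuming doubling up to a cap."""
    timeouts = []
    timeout = start
    for _ in range(attempts):
        timeouts.append(timeout)
        timeout = min(timeout * 2, cap)  # double, but never exceed the cap
    return timeouts


# Matches the WARNING text emitted when the agent bumps an RPC timeout.
INCREASE_RE = re.compile(r"Increasing timeout for (\w+) calls to (\d+) seconds")


def parse_timeout_increase(docker_json_line):
    """Return (rpc_method, seconds) from one json-file log record, or None."""
    record = json.loads(docker_json_line)  # keys: "log", "stream", "time"
    match = INCREASE_RE.search(record["log"])
    if match is None:
        return None
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    print(backoff_schedule())
```

Under these assumptions, the schedule runs 60, 120, 240, 480, then stays at 600 seconds. The value of 240 seconds for `get_host_ha_router_count` in the first log line would then mark the third consecutive timeout. The 600-second ceiling would also explain why the messages settle at one every 10 minutes.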