I suspect that there is a race condition between those two requests (one from api user, another from l3 agent rpc) that results in a binding missing/not seen by the next api request that fetches l3 agents for the router.
It would be nice if someone from HA folks look at the logs.
What I see in the logs is:
http:// logs.openstack. org/94/ 487194/ 2/check/ gate-neutron- dsvm-fullstack- ubuntu- xenial/ e868687/ testr_results. html.gz
When router is created, it's first committed to db, then it's scheduled to agents, and finally the result of the create operation is returned.
If we look at request-id of the create operation, we see the following in neutron-server log:
2017-08-28 23:49:25.722 32562 DEBUG neutron. scheduler. l3_agent_ scheduler [req-e20d202e- 025a-4bb5- 9c64-7f4182037d 44 - - - - -] Router 12c8a31d- 6077-4096- 89cb-d64b98bb88 c1 is scheduled to L3 agent 2c2ed21e- e788-482d- 844d-d11eb61457 88 with binding_index 1 bind_router /opt/stack/ new/neutron/ neutron/ scheduler/ l3_agent_ scheduler. py:225
We don't see this request scheduling the router to another agent.
Though around same time we see another log message that talks about scheduling the router to that other agent:
2017-08-28 23:49:27.274 32565 DEBUG neutron. scheduler. l3_agent_ scheduler [req-eb2181a8- 2d02-4fec- 847d-e7e870d411 c3 - - - - -] Router 12c8a31d- 6077-4096- 89cb-d64b98bb88 c1 is scheduled to L3 agent d8edd5b8- ec1e-4cdf- 9e35-93f258e652 30 with binding_index 2 bind_router /opt/stack/ new/neutron/ neutron/ scheduler/ l3_agent_ scheduler. py:225
You can notice that the request-id is different. This request came from the affected l3 agent:
2017-08-28 23:49:25.073 677 DEBUG oslo_service. periodic_ task [req-eb2181a8- 2d02-4fec- 847d-e7e870d411 c3 - - - - -] Running periodic task L3NATAgentForTe st.periodic_ sync_routers_ task run_periodic_tasks /opt/stack/ new/neutron/ .tox/dsvm- fullstack/ local/lib/ python2. 7/site- packages/ oslo_service/ periodic_ task.py: 215 agent.l3. agent [req-eb2181a8- 2d02-4fec- 847d-e7e870d411 c3 - - - - -] Starting fullsync periodic_ sync_routers_ task periodic_ sync_routers_ task /opt/stack/ new/neutron/ neutron/ agent/l3/ agent.py: 569
2017-08-28 23:49:25.073 677 DEBUG neutron.
I suspect that there is a race condition between those two requests (one from api user, another from l3 agent rpc) that results in a binding missing/not seen by the next api request that fetches l3 agents for the router.
It would be nice if someone from HA folks look at the logs.