No routers created after l3-agent start - error during L3NATAgentWithStateReport.periodic_sync_routers_task
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
neutron | Fix Released | High | Pepijn Oomen | | |
Bug Description
After (re)starting the L3 agent, a failure during L3NATAgentWithS
While the error persists, no routers are provisioned by the L3 agent. It just keeps failing, over and over, in a seemingly infinite loop.
The fix/workaround was to remove a random router from the L3 agent having problems, like so:
$ neutron l3-agent-
It did not seem to matter which of the many routers scheduled on the problematic L3 agent was removed in this way. The removal itself seemed to clear the blockage, allowing the L3 agent to start normal operations shortly after.
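To illustrate the symptom described above, here is a minimal toy model of a periodic full-sync task that keeps retrying while its first step fails. It is only a sketch: `SketchAgent`, `make_flaky_fetch` and the router names are hypothetical and are not taken from the Neutron code base. It assumes the agent keeps a "full sync pending" flag that is cleared only after a successful run.

```python
class SketchAgent:
    """Toy model of an agent with a periodic full-sync task (all names hypothetical)."""

    def __init__(self, fetch_router_ids):
        self.fetch_router_ids = fetch_router_ids
        self.fullsync = True   # a full sync is pending after a (re)start
        self.routers = set()   # routers that have actually been provisioned
        self.failures = 0      # number of failed sync attempts

    def periodic_sync(self):
        if not self.fullsync:
            return
        try:
            router_ids = self.fetch_router_ids()
        except Exception:
            # The sync failed before any router was processed, so the
            # flag stays set and the next periodic run retries from scratch.
            self.failures += 1
            return
        self.routers.update(router_ids)
        self.fullsync = False


def make_flaky_fetch(fail_times, router_ids):
    """Return a fetch function that raises on its first `fail_times` calls."""
    calls = {"n": 0}

    def fetch():
        calls["n"] += 1
        if calls["n"] <= fail_times:
            raise RuntimeError("sync failed")
        return list(router_ids)

    return fetch


agent = SketchAgent(make_flaky_fetch(3, ["r1", "r2"]))
for _ in range(3):
    agent.periodic_sync()
print(agent.failures, sorted(agent.routers))  # 3 []

agent.periodic_sync()
print(agent.failures, sorted(agent.routers))  # 3 ['r1', 'r2']
```

In this model, as long as the fetch step raises, no router is ever provisioned. Once the underlying condition clears, the next periodic run provisions everything at once, which matches the behaviour seen in the log.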
Excerpts from l3-agent.log:
2016-11-15 03:14:26.201 2988 INFO neutron.
2016-11-15 03:14:26.201 2988 INFO neutron.
2016-11-15 03:14:26.504 2988 INFO eventlet.
2016-11-15 03:14:26.548 2988 INFO neutron.
2016-11-15 03:14:26.781 2988 INFO neutron.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
al_agent
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.
[ some time passes ]
2016-11-15 03:15:08.152 2988 ERROR oslo_service.
2016-11-15 03:15:08.152 2988 ERROR oslo_service.
2016-11-15 03:15:08.152 2988 ERROR oslo_service.
2016-11-15 03:15:08.152 2988 ERROR oslo_service.
[ ... ]
[ "neutron l3-agent-
2016-11-15 03:15:50.441 2988 WARNING neutron.
2016-11-15 03:17:23.102 2988 WARNING oslo.service.
15.31 sec
2016-11-15 03:17:23.174 2988 INFO oslo_rootwrap.
2016-11-15 03:17:23.634 2988 INFO neutron.agent.l3.ha [-] Router 19258eff-
[ ... ]
At this point, the L3 agent starts operating normally, creating all the routers it's scheduled to run and so on.
Changed in neutron: | |
assignee: | nobody → Pepijn Oomen (pjoomen) |
status: | Incomplete → Fix Committed |
Changed in neutron: | |
status: | Fix Committed → In Progress |
Changed in neutron: | |
importance: | Undecided → High |
milestone: | none → ocata-2 |
tags: | added: l3-bgp |
tags: | added: l3-ipam-dhcp removed: l3-bgp |
tags: | added: neutron-proactive-backport-potential |
Hey Tore,
I have a question regarding this issue: before you restarted the L3 agent, did you change the hostname or the host value in the configuration file? I ask because it seems that this call is returning an empty list from the database.
2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task return cctxt.call(context, 'get_router_ids', host=self.host)
Also, in order to reproduce this, can you provide the release version and the configuration you used?
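The hypothesis behind the hostname question can be sketched as follows. This is an illustrative stand-in, not the server-side implementation: the `BINDINGS` table, the host names and the local `get_router_ids` function are assumptions made for the example. It shows only that a lookup keyed on the agent's host value returns an empty list when that value no longer matches the one the routers were scheduled to.

```python
# Hypothetical scheduling table: router ID -> host name of the agent it is bound to.
BINDINGS = {
    "router-a": "network-node-1",
    "router-b": "network-node-1",
    "router-c": "network-node-2",
}


def get_router_ids(host, bindings=BINDINGS):
    """Return the IDs of routers bound to `host` (stand-in for the real lookup)."""
    return sorted(rid for rid, bound_host in bindings.items() if bound_host == host)


# Agent reports the same host value the routers were scheduled to.
ids_before = get_router_ids("network-node-1")
print(ids_before)  # ['router-a', 'router-b']

# Agent restarts with a changed host value (for example, an FQDN instead of a short name).
ids_after = get_router_ids("network-node-1.example.com")
print(ids_after)  # []
```

If the reported setup shows the same host value before and after the restart, this explanation can be ruled out.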