Comment 14 for bug 1749425

James Hebden (ec0) wrote:

@james-page, @axino -

Just a +1 to the point that changing the HA property requires the router to be set administratively down beforehand and back up afterwards, which starts the recreation of the router as HA.
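A minimal sketch of that down/change/up sequence, assuming python-openstackclient is installed and credentials allowed to change the HA attribute (normally admin) are loaded in the environment. `ha_conversion_commands` and `run` are hypothetical helper names; the `--disable`, `--ha`/`--no-ha` and `--enable` options are those of `openstack router set`. By default it only prints the commands.

```python
import shlex
import subprocess
from typing import List


def ha_conversion_commands(router: str, make_ha: bool = True) -> List[List[str]]:
    """Return, in order, the CLI calls that flip a router's HA attribute."""
    ha_flag = "--ha" if make_ha else "--no-ha"
    return [
        # 1. Set the router administratively down; the HA attribute is only
        #    changed on a disabled router.
        ["openstack", "router", "set", "--disable", router],
        # 2. Change the HA property while the router is down.
        ["openstack", "router", "set", ha_flag, router],
        # 3. Bring the router back up, which starts its recreation.
        ["openstack", "router", "set", "--enable", router],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(argv, check=True)
```

For example, `run(ha_conversion_commands("43801324-72ce-469f-a628-a5c645041e30"))` prints the three commands, and `dry_run=False` executes them. Because execution stops at the first failure, a failed HA change can leave the router disabled, in which case it has to be re-enabled by hand.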

We have seen various other side effects in Neutron/OVS environments, and specifically in the environment in question, such as:
* Missing interfaces inside qrouter namespaces (OVS taps)
* Missing iptables rules
* Missing floating IP aliases on OVS interfaces inside the qrouter namespaces
All of these are tasks performed during bring-up of HA routers. We have seen these issues less often on non-HA routers. Whether or not the router is HA, rescheduling it or converting it from HA to non-HA (or vice versa) rebuilds the router and, as a result, repairs it.
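A rough sketch for checking one router for the first and third symptoms from an L3 agent host, assuming root access, the standard `qrouter-<router_id>` namespace naming, and that Neutron truncates device names to 14 characters (`qr-`/`qg-` plus the start of the port ID). All helper names are hypothetical, and the caller supplies the expected port IDs and floating IPs (for example from `openstack port list --router <router>` and `openstack floating ip list`). On HA routers the addresses and floating IPs should only be present on the node where the router is currently active, so run it there.

```python
import subprocess
from typing import Iterable, List, Set

LINUX_DEV_LEN = 14  # assumption: Neutron truncates device names to 14 characters


def dev_name(prefix: str, port_id: str) -> str:
    """Hypothetical helper: expected device name, e.g. 'qg-' + start of the port ID."""
    return (prefix + port_id)[:LINUX_DEV_LEN]


def netns_ip(router_id: str, *args: str) -> str:
    """Run `ip <args>` inside the router's qrouter namespace (needs root)."""
    cmd = ["ip", "netns", "exec", f"qrouter-{router_id}", "ip", *args]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def link_names(link_output: str) -> Set[str]:
    """Interface names from `ip -o link show` output ('12: qg-1a2b3c4d-5e: <...>')."""
    names = set()
    for line in link_output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            # Drop the trailing ':' and any '@ifN' peer suffix.
            names.add(fields[1].rstrip(":").split("@")[0])
    return names


def addresses(addr_output: str) -> Set[str]:
    """Addresses, without prefix length, from `ip -o addr show` output."""
    found = set()
    for line in addr_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] in ("inet", "inet6"):
            found.add(fields[3].split("/")[0])
    return found


def missing(expected: Iterable[str], present: Set[str]) -> List[str]:
    """Expected items that are not present, sorted for stable output."""
    return sorted(set(expected) - present)
```

Usage, with `router_id`, `gw_port_id` and `floating_ips` as placeholders: `missing([dev_name("qg-", gw_port_id)], link_names(netns_ip(router_id, "-o", "link", "show")))` lists absent interfaces, and `missing(floating_ips, addresses(netns_ip(router_id, "-o", "addr", "show")))` lists absent floating IP aliases.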

I should also point out that we have rarely observed high system load at the time of these issues. I do agree, though, that the number of routers, and therefore the work Neutron and OVS must do to orchestrate interface plugging and unplugging, namespace creation and the associated network stack plumbing, is much higher than in a typical environment. Having only three servers do this work, rather than scaling horizontally, may be exposing bottlenecks in Neutron or OVS when orchestrating these tasks.

I'm not sure whether you are seeing the following traceback in the logs provided, but it has also been common when this issue crops up. It shows a task performed during router bring-up falling over: IPTablesManager.apply completes successfully, but processing of the external gateway then fails in HaRouter._remove_vip, because the keepalived instance it tries to remove the old gateway VIP from is None.

2018-02-14 05:04:32.101 1352665 DEBUG neutron.agent.linux.utils [-] Exit code: 0 execute /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:158
2018-02-14 05:04:32.103 1352665 DEBUG neutron.agent.linux.iptables_manager [-] IPTablesManager.apply completed with success. 0 iptables commands were issued _apply_synchronized /usr/lib/python2.7/dist-packages/neutron/agent/linux/iptables_manager.py:576
2018-02-14 05:04:32.103 1352665 DEBUG oslo_concurrency.lockutils [-] Releasing semaphore "iptables-qrouter-43801324-72ce-469f-a628-a5c645041e30" lock /usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:228
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info [-] 'NoneType' object has no attribute 'remove_vip_by_ip_address'
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info Traceback (most recent call last):
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/dist-packages/neutron/common/utils.py", line 253, in call
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info return func(*args, **kwargs)
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 1115, in process
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info self.process_external()
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 890, in process_external
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info self._process_external_gateway(ex_gw_port)
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 777, in _process_external_gateway
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info self.external_gateway_updated(ex_gw_port, interface_name)
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 403, in external_gateway_updated
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info self._remove_vip(old_gateway_cidr)
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 202, in _remove_vip
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info instance.remove_vip_by_ip_address(ip_cidr)
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info AttributeError: 'NoneType' object has no attribute 'remove_vip_by_ip_address'
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent [-] Failed to process compatible router: 43801324-72ce-469f-a628-a5c645041e30
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 517, in _process_router_update
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent self._process_router_if_compatible(router)
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 454, in _process_router_if_compatible
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent self._process_updated_router(router)
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 469, in _process_updated_router
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent ri.process()
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 426, in process
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent super(HaRouter, self).process()
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/common/utils.py", line 256, in call
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent self.logger(e)
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent self.force_reraise()
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent six.reraise(self.type_, self.value, self.tb)
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/common/utils.py", line 253, in call
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent return func(*args, **kwargs)
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 1115, in process
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent self.process_external()
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 890, in process_external
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent self._process_external_gateway(ex_gw_port)
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 777, in _process_external_gateway
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent self.external_gateway_updated(ex_gw_port, interface_name)
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 403, in external_gateway_updated
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent self._remove_vip(old_gateway_cidr)
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 202, in _remove_vip
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent instance.remove_vip_by_ip_address(ip_cidr)
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent AttributeError: 'NoneType' object has no attribute 'remove_vip_by_ip_address'
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent
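To find which routers are affected, a sketch like the following counts the "Failed to process compatible router" errors per router ID in an L3 agent log. It relies only on the message format visible in the traceback above; `failed_routers` is a hypothetical helper, and the log path in the usage line is an assumption based on Ubuntu packaging defaults.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the l3-agent ERROR line from the log above and captures the router UUID.
FAILED_RE = re.compile(
    r"ERROR neutron\.agent\.l3\.agent \[[^\]]*\] Failed to process compatible router: "
    r"(?P<router>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
)


def failed_routers(lines: Iterable[str]) -> Counter:
    """Count 'Failed to process compatible router' errors per router ID."""
    counts: Counter = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[match.group("router")] += 1
    return counts
```

For example: `with open("/var/log/neutron/neutron-l3-agent.log") as log: print(failed_routers(log).most_common(10))`. Routers that show up repeatedly are candidates for the rescheduling or HA conversion described above.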