Description of problem:
Neutron floating IPs stop working and instances become unreachable.
Version-Release number of selected component (if applicable):
Ocata.
Neutron-related RPMs:
puppet-neutron-10.3.2-0.20180103174737.2e7d298.el7.centos.noarch
openstack-neutron-common-10.0.5-0.20180105192920.295c700.el7.centos.noarch
openvswitch-ovn-common-2.6.1-10.1.git20161206.el7.x86_64
openstack-neutron-sriov-nic-agent-10.0.5-0.20180105192920.295c700.el7.centos.noarch
python-neutron-lib-1.1.0-1.el7.noarch
openvswitch-2.6.1-10.1.git20161206.el7.x86_64
python2-neutronclient-6.1.1-1.el7.noarch
python-openvswitch-2.6.1-10.1.git20161206.el7.noarch
openstack-neutron-lbaas-10.0.2-0.20180104200311.10771af.el7.centos.noarch
openvswitch-ovn-host-2.6.1-10.1.git20161206.el7.x86_64
python-neutron-10.0.5-0.20180105192920.295c700.el7.centos.noarch
openvswitch-ovn-central-2.6.1-10.1.git20161206.el7.x86_64
openstack-neutron-metering-agent-10.0.5-0.20180105192920.295c700.el7.centos.noarch
python-neutron-lbaas-10.0.2-0.20180104200311.10771af.el7.centos.noarch
openstack-neutron-openvswitch-10.0.5-0.20180105192920.295c700.el7.centos.noarch
openstack-neutron-ml2-10.0.5-0.20180105192920.295c700.el7.centos.noarch
openstack-neutron-10.0.5-0.20180105192920.295c700.el7.centos.noarch
How reproducible:
Not sure. Over the past day or so, several users have complained about unreachable instances. Not all VMs are affected, and it is not clear how connectivity was lost in the first place.
Actual results:
In some cases, the router is active on more than one controller; in others, the router appears correctly configured, but the qg-xxxx interface is not NAT-ing traffic to the qr-xxxx interface. The iptables rules look correct.
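For reference, this is roughly how we have been inspecting the NAT path inside the router namespace on the controller hosting the router (the router UUID is a placeholder; run as root on the network node):

```shell
# Replace <router-id> with the affected router's UUID
ip netns exec qrouter-<router-id> ip addr show          # qg-/qr- interfaces and the FIP address
ip netns exec qrouter-<router-id> iptables -t nat -S    # DNAT/SNAT rules for the floating IP
ip netns exec qrouter-<router-id> conntrack -L | head   # active NAT-ed flows, if any
```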
Expected results:
VMs reachable via FIP.
Additional info:
Some ports appear to be stuck in 'BUILD' status; it is not clear what is causing this.
We see this error in the logs:
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info Traceback (most recent call last):
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 256, in call
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info return func(*args, **kwargs)
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 1116, in process
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info self.process_external()
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 910, in process_external
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info self.update_fip_statuses(fip_statuses)
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 926, in update_fip_statuses
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info self.agent.context, self.router_id, fip_statuses)
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 125, in update_floatingip_statuses
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info router_id=router_id, fip_statuses=fip_statuses)
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 151, in call
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info return self._original_context.call(ctxt, method, **kwargs)
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 169, in call
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info retry=self.retry)
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 97, in _send
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info timeout=timeout, retry=retry)
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 566, in send
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info retry=retry)
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 557, in _send
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info raise result
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info RemoteError: Remote error: TimeoutError QueuePool limit of size 10 overflow 20 reached, connection timed out, timeout 10
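For context on the error message: a SQLAlchemy QueuePool with "size 10 overflow 20" allows at most 10 + 20 = 30 concurrent connection checkouts; the 31st blocks for the pool timeout (10s here) and then fails. A minimal stdlib sketch of that checkout semantics (a toy model, not SQLAlchemy itself):

```python
import threading


class BoundedPool:
    """Toy model of SQLAlchemy's QueuePool checkout limit: at most
    pool_size + max_overflow connections may be checked out at once;
    further checkouts wait up to `timeout` seconds, then fail
    (analogous to the TimeoutError in the traceback above)."""

    def __init__(self, pool_size=10, max_overflow=20, timeout=10.0):
        self._slots = threading.Semaphore(pool_size + max_overflow)
        self._timeout = timeout

    def checkout(self):
        # Block for at most `timeout` seconds waiting for a free slot.
        if not self._slots.acquire(timeout=self._timeout):
            raise TimeoutError("QueuePool limit reached, connection timed out")
        return object()  # stand-in for a real DB connection

    def checkin(self):
        self._slots.release()


pool = BoundedPool(pool_size=10, max_overflow=20, timeout=0.1)
conns = [pool.checkout() for _ in range(30)]  # 30 checkouts succeed
try:
    pool.checkout()  # the 31st exceeds size + overflow
except TimeoutError as exc:
    print("31st checkout failed:", exc)
```

Every RPC call that triggers a server-side DB query holds a checkout for its duration, so a burst of agent status updates can exhaust the pool even with no leak.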
Have you seen https://bugs.launchpad.net/ubuntu/+source/neutron/+bug/1384108? It suggests setting the maximum database connections based on the number of API workers, e.g. bumping max connections to 4x the worker count just to be on the safe side.
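If that tuning is attempted, the relevant oslo.db options live under [database] in neutron.conf on the server side. The values below are illustrative only (following the 4x-workers rule of thumb from the Launchpad bug), not tested recommendations:

```ini
[database]
# Example values only; size to ~4x the API/RPC worker count.
max_pool_size = 40
max_overflow = 60
pool_timeout = 10
```

Note the MySQL server-side max_connections must also be high enough to accommodate pool_size + overflow across all neutron-server workers and agents.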
However, it is possible that connections are being leaked somewhere. It may be worth checking whether the open connections are still valid.
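One way to check whether server-side DB connections are piling up or sitting idle (assumes a MySQL/MariaDB backend; credentials omitted):

```shell
mysql -e "SHOW GLOBAL STATUS LIKE 'Threads_connected';"
mysql -e "SHOW FULL PROCESSLIST;" | grep -c Sleep   # count of idle connections
```

A large, growing count of long-idle Sleep connections from the neutron-server hosts would point at a leak rather than a simple undersized pool.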
https://stackoverflow.com/questions/3360951/sql-alchemy-connection-time-out applies to SQLAlchemy, though the traceback here surfaces via AMQP: the TimeoutError is raised on the server side and returned to the L3 agent over RPC (hence the RemoteError wrapper).