Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | New | Undecided | Unassigned |
openvswitch (Ubuntu) | Invalid | Undecided | Unassigned |
Bug Description
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Linux 4.4.0-96-generic on AMD64
Neutron 2:10.0.
OpenVSwitch 2.6.1-0ubuntu5.
In an environment with three bare-metal Neutron deployments hosting upwards of 300 routers and approximately the same number of instances (typically one router per instance), instances accessed via floating IPs experience packet loss, up to and including complete loss of connectivity. The problem is exacerbated by enabling L3HA, likely because of the larger number of router namespaces to be scheduled and managed, and the additional work of bringing up keepalived and monitoring the keepalived VIP.
Reducing the number of routers and rescheduling routers onto new hosts corrects the packet loss or connectivity loss on the impacted routers. Rescheduling forces each router to undergo a full recreation of its namespace and iptables rules, and a replugging of its interfaces into OVS.
On the Neutron hosts in this environment, we used SystemTap to trace calls to kfree_skb. This reveals that the majority of dropped packets occur in the openvswitch module, notably on the br-int bridge. Inspecting the state of OVS shows many qtap interfaces that are no longer present on the Neutron host but are still plugged into OVS.
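The stale-interface check described above can be sketched as a set difference. Compare the port names OVS still holds on br-int with the device names that actually exist. This is a minimal, hypothetical helper rather than anything shipped with Neutron or Open vSwitch, and it rests on stated assumptions. The OVS side is assumed to come from `ovs-vsctl list-ports br-int`, which prints one name per line. The "present" side is assumed to be collected from `ip -o link show` run on the host and inside each router namespace (for example via `ip netns exec <namespace> ip -o link show`). The function names `parse_list_ports`, `parse_ip_link` and `find_stale_ports` are illustrative only.

```python
"""Hypothetical helper: list OVS ports whose network devices no longer exist.

Assumed inputs (collected by the caller, not by this module):
  * stdout of `ovs-vsctl list-ports br-int` (one port name per line)
  * stdout of `ip -o link show` from the host and each router namespace
"""

from typing import Iterable, List, Set


def parse_list_ports(output: str) -> List[str]:
    """Split `ovs-vsctl list-ports` output into port names, skipping blank lines."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def parse_ip_link(output: str) -> Set[str]:
    """Extract device names from `ip -o link show` output.

    Each line looks like `2: eth0: <BROADCAST,...> mtu 1500 ...`; veth devices
    may appear as `eth0@if7`, so any `@...` suffix is dropped.
    """
    names = set()
    for line in output.splitlines():
        parts = line.split(": ", 2)
        if len(parts) >= 2:
            names.add(parts[1].split("@", 1)[0])
    return names


def find_stale_ports(ovs_ports: Iterable[str], present_devices: Set[str]) -> List[str]:
    """Return, sorted, the ports still plugged into OVS with no matching device."""
    return sorted(port for port in ovs_ports if port not in present_devices)


if __name__ == "__main__":
    # Made-up sample data for illustration, not output from the affected hosts.
    ports = "qr-1a2b3c4d-5e\nqg-9f8e7d6c-5b\n\nha-0a1b2c3d-4e\n"
    links = (
        "1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN\n"
        "14: qr-1a2b3c4d-5e: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450\n"
        "15: ha-0a1b2c3d-4e: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450\n"
    )
    print(find_stale_ports(parse_list_ports(ports), parse_ip_link(links)))
    # prints ['qg-9f8e7d6c-5b']
```

A set difference keeps the check independent of how the device list is gathered. Devices from several namespaces can simply be unioned into one set before the comparison.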
Diagnostic output is provided in the following comments.
Changed in openvswitch (Ubuntu):
status: New → Incomplete
I've subscribed Canonical Field Critical.