Network loop between physical networks with DVR
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu Cloud Archive | Invalid | Undecided | Unassigned |
Queens | Fix Released | Critical | Unassigned |
Rocky | Fix Released | Critical | Unassigned |
neutron | Fix Released | High | Rodolfo Alonso |
neutron (Ubuntu) | Fix Released | Undecided | Unassigned |
Bionic | Fix Released | Critical | Trent Lloyd |
Bug Description
(For SRU template, please see bug 1869808, as the SRU info there applies to this bug also)
Our CI experienced a network loop due to https:/
Steps
=====
# add more physical bridges
ovs-vsctl add-br br-physnet1
ip link set dev br-physnet1 up
ovs-vsctl add-br br-physnet2
ip link set dev br-physnet2 up
# set a broadcast going from one bridge
ip address add 1.1.1.1/31 dev br-physnet1
arping -b -I br-physnet1 1.1.1.1
# listen on the other
tcpdump -eni br-physnet2
# Update /etc/neutron/
[ml2_type_vlan]
network_vlan_ranges = public,
[ovs]
datapath_type = system
bridge_mappings = public:
tunnel_bridge = br-tun
local_ip = 127.0.0.1
[agent]
tunnel_types = vxlan
root_helper_daemon = sudo /usr/local/
root_helper = sudo /usr/local/
enable_
l2_population = True
# stop server and agent
systemctl stop devstack@q-svc
systemctl stop devstack@q-agt
# clear all flows
for BR in $(sudo ovs-vsctl list-br); do echo "$BR"; sudo ovs-ofctl del-flows "$BR"; done
# start agent
systemctl start devstack@q-agt
$ sudo tcpdump -eni br-physnet2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 bytes
09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
...
If more than one node is running the OVS agent in this state, there will be a network loop, and packets can multiply quickly and overwhelm the network. We saw ~1 million packets/sec.
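To put a number on a storm like this, one option is to sample the kernel's receive counter on the affected bridge. The following is a minimal sketch, assuming Linux sysfs and that the OVS bridge's internal port (for example br-physnet2) is visible as a netdev; the function name `pkt_rate` is hypothetical and not part of neutron or Open vSwitch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print received packets/sec on an interface by reading
# /sys/class/net/<dev>/statistics/rx_packets twice, <interval> seconds apart.
pkt_rate() {
  local dev="$1" interval="${2:-1}"   # interval must be a whole number of seconds
  local stats="/sys/class/net/${dev}/statistics/rx_packets"
  if [ ! -r "$stats" ]; then
    echo "no such interface: ${dev}" >&2
    return 1
  fi
  local before after
  before=$(cat "$stats")
  sleep "$interval"
  after=$(cat "$stats")
  # Integer average rate over the sampling window
  echo $(( (after - before) / interval ))
}
```

Usage: `pkt_rate br-physnet2 5` prints the average rate over five seconds. Reading the counter avoids running tcpdump, which may itself struggle to keep up at rates of this order.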
I think that because the neutron server is not available, the get_dvr_mac_address RPC blocks and the required drop flows are never installed:
https:/
https:/
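One way to check whether drop flows are present on a node is to count them in the `ovs-ofctl dump-flows` output for each bridge. The sketch below is a hedged illustration: the helper names `count_drop_flows` and `check_bridges` are hypothetical, and the script does not know which specific drops the agent should install for DVR, so its counts are only meaningful when compared against a healthy node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count flows whose action list is "drop" in
# `ovs-ofctl dump-flows` output read from stdin.
count_drop_flows() {
  # grep -c prints 0 but exits 1 when nothing matches; don't treat that as an error
  grep -c 'actions=drop' || true
}

# Report the drop-flow count for every OVS bridge on this host.
# Requires Open vSwitch to be installed and sufficient privileges (e.g. root).
check_bridges() {
  local br
  for br in $(ovs-vsctl list-br); do
    printf '%s: %s drop flow(s)\n' "$br" "$(ovs-ofctl dump-flows "$br" | count_drop_flows)"
  done
}
```

Running `check_bridges` on a node whose agent was started while the server was down, and again on a healthy node, is one way to see whether the physical bridges are missing the drops mentioned above.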
summary: | Network loop between physical network with DVR → Network loop between physical networks with DVR |
description: | updated |
description: | updated |
description: | updated |
description: | updated |
Changed in neutron: | |
importance: | Undecided → High |
tags: | added: neutron-proactive-backport-potential |
tags: | removed: neutron-proactive-backport-potential |
Changed in neutron (Ubuntu): | |
status: | New → Fix Released |
Changed in neutron (Ubuntu Bionic): | |
status: | New → In Progress |
importance: | Undecided → Critical |
assignee: | nobody → Trent Lloyd (lathiat) |
description: | updated |
Changed in cloud-archive: | |
status: | New → Invalid |
This comment is no longer true: https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L1563-L1565
https:/
Because setup_rpc now runs after setup_physical_bridges: https://opendev.org/openstack/neutron/commit/d41bd58f31e259fe408c8c059b31299fdfe81127
https:/