2020-07-10 10:41:19 |
Darragh O'Reilly |
bug |
|
|
added bug |
2020-07-10 10:41:39 |
Darragh O'Reilly |
summary |
Network loop between physical network with DVR |
Network loop between physical networks with DVR |
|
2020-07-10 10:43:53 |
Darragh O'Reilly |
description |
Our CI experienced a network loop due to https://review.opendev.org/#/c/733568/ . DVR is enabled and there is more than one physical bridge mapping, and the neutron server was not available when the ovs agents were started.
Steps
=====
# add more physical bridges
ovs-vsctl add-br br-physnet1
ip link set dev br-physnet1 up
ovs-vsctl add-br br-physnet2
ip link set dev br-physnet2 up
# set a broadcast going from one bridge
ip address add 1.1.1.1/31 dev br-physnet1
arping -b -I br-physnet1 1.1.1.1
# listen on the other
tcpdump -eni br-physnet2
# Update /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2_type_vlan]
network_vlan_ranges = public,physnet1,physnet2
[ovs]
datapath_type = system
bridge_mappings = public:br-ex,physnet1:br-physnet1,physnet2:br-physnet2
tunnel_bridge = br-tun
local_ip = 127.0.0.1
[agent]
tunnel_types = vxlan
root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
enable_distributed_routing = True
l2_population = True
# stop server and agent
systemctl stop devstack@q-svc
systemctl stop devstack@q-agt
# clear all flows
for BR in $(sudo ovs-vsctl list-br); do echo $BR; sudo ovs-ofctl del-flows $BR; done
systemctl start devstack@q-agt
$ sudo tcpdump -eni br-physnet2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 bytes
09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
...
If there is more than node running the ovs agent in this state, then there will be a network loop and packets can multiply quickly and overwhelm the network. We saw ~1 million packets/sec.
I think because the neutron server is not available, the get_dvr_mac_address rpc is blocked and the required drops are not installed:
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L138
https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L230-L234 |
Our CI experienced a network loop due to https://review.opendev.org/#/c/733568/ . DVR is enabled and there is more than one physical bridge mapping, and the neutron server was not available when the ovs agents were started.
Steps
=====
# add more physical bridges
ovs-vsctl add-br br-physnet1
ip link set dev br-physnet1 up
ovs-vsctl add-br br-physnet2
ip link set dev br-physnet2 up
# set a broadcast going from one bridge
ip address add 1.1.1.1/31 dev br-physnet1
arping -b -I br-physnet1 1.1.1.1
# listen on the other
tcpdump -eni br-physnet2
# Update /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2_type_vlan]
network_vlan_ranges = public,physnet1,physnet2
[ovs]
datapath_type = system
bridge_mappings = public:br-ex,physnet1:br-physnet1,physnet2:br-physnet2
tunnel_bridge = br-tun
local_ip = 127.0.0.1
[agent]
tunnel_types = vxlan
root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
enable_distributed_routing = True
l2_population = True
# stop server and agent
systemctl stop devstack@q-svc
systemctl stop devstack@q-agt
# clear all flows
for BR in $(sudo ovs-vsctl list-br); do echo $BR; sudo ovs-ofctl del-flows $BR; done
systemctl start devstack@q-agt
$ sudo tcpdump -eni br-physnet2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 bytes
09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
...
If there are more than node running the ovs agent in this state, then there will be a network loop and packets can multiply quickly and overwhelm the network. We saw ~1 million packets/sec.
I think because the neutron server is not available, the get_dvr_mac_address rpc is blocked and the required drops are not installed:
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L138
https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L230-L234 |
|
2020-07-10 10:47:47 |
Darragh O'Reilly |
description |
Our CI experienced a network loop due to https://review.opendev.org/#/c/733568/ . DVR is enabled and there is more than one physical bridge mapping, and the neutron server was not available when the ovs agents were started.
Steps
=====
# add more physical bridges
ovs-vsctl add-br br-physnet1
ip link set dev br-physnet1 up
ovs-vsctl add-br br-physnet2
ip link set dev br-physnet2 up
# set a broadcast going from one bridge
ip address add 1.1.1.1/31 dev br-physnet1
arping -b -I br-physnet1 1.1.1.1
# listen on the other
tcpdump -eni br-physnet2
# Update /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2_type_vlan]
network_vlan_ranges = public,physnet1,physnet2
[ovs]
datapath_type = system
bridge_mappings = public:br-ex,physnet1:br-physnet1,physnet2:br-physnet2
tunnel_bridge = br-tun
local_ip = 127.0.0.1
[agent]
tunnel_types = vxlan
root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
enable_distributed_routing = True
l2_population = True
# stop server and agent
systemctl stop devstack@q-svc
systemctl stop devstack@q-agt
# clear all flows
for BR in $(sudo ovs-vsctl list-br); do echo $BR; sudo ovs-ofctl del-flows $BR; done
systemctl start devstack@q-agt
$ sudo tcpdump -eni br-physnet2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 bytes
09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
...
If there are more than node running the ovs agent in this state, then there will be a network loop and packets can multiply quickly and overwhelm the network. We saw ~1 million packets/sec.
I think because the neutron server is not available, the get_dvr_mac_address rpc is blocked and the required drops are not installed:
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L138
https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L230-L234 |
Our CI experienced a network loop due to https://review.opendev.org/#/c/733568/ . DVR is enabled and there is more than one physical bridge mapping, and the neutron server was not available when the ovs agents were started.
Steps
=====
# add more physical bridges
ovs-vsctl add-br br-physnet1
ip link set dev br-physnet1 up
ovs-vsctl add-br br-physnet2
ip link set dev br-physnet2 up
# set a broadcast going from one bridge
ip address add 1.1.1.1/31 dev br-physnet1
arping -b -I br-physnet1 1.1.1.1
# listen on the other
tcpdump -eni br-physnet2
# Update /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2_type_vlan]
network_vlan_ranges = public,physnet1,physnet2
[ovs]
datapath_type = system
bridge_mappings = public:br-ex,physnet1:br-physnet1,physnet2:br-physnet2
tunnel_bridge = br-tun
local_ip = 127.0.0.1
[agent]
tunnel_types = vxlan
root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
enable_distributed_routing = True
l2_population = True
# stop server and agent
systemctl stop devstack@q-svc
systemctl stop devstack@q-agt
# clear all flows
for BR in $(sudo ovs-vsctl list-br); do echo $BR; sudo ovs-ofctl del-flows $BR; done
# start agent
systemctl start devstack@q-agt
$ sudo tcpdump -eni br-physnet2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 bytes
09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
...
If there is more than node running the ovs agent in this state, then there will be a network loop and packets can multiply quickly and overwhelm the network. We saw ~1 million packets/sec.
I think because the neutron server is not available, the get_dvr_mac_address rpc is blocked and the required drops are not installed:
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L138
https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L230-L234 |
|
2020-07-10 10:48:52 |
Darragh O'Reilly |
description |
Our CI experienced a network loop due to https://review.opendev.org/#/c/733568/ . DVR is enabled and there is more than one physical bridge mapping, and the neutron server was not available when the ovs agents were started.
Steps
=====
# add more physical bridges
ovs-vsctl add-br br-physnet1
ip link set dev br-physnet1 up
ovs-vsctl add-br br-physnet2
ip link set dev br-physnet2 up
# set a broadcast going from one bridge
ip address add 1.1.1.1/31 dev br-physnet1
arping -b -I br-physnet1 1.1.1.1
# listen on the other
tcpdump -eni br-physnet2
# Update /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2_type_vlan]
network_vlan_ranges = public,physnet1,physnet2
[ovs]
datapath_type = system
bridge_mappings = public:br-ex,physnet1:br-physnet1,physnet2:br-physnet2
tunnel_bridge = br-tun
local_ip = 127.0.0.1
[agent]
tunnel_types = vxlan
root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
enable_distributed_routing = True
l2_population = True
# stop server and agent
systemctl stop devstack@q-svc
systemctl stop devstack@q-agt
# clear all flows
for BR in $(sudo ovs-vsctl list-br); do echo $BR; sudo ovs-ofctl del-flows $BR; done
# start agent
systemctl start devstack@q-agt
$ sudo tcpdump -eni br-physnet2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 bytes
09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
...
If there is more than node running the ovs agent in this state, then there will be a network loop and packets can multiply quickly and overwhelm the network. We saw ~1 million packets/sec.
I think because the neutron server is not available, the get_dvr_mac_address rpc is blocked and the required drops are not installed:
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L138
https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L230-L234 |
Our CI experienced a network loop due to https://review.opendev.org/#/c/733568/ . DVR is enabled and there is more than one physical bridge mapping, and the neutron server was not available when the ovs agents were started.
Steps
=====
# add more physical bridges
ovs-vsctl add-br br-physnet1
ip link set dev br-physnet1 up
ovs-vsctl add-br br-physnet2
ip link set dev br-physnet2 up
# set a broadcast going from one bridge
ip address add 1.1.1.1/31 dev br-physnet1
arping -b -I br-physnet1 1.1.1.1
# listen on the other
tcpdump -eni br-physnet2
# Update /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2_type_vlan]
network_vlan_ranges = public,physnet1,physnet2
[ovs]
datapath_type = system
bridge_mappings = public:br-ex,physnet1:br-physnet1,physnet2:br-physnet2
tunnel_bridge = br-tun
local_ip = 127.0.0.1
[agent]
tunnel_types = vxlan
root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
enable_distributed_routing = True
l2_population = True
# stop server and agent
systemctl stop devstack@q-svc
systemctl stop devstack@q-agt
# clear all flows
for BR in $(sudo ovs-vsctl list-br); do echo $BR; sudo ovs-ofctl del-flows $BR; done
# start agent
systemctl start devstack@q-agt
$ sudo tcpdump -eni br-physnet2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 bytes
09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
...
If there are more than node running the ovs agent in this state, then there will be a network loop and packets can multiply quickly and overwhelm the network. We saw ~1 million packets/sec.
I think because the neutron server is not available, the get_dvr_mac_address rpc is blocked and the required drops are not installed:
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L138
https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L230-L234 |
|
2020-07-10 10:50:00 |
Darragh O'Reilly |
description |
Our CI experienced a network loop due to https://review.opendev.org/#/c/733568/ . DVR is enabled and there is more than one physical bridge mapping, and the neutron server was not available when the ovs agents were started.
Steps
=====
# add more physical bridges
ovs-vsctl add-br br-physnet1
ip link set dev br-physnet1 up
ovs-vsctl add-br br-physnet2
ip link set dev br-physnet2 up
# set a broadcast going from one bridge
ip address add 1.1.1.1/31 dev br-physnet1
arping -b -I br-physnet1 1.1.1.1
# listen on the other
tcpdump -eni br-physnet2
# Update /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2_type_vlan]
network_vlan_ranges = public,physnet1,physnet2
[ovs]
datapath_type = system
bridge_mappings = public:br-ex,physnet1:br-physnet1,physnet2:br-physnet2
tunnel_bridge = br-tun
local_ip = 127.0.0.1
[agent]
tunnel_types = vxlan
root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
enable_distributed_routing = True
l2_population = True
# stop server and agent
systemctl stop devstack@q-svc
systemctl stop devstack@q-agt
# clear all flows
for BR in $(sudo ovs-vsctl list-br); do echo $BR; sudo ovs-ofctl del-flows $BR; done
# start agent
systemctl start devstack@q-agt
$ sudo tcpdump -eni br-physnet2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 bytes
09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
...
If there are more than node running the ovs agent in this state, then there will be a network loop and packets can multiply quickly and overwhelm the network. We saw ~1 million packets/sec.
I think because the neutron server is not available, the get_dvr_mac_address rpc is blocked and the required drops are not installed:
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L138
https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L230-L234 |
Our CI experienced a network loop due to https://review.opendev.org/#/c/733568/ . DVR is enabled and there is more than one physical bridge mapping, and the neutron server was not available when the ovs agents were started.
Steps
=====
# add more physical bridges
ovs-vsctl add-br br-physnet1
ip link set dev br-physnet1 up
ovs-vsctl add-br br-physnet2
ip link set dev br-physnet2 up
# set a broadcast going from one bridge
ip address add 1.1.1.1/31 dev br-physnet1
arping -b -I br-physnet1 1.1.1.1
# listen on the other
tcpdump -eni br-physnet2
# Update /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2_type_vlan]
network_vlan_ranges = public,physnet1,physnet2
[ovs]
datapath_type = system
bridge_mappings = public:br-ex,physnet1:br-physnet1,physnet2:br-physnet2
tunnel_bridge = br-tun
local_ip = 127.0.0.1
[agent]
tunnel_types = vxlan
root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
enable_distributed_routing = True
l2_population = True
# stop server and agent
systemctl stop devstack@q-svc
systemctl stop devstack@q-agt
# clear all flows
for BR in $(sudo ovs-vsctl list-br); do echo $BR; sudo ovs-ofctl del-flows $BR; done
# start agent
systemctl start devstack@q-agt
$ sudo tcpdump -eni br-physnet2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 bytes
09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
...
If there is more than one node running the ovs agent in this state, then there will be a network loop and packets can multiply quickly and overwhelm the network. We saw ~1 million packets/sec.
I think because the neutron server is not available, the get_dvr_mac_address rpc is blocked and the required drops are not installed:
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L138
https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L230-L234 |
|
2020-07-10 11:07:03 |
Flávio Ramalho |
bug |
|
|
added subscriber Flávio Ramalho |
2020-07-10 16:33:45 |
OpenStack Infra |
neutron: status |
New |
In Progress |
|
2020-07-10 16:33:45 |
OpenStack Infra |
neutron: assignee |
|
Darragh O'Reilly (darragh-oreilly) |
|
2020-07-13 14:35:57 |
Bogdan Dobrelya |
bug |
|
|
added subscriber Slawek Kaplonski |
2020-07-13 16:47:03 |
Brian Haley |
neutron: importance |
Undecided |
High |
|
2020-07-16 13:02:41 |
OpenStack Infra |
neutron: assignee |
Darragh O'Reilly (darragh-oreilly) |
Rodolfo Alonso (rodolfo-alonso-hernandez) |
|
2020-07-22 04:12:29 |
OpenStack Infra |
neutron: status |
In Progress |
Fix Released |
|
2020-07-23 00:43:08 |
OpenStack Infra |
tags |
|
in-stable-rocky |
|
2020-07-23 00:43:18 |
OpenStack Infra |
tags |
in-stable-rocky |
in-stable-pike in-stable-rocky |
|
2020-07-23 00:56:54 |
OpenStack Infra |
tags |
in-stable-pike in-stable-rocky |
in-stable-pike in-stable-queens in-stable-rocky |
|
2020-07-23 00:57:06 |
OpenStack Infra |
tags |
in-stable-pike in-stable-queens in-stable-rocky |
in-stable-pike in-stable-queens in-stable-rocky in-stable-stein |
|
2020-07-23 01:15:41 |
OpenStack Infra |
tags |
in-stable-pike in-stable-queens in-stable-rocky in-stable-stein |
in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-ussuri |
|
2020-07-24 18:02:35 |
OpenStack Infra |
tags |
in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-ussuri |
in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri |
|
2020-07-31 14:07:13 |
Bernard Cafarelli |
tags |
in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri |
in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri neutron-proactive-backport-potential |
|
2020-10-09 09:42:13 |
Slawek Kaplonski |
tags |
in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri neutron-proactive-backport-potential |
in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri |
|
2021-02-18 17:21:53 |
Dan Streetman |
bug task added |
|
neutron (Ubuntu) |
|
2021-02-18 17:22:02 |
Dan Streetman |
nominated for series |
|
Ubuntu Bionic |
|
2021-02-18 17:22:02 |
Dan Streetman |
bug task added |
|
neutron (Ubuntu Bionic) |
|
2021-02-18 17:22:10 |
Dan Streetman |
neutron (Ubuntu): status |
New |
Fix Released |
|
2021-02-18 17:22:15 |
Dan Streetman |
neutron (Ubuntu Bionic): status |
New |
In Progress |
|
2021-02-18 17:22:18 |
Dan Streetman |
neutron (Ubuntu Bionic): importance |
Undecided |
Critical |
|
2021-02-18 17:22:22 |
Dan Streetman |
neutron (Ubuntu Bionic): assignee |
|
Trent Lloyd (lathiat) |
|
2021-02-18 17:22:39 |
Dan Streetman |
description |
Our CI experienced a network loop due to https://review.opendev.org/#/c/733568/ . DVR is enabled and there is more than one physical bridge mapping, and the neutron server was not available when the ovs agents were started.
Steps
=====
# add more physical bridges
ovs-vsctl add-br br-physnet1
ip link set dev br-physnet1 up
ovs-vsctl add-br br-physnet2
ip link set dev br-physnet2 up
# set a broadcast going from one bridge
ip address add 1.1.1.1/31 dev br-physnet1
arping -b -I br-physnet1 1.1.1.1
# listen on the other
tcpdump -eni br-physnet2
# Update /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2_type_vlan]
network_vlan_ranges = public,physnet1,physnet2
[ovs]
datapath_type = system
bridge_mappings = public:br-ex,physnet1:br-physnet1,physnet2:br-physnet2
tunnel_bridge = br-tun
local_ip = 127.0.0.1
[agent]
tunnel_types = vxlan
root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
enable_distributed_routing = True
l2_population = True
# stop server and agent
systemctl stop devstack@q-svc
systemctl stop devstack@q-agt
# clear all flows
for BR in $(sudo ovs-vsctl list-br); do echo $BR; sudo ovs-ofctl del-flows $BR; done
# start agent
systemctl start devstack@q-agt
$ sudo tcpdump -eni br-physnet2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 bytes
09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
...
If there is more than one node running the ovs agent in this state, then there will be a network loop and packets can multiply quickly and overwhelm the network. We saw ~1 million packets/sec.
I think because the neutron server is not available, the get_dvr_mac_address rpc is blocked and the required drops are not installed:
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L138
https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L230-L234 |
(For SRU template, please see bug 1869808, as the SRU info there applies to this bug also)
Our CI experienced a network loop due to https://review.opendev.org/#/c/733568/ . DVR is enabled and there is more than one physical bridge mapping, and the neutron server was not available when the ovs agents were started.
Steps
=====
# add more physical bridges
ovs-vsctl add-br br-physnet1
ip link set dev br-physnet1 up
ovs-vsctl add-br br-physnet2
ip link set dev br-physnet2 up
# set a broadcast going from one bridge
ip address add 1.1.1.1/31 dev br-physnet1
arping -b -I br-physnet1 1.1.1.1
# listen on the other
tcpdump -eni br-physnet2
# Update /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2_type_vlan]
network_vlan_ranges = public,physnet1,physnet2
[ovs]
datapath_type = system
bridge_mappings = public:br-ex,physnet1:br-physnet1,physnet2:br-physnet2
tunnel_bridge = br-tun
local_ip = 127.0.0.1
[agent]
tunnel_types = vxlan
root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
enable_distributed_routing = True
l2_population = True
# stop server and agent
systemctl stop devstack@q-svc
systemctl stop devstack@q-agt
# clear all flows
for BR in $(sudo ovs-vsctl list-br); do echo $BR; sudo ovs-ofctl del-flows $BR; done
# start agent
systemctl start devstack@q-agt
$ sudo tcpdump -eni br-physnet2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 bytes
09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
...
If there is more than one node running the ovs agent in this state, then there will be a network loop and packets can multiply quickly and overwhelm the network. We saw ~1 million packets/sec.
I think because the neutron server is not available, the get_dvr_mac_address rpc is blocked and the required drops are not installed:
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L138
https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L230-L234 |
|
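The description above suspects that drop flows are missing after the agent starts without the neutron server. A minimal sketch of a rough check follows, assuming a host with Open vSwitch installed. The function names `count_drop_flows` and `report_bridges` are hypothetical and not part of neutron or OVS. The report does not say which drop flows the agent should install, so a count of zero is only a hint, not proof of the bug.

```shell
# Count flows whose action is "drop" in `ovs-ofctl dump-flows` output
# supplied on stdin. Prints 0 when there are none.
count_drop_flows() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps the
    # function usable under "set -e".
    grep -c 'actions=drop' || true
}

# Hypothetical helper: print the drop-flow count for each bridge named as
# an argument. Needs Open vSwitch and sudo rights on the host.
report_bridges() {
    for br in "$@"; do
        printf '%s: %s drop flow(s)\n' "$br" \
            "$(sudo ovs-ofctl dump-flows "$br" | count_drop_flows)"
    done
}
```

On a reproduction host this could be run as `report_bridges br-int br-physnet1 br-physnet2` after `systemctl start devstack@q-agt`, alongside the `tcpdump -eni br-physnet2` capture from the steps.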
2021-02-18 18:36:21 |
Corey Bryant |
bug task added |
|
cloud-archive |
|
2021-02-18 18:36:33 |
Corey Bryant |
nominated for series |
|
cloud-archive/queens |
|
2021-02-18 18:36:33 |
Corey Bryant |
bug task added |
|
cloud-archive/queens |
|
2021-02-18 18:36:33 |
Corey Bryant |
nominated for series |
|
cloud-archive/rocky |
|
2021-02-18 18:36:33 |
Corey Bryant |
bug task added |
|
cloud-archive/rocky |
|
2021-02-18 18:37:09 |
Corey Bryant |
cloud-archive/queens: status |
New |
Triaged |
|
2021-02-18 18:37:15 |
Corey Bryant |
cloud-archive/queens: importance |
Undecided |
Critical |
|
2021-02-18 18:37:21 |
Corey Bryant |
cloud-archive/rocky: importance |
Undecided |
Critical |
|
2021-02-18 18:37:26 |
Corey Bryant |
cloud-archive/rocky: status |
New |
Triaged |
|
2021-02-18 18:37:35 |
Corey Bryant |
cloud-archive: status |
New |
Invalid |
|
2021-02-19 16:04:10 |
Corey Bryant |
cloud-archive/rocky: status |
Triaged |
Fix Committed |
|
2021-02-19 16:04:13 |
Corey Bryant |
tags |
in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri |
in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri verification-rocky-needed |
|
2021-03-18 15:36:09 |
Łukasz Zemczak |
neutron (Ubuntu Bionic): status |
In Progress |
Fix Committed |
|
2021-03-18 15:36:13 |
Łukasz Zemczak |
bug |
|
|
added subscriber Ubuntu Stable Release Updates Team |
2021-03-18 15:36:15 |
Łukasz Zemczak |
bug |
|
|
added subscriber SRU Verification |
2021-03-18 15:36:20 |
Łukasz Zemczak |
tags |
in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri verification-rocky-needed |
in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri verification-needed verification-needed-bionic verification-rocky-needed |
|
2021-03-22 21:01:54 |
Corey Bryant |
cloud-archive/queens: status |
Triaged |
Fix Committed |
|
2021-03-22 21:01:57 |
Corey Bryant |
tags |
in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri verification-needed verification-needed-bionic verification-rocky-needed |
in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri verification-needed verification-needed-bionic verification-queens-needed verification-rocky-needed |
|
2021-04-07 10:18:32 |
Edward Hope-Morley |
tags |
in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri verification-needed verification-needed-bionic verification-queens-needed verification-rocky-needed |
in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri verification-done verification-done-bionic verification-queens-done verification-rocky-done |
|
2021-04-08 09:00:50 |
Łukasz Zemczak |
removed subscriber Ubuntu Stable Release Updates Team |
|
|
|
2021-04-08 09:05:26 |
Launchpad Janitor |
neutron (Ubuntu Bionic): status |
Fix Committed |
Fix Released |
|
2021-04-12 13:53:08 |
Corey Bryant |
cloud-archive/queens: status |
Fix Committed |
Fix Released |
|
2021-04-12 18:13:17 |
Corey Bryant |
cloud-archive/rocky: status |
Fix Committed |
Fix Released |
|
2021-10-12 12:01:14 |
Launchpad Janitor |
merge proposal linked |
|
https://code.launchpad.net/~hopem/ubuntu/+source/neutron/+git/neutron/+merge/410051 |
|