Activity log for bug #1887148

Date Who What changed Old value New value Message
2020-07-10 10:41:19 Darragh O'Reilly bug added bug
2020-07-10 10:41:39 Darragh O'Reilly summary Network loop between physical network with DVR Network loop between physical networks with DVR
2020-07-10 10:43:53 Darragh O'Reilly description Our CI experienced a network loop due to https://review.opendev.org/#/c/733568/ . DVR is enabled and there is more than one physical bridge mapping, and the neutron server was not available when the ovs agents were started. Steps ===== # add more physical bridges ovs-vsctl add-br br-physnet1 ip link set dev br-physnet1 up ovs-vsctl add-br br-physnet2 ip link set dev br-physnet2 up # set a broadcast going from one bridge ip address add 1.1.1.1/31 dev br-physnet1 arping -b -I br-physnet1 1.1.1.1 # listen on the other tcpdump -eni br-physnet2 # Update /etc/neutron/plugins/ml2/ml2_conf.ini [ml2_type_vlan] network_vlan_ranges = public,physnet1,physnet2 [ovs] datapath_type = system bridge_mappings = public:br-ex,physnet1:br-physnet1,physnet2:br-physnet2 tunnel_bridge = br-tun local_ip = 127.0.0.1 [agent] tunnel_types = vxlan root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf enable_distributed_routing = True l2_population = True # stop server and agent systemctl stop devstack@q-svc systemctl stop devstack@q-agt # clear all flows for BR in $(sudo ovs-vsctl list-br); do echo $BR; sudo ovs-ofctl del-flows $BR; done systemctl start devstack@q-agt $ sudo tcpdump -eni br-physnet2 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 bytes 09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28 09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28 ... If there is more than node running the ovs agent in this state, then there will be a network loop and packets can multiple quickly and overwhelm the network. 
We saw ~1 million packets/sec. I think because the neutron server is not available, the get_dvr_mac_address rpc is blocked and the required drops are not installed: https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L138 https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L230-L234 Our CI experienced a network loop due to https://review.opendev.org/#/c/733568/ . DVR is enabled and there is more than one physical bridge mapping, and the neutron server was not available when the ovs agents were started. Steps ===== # add more physical bridges ovs-vsctl add-br br-physnet1 ip link set dev br-physnet1 up ovs-vsctl add-br br-physnet2 ip link set dev br-physnet2 up # set a broadcast going from one bridge ip address add 1.1.1.1/31 dev br-physnet1 arping -b -I br-physnet1 1.1.1.1 # listen on the other tcpdump -eni br-physnet2 # Update /etc/neutron/plugins/ml2/ml2_conf.ini [ml2_type_vlan] network_vlan_ranges = public,physnet1,physnet2 [ovs] datapath_type = system bridge_mappings = public:br-ex,physnet1:br-physnet1,physnet2:br-physnet2 tunnel_bridge = br-tun local_ip = 127.0.0.1 [agent] tunnel_types = vxlan root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf enable_distributed_routing = True l2_population = True # stop server and agent systemctl stop devstack@q-svc systemctl stop devstack@q-agt # clear all flows for BR in $(sudo ovs-vsctl list-br); do echo $BR; sudo ovs-ofctl del-flows $BR; done systemctl start devstack@q-agt $ sudo tcpdump -eni br-physnet2 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 bytes 09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), 
length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28 09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28 ... If there are more than node running the ovs agent in this state, then there will be a network loop and packets can multiple quickly and overwhelm the network. We saw ~1 million packets/sec. I think because the neutron server is not available, the get_dvr_mac_address rpc is blocked and the required drops are not installed: https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L138 https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L230-L234
2020-07-10 10:47:47 Darragh O'Reilly description Our CI experienced a network loop due to https://review.opendev.org/#/c/733568/ . DVR is enabled and there is more than one physical bridge mapping, and the neutron server was not available when the ovs agents were started. Steps ===== # add more physical bridges ovs-vsctl add-br br-physnet1 ip link set dev br-physnet1 up ovs-vsctl add-br br-physnet2 ip link set dev br-physnet2 up # set a broadcast going from one bridge ip address add 1.1.1.1/31 dev br-physnet1 arping -b -I br-physnet1 1.1.1.1 # listen on the other tcpdump -eni br-physnet2 # Update /etc/neutron/plugins/ml2/ml2_conf.ini [ml2_type_vlan] network_vlan_ranges = public,physnet1,physnet2 [ovs] datapath_type = system bridge_mappings = public:br-ex,physnet1:br-physnet1,physnet2:br-physnet2 tunnel_bridge = br-tun local_ip = 127.0.0.1 [agent] tunnel_types = vxlan root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf enable_distributed_routing = True l2_population = True # stop server and agent systemctl stop devstack@q-svc systemctl stop devstack@q-agt # clear all flows for BR in $(sudo ovs-vsctl list-br); do echo $BR; sudo ovs-ofctl del-flows $BR; done systemctl start devstack@q-agt $ sudo tcpdump -eni br-physnet2 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 bytes 09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28 09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28 ... If there are more than node running the ovs agent in this state, then there will be a network loop and packets can multiple quickly and overwhelm the network. 
We saw ~1 million packets/sec. I think because the neutron server is not available, the get_dvr_mac_address rpc is blocked and the required drops are not installed: https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L138 https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L230-L234 Our CI experienced a network loop due to https://review.opendev.org/#/c/733568/ . DVR is enabled and there is more than one physical bridge mapping, and the neutron server was not available when the ovs agents were started. Steps ===== # add more physical bridges ovs-vsctl add-br br-physnet1 ip link set dev br-physnet1 up ovs-vsctl add-br br-physnet2 ip link set dev br-physnet2 up # set a broadcast going from one bridge ip address add 1.1.1.1/31 dev br-physnet1 arping -b -I br-physnet1 1.1.1.1 # listen on the other tcpdump -eni br-physnet2 # Update /etc/neutron/plugins/ml2/ml2_conf.ini [ml2_type_vlan] network_vlan_ranges = public,physnet1,physnet2 [ovs] datapath_type = system bridge_mappings = public:br-ex,physnet1:br-physnet1,physnet2:br-physnet2 tunnel_bridge = br-tun local_ip = 127.0.0.1 [agent] tunnel_types = vxlan root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf enable_distributed_routing = True l2_population = True # stop server and agent systemctl stop devstack@q-svc systemctl stop devstack@q-agt # clear all flows for BR in $(sudo ovs-vsctl list-br); do echo $BR; sudo ovs-ofctl del-flows $BR; done # start agent systemctl start devstack@q-agt $ sudo tcpdump -eni br-physnet2 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 bytes 09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP 
(0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28 09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28 ... If there is more than node running the ovs agent in this state, then there will be a network loop and packets can multiple quickly and overwhelm the network. We saw ~1 million packets/sec. I think because the neutron server is not available, the get_dvr_mac_address rpc is blocked and the required drops are not installed: https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L138 https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L230-L234
2020-07-10 10:48:52 Darragh O'Reilly description Our CI experienced a network loop due to https://review.opendev.org/#/c/733568/ . DVR is enabled and there is more than one physical bridge mapping, and the neutron server was not available when the ovs agents were started. Steps ===== # add more physical bridges ovs-vsctl add-br br-physnet1 ip link set dev br-physnet1 up ovs-vsctl add-br br-physnet2 ip link set dev br-physnet2 up # set a broadcast going from one bridge ip address add 1.1.1.1/31 dev br-physnet1 arping -b -I br-physnet1 1.1.1.1 # listen on the other tcpdump -eni br-physnet2 # Update /etc/neutron/plugins/ml2/ml2_conf.ini [ml2_type_vlan] network_vlan_ranges = public,physnet1,physnet2 [ovs] datapath_type = system bridge_mappings = public:br-ex,physnet1:br-physnet1,physnet2:br-physnet2 tunnel_bridge = br-tun local_ip = 127.0.0.1 [agent] tunnel_types = vxlan root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf enable_distributed_routing = True l2_population = True # stop server and agent systemctl stop devstack@q-svc systemctl stop devstack@q-agt # clear all flows for BR in $(sudo ovs-vsctl list-br); do echo $BR; sudo ovs-ofctl del-flows $BR; done # start agent systemctl start devstack@q-agt $ sudo tcpdump -eni br-physnet2 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 bytes 09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28 09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28 ... 
If there is more than node running the ovs agent in this state, then there will be a network loop and packets can multiple quickly and overwhelm the network. We saw ~1 million packets/sec. I think because the neutron server is not available, the get_dvr_mac_address rpc is blocked and the required drops are not installed: https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L138 https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L230-L234 Our CI experienced a network loop due to https://review.opendev.org/#/c/733568/ . DVR is enabled and there is more than one physical bridge mapping, and the neutron server was not available when the ovs agents were started. Steps ===== # add more physical bridges ovs-vsctl add-br br-physnet1 ip link set dev br-physnet1 up ovs-vsctl add-br br-physnet2 ip link set dev br-physnet2 up # set a broadcast going from one bridge ip address add 1.1.1.1/31 dev br-physnet1 arping -b -I br-physnet1 1.1.1.1 # listen on the other tcpdump -eni br-physnet2 # Update /etc/neutron/plugins/ml2/ml2_conf.ini [ml2_type_vlan] network_vlan_ranges = public,physnet1,physnet2 [ovs] datapath_type = system bridge_mappings = public:br-ex,physnet1:br-physnet1,physnet2:br-physnet2 tunnel_bridge = br-tun local_ip = 127.0.0.1 [agent] tunnel_types = vxlan root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf enable_distributed_routing = True l2_population = True # stop server and agent systemctl stop devstack@q-svc systemctl stop devstack@q-agt # clear all flows for BR in $(sudo ovs-vsctl list-br); do echo $BR; sudo ovs-ofctl del-flows $BR; done # start agent systemctl start devstack@q-agt $ sudo tcpdump -eni br-physnet2 tcpdump: verbose output suppressed, use -v or -vv for full protocol 
decode listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 bytes 09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28 09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28 ... If there are more than node running the ovs agent in this state, then there will be a network loop and packets can multiple quickly and overwhelm the network. We saw ~1 million packets/sec. I think because the neutron server is not available, the get_dvr_mac_address rpc is blocked and the required drops are not installed: https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L138 https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L230-L234
2020-07-10 10:50:00 Darragh O'Reilly description Our CI experienced a network loop due to https://review.opendev.org/#/c/733568/ . DVR is enabled and there is more than one physical bridge mapping, and the neutron server was not available when the ovs agents were started. Steps ===== # add more physical bridges ovs-vsctl add-br br-physnet1 ip link set dev br-physnet1 up ovs-vsctl add-br br-physnet2 ip link set dev br-physnet2 up # set a broadcast going from one bridge ip address add 1.1.1.1/31 dev br-physnet1 arping -b -I br-physnet1 1.1.1.1 # listen on the other tcpdump -eni br-physnet2 # Update /etc/neutron/plugins/ml2/ml2_conf.ini [ml2_type_vlan] network_vlan_ranges = public,physnet1,physnet2 [ovs] datapath_type = system bridge_mappings = public:br-ex,physnet1:br-physnet1,physnet2:br-physnet2 tunnel_bridge = br-tun local_ip = 127.0.0.1 [agent] tunnel_types = vxlan root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf enable_distributed_routing = True l2_population = True # stop server and agent systemctl stop devstack@q-svc systemctl stop devstack@q-agt # clear all flows for BR in $(sudo ovs-vsctl list-br); do echo $BR; sudo ovs-ofctl del-flows $BR; done # start agent systemctl start devstack@q-agt $ sudo tcpdump -eni br-physnet2 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 bytes 09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28 09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28 ... 
If there are more than node running the ovs agent in this state, then there will be a network loop and packets can multiple quickly and overwhelm the network. We saw ~1 million packets/sec. I think because the neutron server is not available, the get_dvr_mac_address rpc is blocked and the required drops are not installed: https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L138 https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L230-L234 Our CI experienced a network loop due to https://review.opendev.org/#/c/733568/ . DVR is enabled and there is more than one physical bridge mapping, and the neutron server was not available when the ovs agents were started. Steps ===== # add more physical bridges ovs-vsctl add-br br-physnet1 ip link set dev br-physnet1 up ovs-vsctl add-br br-physnet2 ip link set dev br-physnet2 up # set a broadcast going from one bridge ip address add 1.1.1.1/31 dev br-physnet1 arping -b -I br-physnet1 1.1.1.1 # listen on the other tcpdump -eni br-physnet2 # Update /etc/neutron/plugins/ml2/ml2_conf.ini [ml2_type_vlan] network_vlan_ranges = public,physnet1,physnet2 [ovs] datapath_type = system bridge_mappings = public:br-ex,physnet1:br-physnet1,physnet2:br-physnet2 tunnel_bridge = br-tun local_ip = 127.0.0.1 [agent] tunnel_types = vxlan root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf enable_distributed_routing = True l2_population = True # stop server and agent systemctl stop devstack@q-svc systemctl stop devstack@q-agt # clear all flows for BR in $(sudo ovs-vsctl list-br); do echo $BR; sudo ovs-ofctl del-flows $BR; done # start agent systemctl start devstack@q-agt $ sudo tcpdump -eni br-physnet2 tcpdump: verbose output suppressed, use -v or -vv for full 
protocol decode listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 bytes 09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28 09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28 ... If there is more than one node running the ovs agent in this state, then there will be a network loop and packets can multiple quickly and overwhelm the network. We saw ~1 million packets/sec. I think because the neutron server is not available, the get_dvr_mac_address rpc is blocked and the required drops are not installed: https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L138 https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L230-L234
2020-07-10 11:07:03 Flávio Ramalho bug added subscriber Flávio Ramalho
2020-07-10 16:33:45 OpenStack Infra neutron: status New In Progress
2020-07-10 16:33:45 OpenStack Infra neutron: assignee Darragh O'Reilly (darragh-oreilly)
2020-07-13 14:35:57 Bogdan Dobrelya bug added subscriber Slawek Kaplonski
2020-07-13 16:47:03 Brian Haley neutron: importance Undecided High
2020-07-16 13:02:41 OpenStack Infra neutron: assignee Darragh O'Reilly (darragh-oreilly) Rodolfo Alonso (rodolfo-alonso-hernandez)
2020-07-22 04:12:29 OpenStack Infra neutron: status In Progress Fix Released
2020-07-23 00:43:08 OpenStack Infra tags in-stable-rocky
2020-07-23 00:43:18 OpenStack Infra tags in-stable-rocky in-stable-pike in-stable-rocky
2020-07-23 00:56:54 OpenStack Infra tags in-stable-pike in-stable-rocky in-stable-pike in-stable-queens in-stable-rocky
2020-07-23 00:57:06 OpenStack Infra tags in-stable-pike in-stable-queens in-stable-rocky in-stable-pike in-stable-queens in-stable-rocky in-stable-stein
2020-07-23 01:15:41 OpenStack Infra tags in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-ussuri
2020-07-24 18:02:35 OpenStack Infra tags in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-ussuri in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri
2020-07-31 14:07:13 Bernard Cafarelli tags in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri neutron-proactive-backport-potential
2020-10-09 09:42:13 Slawek Kaplonski tags in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri neutron-proactive-backport-potential in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri
2021-02-18 17:21:53 Dan Streetman bug task added neutron (Ubuntu)
2021-02-18 17:22:02 Dan Streetman nominated for series Ubuntu Bionic
2021-02-18 17:22:02 Dan Streetman bug task added neutron (Ubuntu Bionic)
2021-02-18 17:22:10 Dan Streetman neutron (Ubuntu): status New Fix Released
2021-02-18 17:22:15 Dan Streetman neutron (Ubuntu Bionic): status New In Progress
2021-02-18 17:22:18 Dan Streetman neutron (Ubuntu Bionic): importance Undecided Critical
2021-02-18 17:22:22 Dan Streetman neutron (Ubuntu Bionic): assignee Trent Lloyd (lathiat)
2021-02-18 17:22:39 Dan Streetman description Our CI experienced a network loop due to https://review.opendev.org/#/c/733568/ . DVR is enabled and there is more than one physical bridge mapping, and the neutron server was not available when the ovs agents were started. Steps ===== # add more physical bridges ovs-vsctl add-br br-physnet1 ip link set dev br-physnet1 up ovs-vsctl add-br br-physnet2 ip link set dev br-physnet2 up # set a broadcast going from one bridge ip address add 1.1.1.1/31 dev br-physnet1 arping -b -I br-physnet1 1.1.1.1 # listen on the other tcpdump -eni br-physnet2 # Update /etc/neutron/plugins/ml2/ml2_conf.ini [ml2_type_vlan] network_vlan_ranges = public,physnet1,physnet2 [ovs] datapath_type = system bridge_mappings = public:br-ex,physnet1:br-physnet1,physnet2:br-physnet2 tunnel_bridge = br-tun local_ip = 127.0.0.1 [agent] tunnel_types = vxlan root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf enable_distributed_routing = True l2_population = True # stop server and agent systemctl stop devstack@q-svc systemctl stop devstack@q-agt # clear all flows for BR in $(sudo ovs-vsctl list-br); do echo $BR; sudo ovs-ofctl del-flows $BR; done # start agent systemctl start devstack@q-agt $ sudo tcpdump -eni br-physnet2 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 bytes 09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28 09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28 ... 
If there is more than one node running the ovs agent in this state, then there will be a network loop and packets can multiple quickly and overwhelm the network. We saw ~1 million packets/sec. I think because the neutron server is not available, the get_dvr_mac_address rpc is blocked and the required drops are not installed: https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L138 https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L230-L234 (For SRU template, please see bug 1869808, as the SRU info there applies to this bug also) Our CI experienced a network loop due to https://review.opendev.org/#/c/733568/ . DVR is enabled and there is more than one physical bridge mapping, and the neutron server was not available when the ovs agents were started. Steps ===== # add more physical bridges ovs-vsctl add-br br-physnet1 ip link set dev br-physnet1 up ovs-vsctl add-br br-physnet2 ip link set dev br-physnet2 up # set a broadcast going from one bridge ip address add 1.1.1.1/31 dev br-physnet1 arping -b -I br-physnet1 1.1.1.1 # listen on the other tcpdump -eni br-physnet2 # Update /etc/neutron/plugins/ml2/ml2_conf.ini [ml2_type_vlan] network_vlan_ranges = public,physnet1,physnet2 [ovs] datapath_type = system bridge_mappings = public:br-ex,physnet1:br-physnet1,physnet2:br-physnet2 tunnel_bridge = br-tun local_ip = 127.0.0.1 [agent] tunnel_types = vxlan root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf enable_distributed_routing = True l2_population = True # stop server and agent systemctl stop devstack@q-svc systemctl stop devstack@q-agt # clear all flows for BR in $(sudo ovs-vsctl list-br); do echo $BR; sudo ovs-ofctl del-flows $BR; done # start agent systemctl start devstack@q-agt $ 
sudo tcpdump -eni br-physnet2 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 bytes 09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28 09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28 ... If there is more than one node running the ovs agent in this state, then there will be a network loop and packets can multiple quickly and overwhelm the network. We saw ~1 million packets/sec. I think because the neutron server is not available, the get_dvr_mac_address rpc is blocked and the required drops are not installed: https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L138 https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L230-L234
2021-02-18 18:36:21 Corey Bryant bug task added cloud-archive
2021-02-18 18:36:33 Corey Bryant nominated for series cloud-archive/queens
2021-02-18 18:36:33 Corey Bryant bug task added cloud-archive/queens
2021-02-18 18:36:33 Corey Bryant nominated for series cloud-archive/rocky
2021-02-18 18:36:33 Corey Bryant bug task added cloud-archive/rocky
2021-02-18 18:37:09 Corey Bryant cloud-archive/queens: status New Triaged
2021-02-18 18:37:15 Corey Bryant cloud-archive/queens: importance Undecided Critical
2021-02-18 18:37:21 Corey Bryant cloud-archive/rocky: importance Undecided Critical
2021-02-18 18:37:26 Corey Bryant cloud-archive/rocky: status New Triaged
2021-02-18 18:37:35 Corey Bryant cloud-archive: status New Invalid
2021-02-19 16:04:10 Corey Bryant cloud-archive/rocky: status Triaged Fix Committed
2021-02-19 16:04:13 Corey Bryant tags in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri verification-rocky-needed
2021-03-18 15:36:09 Łukasz Zemczak neutron (Ubuntu Bionic): status In Progress Fix Committed
2021-03-18 15:36:13 Łukasz Zemczak bug added subscriber Ubuntu Stable Release Updates Team
2021-03-18 15:36:15 Łukasz Zemczak bug added subscriber SRU Verification
2021-03-18 15:36:20 Łukasz Zemczak tags in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri verification-rocky-needed in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri verification-needed verification-needed-bionic verification-rocky-needed
2021-03-22 21:01:54 Corey Bryant cloud-archive/queens: status Triaged Fix Committed
2021-03-22 21:01:57 Corey Bryant tags in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri verification-needed verification-needed-bionic verification-rocky-needed in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri verification-needed verification-needed-bionic verification-queens-needed verification-rocky-needed
2021-04-07 10:18:32 Edward Hope-Morley tags in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri verification-needed verification-needed-bionic verification-queens-needed verification-rocky-needed in-stable-pike in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri verification-done verification-done-bionic verification-queens-done verification-rocky-done
2021-04-08 09:00:50 Łukasz Zemczak removed subscriber Ubuntu Stable Release Updates Team
2021-04-08 09:05:26 Launchpad Janitor neutron (Ubuntu Bionic): status Fix Committed Fix Released
2021-04-12 13:53:08 Corey Bryant cloud-archive/queens: status Fix Committed Fix Released
2021-04-12 18:13:17 Corey Bryant cloud-archive/rocky: status Fix Committed Fix Released
2021-10-12 12:01:14 Launchpad Janitor merge proposal linked https://code.launchpad.net/~hopem/ubuntu/+source/neutron/+git/neutron/+merge/410051
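
The reproduction steps recorded in the description entries above are flattened onto single lines by the log export. As a reading aid, here is a minimal sketch of the same steps laid out one command per line. The commands, bridge names and devstack unit names come from the description. The `run` helper and the `DRY_RUN` variable are hypothetical additions, not part of the original report: with the default `DRY_RUN=1` the script only prints what it would do. `sudo` on the bridge commands is also an assumption. The ml2_conf.ini edit and the blocking `arping`/`tcpdump` commands are left as comments because they need an editor or separate terminals.

```shell
#!/usr/bin/env bash
# Sketch of the reproduction steps from the bug description.
# DRY_RUN and run() are hypothetical helpers added for safety; they are
# not part of the original report.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Add two more physical bridges and bring them up.
for br in br-physnet1 br-physnet2; do
    run sudo ovs-vsctl add-br "$br"
    run sudo ip link set dev "$br" up
done

# Give one bridge an address to broadcast from.
run sudo ip address add 1.1.1.1/31 dev br-physnet1

# In separate terminals (both commands block):
#   arping -b -I br-physnet1 1.1.1.1     # send broadcasts from one bridge
#   sudo tcpdump -eni br-physnet2        # listen on the other bridge

# Before continuing, edit /etc/neutron/plugins/ml2/ml2_conf.ini as in the
# description: map physnet1/physnet2 in bridge_mappings and set
# enable_distributed_routing = True.

# Stop the neutron server and the OVS agent.
run sudo systemctl stop devstack@q-svc
run sudo systemctl stop devstack@q-agt

# Clear all flows on every bridge.
if [ "$DRY_RUN" = "1" ]; then
    echo "+ ovs-ofctl del-flows on each bridge listed by ovs-vsctl list-br"
else
    for br in $(sudo ovs-vsctl list-br); do
        echo "$br"
        sudo ovs-ofctl del-flows "$br"
    done
fi

# Start only the agent; the server stays down.
run sudo systemctl start devstack@q-agt
```

If the ARP broadcasts sent on br-physnet1 then show up in the tcpdump on br-physnet2, the node is in the state the report describes. Setting `DRY_RUN=0` executes the commands for real and should only be done on a disposable test node.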