get_subnet_for_dvr returns SNAT mac instead of gateway in subnet_info

Bug #1783470 reported by Arjun Baindur
This bug affects 6 people
Affects | Status | Importance | Assigned to | Milestone
neutron | Fix Released | Undecided | Swaminathan Vasudevan |

Bug Description

On our dvr_snat host, install_dvr_to_src_mac is installing the rule in br-int with the SNAT MAC instead of the DVR MAC address (the subnet's gateway, aka network:router_interface_distributed). For example, the subnet's gateway is 172.16.0.1, with MAC fa:16:3e:42:a2:ec.

On most hosts, we see the following rules in br-int:

[root@stan ~]# ovs-ofctl dump-flows br-int | grep fa:16:3e:42:a2:ec
 cookie=0x77f69fee58f51737, duration=11872.801s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:22:eb:8b actions=mod_dl_src:fa:16:3e:42:a2:ec,resubmit(,60)
 cookie=0x77f69fee58f51737, duration=11872.790s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:cd:71:e1 actions=mod_dl_src:fa:16:3e:42:a2:ec,resubmit(,60)
 cookie=0x77f69fee58f51737, duration=11865.953s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:20:77:00 actions=mod_dl_src:fa:16:3e:42:a2:ec,resubmit(,60)
 cookie=0x77f69fee58f51737, duration=11865.933s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:ab:2d:1a actions=mod_dl_src:fa:16:3e:42:a2:ec,resubmit(,60)
 cookie=0x77f69fee58f51737, duration=11860.735s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:76:e9:ae actions=mod_dl_src:fa:16:3e:42:a2:ec,resubmit(,60)
 cookie=0x77f69fee58f51737, duration=11859.335s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:cb:48:27 actions=mod_dl_src:fa:16:3e:42:a2:ec,resubmit(,60)

However, on our dvr_snat host, none of these rules are present with that MAC in mod_dl_src. Instead, they are added with the MAC of the network:router_centralized_snat port (fa:16:3e:84:0b:42):

root@krusty:~# ovs-ofctl dump-flows br-int | grep fa:16:3e:84:0b:42
 cookie=0xbb5ebbfa2dfadb74, duration=5351.368s, table=2, n_packets=2976001, n_bytes=362273213, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:84:0b:42 actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
 cookie=0xbb5ebbfa2dfadb74, duration=5195.362s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:86:91:e2 actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
 cookie=0xbb5ebbfa2dfadb74, duration=5195.349s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:a2:04:d3 actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
 cookie=0xbb5ebbfa2dfadb74, duration=5195.336s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:82:ef:3b actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
 cookie=0xbb5ebbfa2dfadb74, duration=5195.325s, table=2, n_packets=24, n_bytes=2044, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:e4:d9:f3 actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
 cookie=0xbb5ebbfa2dfadb74, duration=5195.272s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:b9:a0:fe actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
 cookie=0xbb5ebbfa2dfadb74, duration=5194.118s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:1a:42:fa actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
 cookie=0xbb5ebbfa2dfadb74, duration=5194.098s, table=2, n_packets=56, n_bytes=4792, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:84:33:df actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
 cookie=0xbb5ebbfa2dfadb74, duration=5193.995s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:34:e1:92 actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
 cookie=0xbb5ebbfa2dfadb74, duration=5193.509s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:6d:3e:f3 actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
 cookie=0xbb5ebbfa2dfadb74, duration=5191.408s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:30:97:8f actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
 cookie=0xbb5ebbfa2dfadb74, duration=5188.895s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:57:e5:ad actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
 cookie=0xbb5ebbfa2dfadb74, duration=5351.361s, table=60, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:84:0b:42 actions=strip_vlan,output:951
root@krusty:~#
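For anyone checking their own hosts, here is a minimal sketch (not part of Neutron) that counts which MAC the table=2 flows rewrite the source to, based only on the `ovs-ofctl dump-flows` text format shown above. The function name and the `expected_gateway_mac` variable are hypothetical; the expected value is the gateway MAC from this report.

```python
import re
from collections import Counter

# Matches the source-MAC rewrite action in table=2 flows like those above.
_MOD_DL_SRC = re.compile(r"table=2,.*actions=mod_dl_src:([0-9a-f:]{17})")


def count_rewrite_macs(dump_flows_output):
    """Count table=2 flows per MAC used in the mod_dl_src action."""
    counts = Counter()
    for line in dump_flows_output.splitlines():
        match = _MOD_DL_SRC.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Two lines copied from the dvr_snat host output above.
sample = (
    " cookie=0xbb5ebbfa2dfadb74, duration=5195.362s, table=2, n_packets=0, "
    "n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,"
    "dl_dst=fa:16:3e:86:91:e2 actions=mod_dl_src:fa:16:3e:84:0b:42,"
    "resubmit(,60)\n"
    " cookie=0xbb5ebbfa2dfadb74, duration=5351.361s, table=60, n_packets=0, "
    "n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,"
    "dl_dst=fa:16:3e:84:0b:42 actions=strip_vlan,output:951\n"
)

expected_gateway_mac = "fa:16:3e:42:a2:ec"  # subnet gateway MAC in this report
counts = count_rewrite_macs(sample)
# Any MAC other than the gateway's indicates the problem described here.
unexpected = sorted(mac for mac in counts if mac != expected_gateway_mac)
```

On the affected host, `unexpected` would contain the SNAT port's MAC; on a good host it should be empty for this subnet's VLAN.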

I have traced this to the get_subnet_for_dvr call. In the subnet_info, the gateway_mac returned is incorrect. Initially, upon restarting the OVS agent, the local_dvr_map is empty, so the OVS agent makes the get_subnet_for_dvr call to populate the local subnet info map. On good hosts, it queries with fixed_ip = subnet gateway (172.16.0.1). On the SNAT host, it queries first with fixed_ip = 172.16.0.3.

Either querying with the SNAT port is incorrect, or, even when querying with the SNAT port, the gateway_mac in subnet_info should be the DVR MAC, not the SNAT MAC:

Good host:

root@barney:~# cat ovs.log | grep get_subnet_for_dvr | grep "172.16"
2018-07-24 19:42:24.454 15840 DEBUG neutron.api.rpc.handlers.dvr_rpc [req-5ece05d6-f2cd-46a4-b81e-7e579e61990b - - - - -] neutron.api.rpc.handlers.dvr_rpc.DVRServerRpcApi method get_subnet_for_dvr called with arguments (<neutron_lib.context.ContextBase object at 0x7f52f1983150>, '3707b250-b6f5-4701-9b17-01a8f288c17a') {'fixed_ips': [{'subnet_id': '3707b250-b6f5-4701-9b17-01a8f288c17a', 'ip_address': '172.16.0.1'}]} wrapper /opt/pf9/pf9-neutron/lib/python2.7/site-packages/oslo_log/helpers.py:66
2018-07-24 19:42:24.820 15840 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_dvr_neutron_agent [req-5ece05d6-f2cd-46a4-b81e-7e579e61990b - - - - -] get_subnet_for_dvr for subnet 3707b250-b6f5-4701-9b17-01a8f288c17a returned with {u'shared': True, u'service_types': [], u'description': None, u'enable_dhcp': True, u'tags': [], u'network_id': u'3f6ec232-7649-4639-b828-c3af9960481b', u'tenant_id': u'f175f441ebbb4c2b8fedf6469d6415fc', u'dns_nameservers': [u'10.1.10.19', u'8.8.8.8', u'8.8.4.4'], u'gateway_ip': u'172.16.0.1', u'ipv6_ra_mode': None, u'allocation_pools': [{u'start': u'172.16.0.2', u'end': u'172.16.255.254'}], u'host_routes': [], u'revision_number': 2, u'ipv6_address_mode': None, u'ip_version': 4, u'gateway_mac': u'fa:16:3e:42:a2:ec', u'cidr': u'172.16.0.0/16', u'project_id': u'f175f441ebbb4c2b8fedf6469d6415fc', u'id': u'3707b250-b6f5-4701-9b17-01a8f288c17a', u'subnetpool_id': None, u'name': u'172.16.0.0/16'} _bind_distributed_router_interface_port /opt/pf9/pf9-neutron/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py:371
2018-07-24 19:42:25.686 15840 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_dvr_neutron_agent [req-5ece05d6-f2cd-46a4-b81e-7e579e61990b - - - - -] get_subnet_for_dvr for subnet 98d2750d-60ce-4b53-88ef-423b77d5f5f5 returned with {u'shared': True, u'service_types': [], u'description': None, u'enable_dhcp': True, u'tags': [], u'network_id': u'655c3eb4-b9f5-4e30-92de-2262d6e87c92', u'tenant_id': u'f175f441ebbb4c2b8fedf6469d6415fc', u'dns_nameservers': [], u'gateway_ip': u'10.100.0.1', u'ipv6_ra_mode': None, u'allocation_pools': [{u'start': u'10.100.0.2', u'end': u'10.100.255.254'}], u'host_routes': [{u'destination': u'0.0.0.0/0', u'nexthop': u'172.16.0.1'}], u'revision_number': 0, u'ipv6_address_mode': None, u'ip_version': 4, u'gateway_mac': u'fa:16:3e:13:61:98', u'cidr': u'10.100.0.0/16', u'project_id': u'f175f441ebbb4c2b8fedf6469d6415fc', u'id': u'98d2750d-60ce-4b53-88ef-423b77d5f5f5', u'subnetpool_id': None, u'name': u'dogfood-vxlan-8000-sub'} _bind_distributed_router_interface_port /opt/pf9/pf9-neutron/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py:371

Bad host:

root@krusty:~# cat ovs.log | grep get_subnet_for_dvr | grep "172.16"
2018-07-24 19:44:44.135 31138 DEBUG neutron.api.rpc.handlers.dvr_rpc [req-6d269f17-c49c-4f64-93f8-139639020c5d - - - - -] neutron.api.rpc.handlers.dvr_rpc.DVRServerRpcApi method get_subnet_for_dvr called with arguments (<neutron_lib.context.ContextBase object at 0x7f1c09d3b410>, '3707b250-b6f5-4701-9b17-01a8f288c17a') {'fixed_ips': [{'subnet_id': '3707b250-b6f5-4701-9b17-01a8f288c17a', 'ip_address': '172.16.0.3'}]} wrapper /opt/pf9/pf9-neutron/lib/python2.7/site-packages/oslo_log/helpers.py:66
2018-07-24 19:44:44.369 31138 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_dvr_neutron_agent [req-6d269f17-c49c-4f64-93f8-139639020c5d - - - - -] get_subnet_for_dvr for subnet 3707b250-b6f5-4701-9b17-01a8f288c17a returned with {u'shared': True, u'service_types': [], u'description': None, u'enable_dhcp': True, u'tags': [], u'network_id': u'3f6ec232-7649-4639-b828-c3af9960481b', u'tenant_id': u'f175f441ebbb4c2b8fedf6469d6415fc', u'dns_nameservers': [u'10.1.10.19', u'8.8.8.8', u'8.8.4.4'], u'gateway_ip': u'172.16.0.1', u'ipv6_ra_mode': None, u'allocation_pools': [{u'start': u'172.16.0.2', u'end': u'172.16.255.254'}], u'host_routes': [], u'revision_number': 2, u'ipv6_address_mode': None, u'ip_version': 4, u'gateway_mac': u'fa:16:3e:84:0b:42', u'cidr': u'172.16.0.0/16', u'project_id': u'f175f441ebbb4c2b8fedf6469d6415fc', u'id': u'3707b250-b6f5-4701-9b17-01a8f288c17a', u'subnetpool_id': None, u'name': u'172.16.0.0/16'} _bind_centralized_snat_port_on_dvr_subnet /opt/pf9/pf9-neutron/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py:553
2018-07-24 19:44:51.786 31138 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_dvr_neutron_agent [req-6d269f17-c49c-4f64-93f8-139639020c5d - - - - -] get_subnet_for_dvr for subnet 98d2750d-60ce-4b53-88ef-423b77d5f5f5 returned with {u'shared': True, u'service_types': [], u'description': None, u'enable_dhcp': True, u'tags': [], u'network_id': u'655c3eb4-b9f5-4e30-92de-2262d6e87c92', u'tenant_id': u'f175f441ebbb4c2b8fedf6469d6415fc', u'dns_nameservers': [], u'gateway_ip': u'10.100.0.1', u'ipv6_ra_mode': None, u'allocation_pools': [{u'start': u'10.100.0.2', u'end': u'10.100.255.254'}], u'host_routes': [{u'destination': u'0.0.0.0/0', u'nexthop': u'172.16.0.1'}], u'revision_number': 0, u'ipv6_address_mode': None, u'ip_version': 4, u'gateway_mac': u'fa:16:3e:b1:bd:33', u'cidr': u'10.100.0.0/16', u'project_id': u'f175f441ebbb4c2b8fedf6469d6415fc', u'id': u'98d2750d-60ce-4b53-88ef-423b77d5f5f5', u'subnetpool_id': None, u'name': u'dogfood-vxlan-8000-sub'} _bind_centralized_snat_port_on_dvr_subnet /opt/pf9/pf9-neutron/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py:553
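The difference between the two hosts can be pulled out of the agent log mechanically. The following is a rough sketch that relies only on the DEBUG line format shown above; the regular expression, the function name, and the abbreviated sample lines (most dict keys dropped) are my own assumptions for illustration.

```python
import re

# Extracts subnet id, gateway_mac and the calling agent method from the
# "get_subnet_for_dvr ... returned with {...}" DEBUG lines shown above.
_LINE = re.compile(
    r"get_subnet_for_dvr for subnet (?P<subnet>[0-9a-f-]{36}) returned with "
    r".*'gateway_mac': u?'(?P<mac>[0-9a-f:]{17})'"
    r".*\} (?P<caller>\w+) /"
)


def gateway_macs_by_caller(log_lines):
    """Return (subnet_id, gateway_mac, caller) for each matching log line."""
    entries = []
    for line in log_lines:
        match = _LINE.search(line)
        if match:
            entries.append(
                (match.group("subnet"), match.group("mac"), match.group("caller"))
            )
    return entries


SUBNET = "3707b250-b6f5-4701-9b17-01a8f288c17a"
PATH = (
    "/opt/pf9/pf9-neutron/lib/python2.7/site-packages/neutron/plugins/ml2/"
    "drivers/openvswitch/agent/ovs_dvr_neutron_agent.py"
)
# Abbreviated versions of the good-host and bad-host lines above.
sample_lines = [
    "2018-07-24 19:42:24.820 15840 DEBUG get_subnet_for_dvr for subnet "
    + SUBNET
    + " returned with {u'gateway_ip': u'172.16.0.1', "
    "u'gateway_mac': u'fa:16:3e:42:a2:ec', u'cidr': u'172.16.0.0/16'} "
    "_bind_distributed_router_interface_port " + PATH + ":371",
    "2018-07-24 19:44:44.369 31138 DEBUG get_subnet_for_dvr for subnet "
    + SUBNET
    + " returned with {u'gateway_ip': u'172.16.0.1', "
    "u'gateway_mac': u'fa:16:3e:84:0b:42', u'cidr': u'172.16.0.0/16'} "
    "_bind_centralized_snat_port_on_dvr_subnet " + PATH + ":553",
]

entries = gateway_macs_by_caller(sample_lines)
# Entries populated from the SNAT handler are the ones worth double-checking.
suspicious = [
    e for e in entries if e[2] == "_bind_centralized_snat_port_on_dvr_subnet"
]
```

Here the same subnet shows two different gateway_mac values depending on which agent method issued the query, which is the inconsistency described in this report.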

This causes a whole slew of problems - packets are sent into the network infrastructure with the local DVR MAC as their src MAC, causing this MAC to flap on remote hosts' br-int between the patch cable and the qr interface. If we shut down the SNAT host's interfaces or bring the host down, the DVR MAC stops flapping on br-int on other hosts, and network connectivity is restored.

Arjun Baindur (abaindur) wrote :

The problem appears to be the internal_gateway_ports returned by self.plugin.get_ports() when called with a filter of the SNAT port. On our snat host, this is the very first port that is used when querying the subnet info.

The subnet is the same, as is the gateway, so why is the gateway_mac returned different depending on query port?

2018-07-25 05:26:25.643 1033 INFO neutron.db.dvr_mac_db [req-d69336a0-3db0-4e8a-b24f-b0446c5d9735 - - - - -] ARJUN: get_subnet_for_dvr: filter = {'fixed_ips': {'subnet_id': [u'3707b250-b6f5-4701-9b17-01a8f288c17a'], 'ip_address': [u'172.16.0.3']}}
2018-07-25 05:26:25.766 1033 INFO neutron.db.dvr_mac_db [req-d69336a0-3db0-4e8a-b24f-b0446c5d9735 - - - - -] ARJUN: internal_gateway_ports = [{'status': u'ACTIVE', 'dns_name': '', 'binding:host_id': u'25f77b4e-1f9d-45c3-937e-49eed272faaf', 'description': None, 'allowed_address_pairs': [], 'tags': [], 'extra_dhcp_opts': [], 'dns_assignment': [{'hostname': u'host-172-16-0-3', 'ip_address': u'172.16.0.3', 'fqdn': u'host-172-16-0-3.platform9.sys.'}], 'device_owner': u'network:router_centralized_snat', 'revision_number': 242L, 'port_security_enabled': True, 'binding:profile': {}, 'fixed_ips': [{'subnet_id': u'3707b250-b6f5-4701-9b17-01a8f288c17a', 'ip_address': u'172.16.0.3'}], 'id': u'bc6797ed-2aa8-4764-b42f-852438d21bd5', 'security_groups': [], 'device_id': u'37176403-cfb0-478d-b51c-971d89597cf5', 'name': u'', 'admin_state_up': True, 'network_id': u'3f6ec232-7649-4639-b828-c3af9960481b', 'tenant_id': u'', 'binding:vif_details': {u'port_filter': True, u'ovs_hybrid_plug': True}, 'binding:vnic_type': u'normal', 'binding:vif_type': u'ovs', 'mac_address': u'fa:16:3e:84:0b:42', 'project_id': u''}]
2018-07-25 05:26:25.767 1033 INFO neutron.db.dvr_mac_db [req-d69336a0-3db0-4e8a-b24f-b0446c5d9735 - - - - -] ARJUN: using internal_port['mac_address'] = {'status': u'ACTIVE', 'dns_name': '', 'binding:host_id': u'25f77b4e-1f9d-45c3-937e-49eed272faaf', 'description': None, 'allowed_address_pairs': [], 'tags': [], 'extra_dhcp_opts': [], 'dns_assignment': [{'hostname': u'host-172-16-0-3', 'ip_address': u'172.16.0.3', 'fqdn': u'host-172-16-0-3.platform9.sys.'}], 'device_owner': u'network:router_centralized_snat', 'revision_number': 242L, 'port_security_enabled': True, 'binding:profile': {}, 'fixed_ips': [{'subnet_id': u'3707b250-b6f5-4701-9b17-01a8f288c17a', 'ip_address': u'172.16.0.3'}], 'id': u'bc6797ed-2aa8-4764-b42f-852438d21bd5', 'security_groups': [], 'device_id': u'37176403-cfb0-478d-b51c-971d89597cf5', 'name': u'', 'admin_state_up': True, 'network_id': u'3f6ec232-7649-4639-b828-c3af9960481b', 'tenant_id': u'', 'binding:vif_details': {u'port_filter': True, u'ovs_hybrid_plug': True}, 'binding:vnic_type': u'normal', 'binding:vif_type': u'ovs', 'mac_address': u'fa:16:3e:84:0b:42', 'project_id': u''}

Arjun Baindur (abaindur) wrote :

2018-07-25 05:26:25.525 1033 INFO neutron.db.dvr_mac_db [req-d69336a0-3db0-4e8a-b24f-b0446c5d9735 - - - - -] ARJUN: get_subnet_for_dvr: fixed_ips = [{u'subnet_id': u'3707b250-b6f5-4701-9b17-01a8f288c17a', u'ip_address': u'172.16.0.3'}] subnet_info = 3707b250-b6f5-4701-9b17-01a8f288c17a
2018-07-25 05:26:25.642 1033 INFO neutron.db.dvr_mac_db [req-d69336a0-3db0-4e8a-b24f-b0446c5d9735 - - - - -] ARJUN: got subnet_info: {'description': None, 'tags': [], 'ipv6_ra_mode': None, 'allocation_pools': [{'start': u'172.16.0.2', 'end': u'172.16.255.254'}], 'host_routes': [], 'revision_number': 2L, 'ipv6_address_mode': None, 'cidr': u'172.16.0.0/16', 'id': u'3707b250-b6f5-4701-9b17-01a8f288c17a', 'subnetpool_id': None, 'service_types': [], 'name': u'172.16.0.0/16', 'enable_dhcp': True, 'network_id': u'3f6ec232-7649-4639-b828-c3af9960481b', 'tenant_id': u'f175f441ebbb4c2b8fedf6469d6415fc', 'dns_nameservers': [u'10.1.10.19', u'8.8.8.8', u'8.8.4.4'], 'gateway_ip': u'172.16.0.1', 'ip_version': 4L, 'shared': True, 'project_id': u'f175f441ebbb4c2b8fedf6469d6415fc'}

Arjun Baindur (abaindur) wrote :

Should _bind_centralized_snat_port_on_dvr_subnet() ever call get_subnet_for_dvr() to populate the subnet_info in the local_dvr_map here? https://github.com/openstack/neutron/blob/stable/pike/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L534

The server-side handler for this sets subnet_info['gateway_mac'] = internal_port['mac_address']. If this happens first, then the subnet_info is cached forever with the SNAT port's MAC: https://github.com/openstack/neutron/blob/stable/pike/neutron/db/dvr_mac_db.py#L200

Instead, the subnet_info should only be populated via _bind_distributed_router_interface_port(), as this invokes the RPC call with the correct port, the distributed gateway with the DVR MAC: https://github.com/openstack/neutron/blob/stable/pike/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L360

If there is an explicit need to populate the local subnet when processing the SNAT port, the server-side handler should find a way to always return the DVR MAC, not the port's MAC address.
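To make the server-side behaviour concrete, here is a simplified standalone model of it. This is not the Neutron code: `model_get_subnet_for_dvr`, `SUBNET` and `PORTS` are hypothetical names, and the in-memory port list stands in for the plugin's port query. The IPs and MACs are the ones from the logs in this report. It assumes, as the logs suggest, that the handler looks up the port matching the queried IP (or the subnet's gateway_ip when no fixed_ips are given) and copies that port's MAC into gateway_mac.

```python
SUBNET_ID = "3707b250-b6f5-4701-9b17-01a8f288c17a"

# Values taken from the logs in this report.
SUBNET = {"id": SUBNET_ID, "gateway_ip": "172.16.0.1"}
PORTS = [
    {
        "device_owner": "network:router_interface_distributed",
        "ip_address": "172.16.0.1",
        "mac_address": "fa:16:3e:42:a2:ec",
    },
    {
        "device_owner": "network:router_centralized_snat",
        "ip_address": "172.16.0.3",
        "mac_address": "fa:16:3e:84:0b:42",
    },
]


def model_get_subnet_for_dvr(subnet, ports, fixed_ips=None):
    """Hypothetical stand-in for the server-side handler (not Neutron code)."""
    # With fixed_ips the lookup uses the caller's IP; otherwise the gateway IP.
    if fixed_ips:
        ip_address = fixed_ips[0]["ip_address"]
    else:
        ip_address = subnet["gateway_ip"]
    matches = [p for p in ports if p["ip_address"] == ip_address]
    if not matches:
        return {}
    info = dict(subnet)
    # The MAC of whichever port matched is reported as the gateway MAC.
    info["gateway_mac"] = matches[0]["mac_address"]
    return info


by_snat_port = model_get_subnet_for_dvr(
    SUBNET, PORTS,
    fixed_ips=[{"subnet_id": SUBNET_ID, "ip_address": "172.16.0.3"}],
)
by_gateway = model_get_subnet_for_dvr(SUBNET, PORTS, fixed_ips=None)
```

In this model, querying with the SNAT port's fixed IP yields the SNAT MAC as gateway_mac, while querying without fixed_ips yields the distributed gateway port's MAC.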

YAMAMOTO Takashi (yamamoto) wrote :

Is this Pike?

Arjun Baindur (abaindur) wrote :

Yes, Pike. This code also appears to be the same in master/queens. I fixed it locally by setting fixed_ips=None in the call to self.plugin_rpc.get_subnet_for_dvr. After that, the dvr_to_src_mac flows are added with the correct MAC.

BTW, this issue seems to be random (it depends on the ordering of the distributed vs. SNAT port). I observed a dvr_snat node in another environment where the flows were added with the router_distributed port's MAC. But then when I restarted ovs-agent, the _bind_centralized_snat_port_on_dvr_subnet handler was invoked first for the SNAT port, and so the subnet_info used the MAC from this port instead.
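The ordering dependence can be illustrated with a toy model of the agent's per-subnet cache. Everything here is hypothetical (the function name, the `csnat_passes_fixed_ips` flag, the port tuples); it only mimics the "first port processed fills the cache" behaviour described above, using the IPs and MACs from this report, and assumes that a query without fixed_ips resolves to the subnet's gateway IP.

```python
import itertools

SUBNET_ID = "3707b250-b6f5-4701-9b17-01a8f288c17a"
GATEWAY_IP = "172.16.0.1"
# IP -> port MAC, from the logs in this report.
MAC_BY_IP = {
    "172.16.0.1": "fa:16:3e:42:a2:ec",  # distributed gateway port
    "172.16.0.3": "fa:16:3e:84:0b:42",  # centralized SNAT port
}


def cached_gateway_mac(port_order, csnat_passes_fixed_ips):
    """Return the gateway MAC cached after processing ports in this order."""
    local_dvr_map = {}
    for kind, ip_address in port_order:
        if SUBNET_ID in local_dvr_map:
            continue  # the first answer is cached and reused
        if kind == "csnat" and not csnat_passes_fixed_ips:
            # Query without fixed_ips: assumed to resolve to the gateway IP.
            query_ip = GATEWAY_IP
        else:
            query_ip = ip_address
        local_dvr_map[SUBNET_ID] = MAC_BY_IP[query_ip]
    return local_dvr_map[SUBNET_ID]


ports = [("distributed", "172.16.0.1"), ("csnat", "172.16.0.3")]
for order in itertools.permutations(ports):
    original = cached_gateway_mac(order, csnat_passes_fixed_ips=True)
    patched = cached_gateway_mac(order, csnat_passes_fixed_ips=False)
    print([kind for kind, _ in order], original, patched)
```

With the original behaviour the cached MAC depends on which port happens to be processed first; with the SNAT handler querying without fixed_ips, both orderings end up with the gateway port's MAC.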

I changed fixed_ips to None here: https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L541

    def _bind_centralized_snat_port_on_dvr_subnet(self, port, lvm,
                                                  fixed_ips, device_owner):
        # since centralized-SNAT (CSNAT) port must have only one fixed
        # IP, directly use fixed_ips[0]
        fixed_ip = fixed_ips[0]
        if port.vif_id in self.local_ports:
            # throw an error if CSNAT port is already on a different
            # dvr routed subnet
            ovsport = self.local_ports[port.vif_id]
            subs = list(ovsport.get_subnets())
            if subs[0] == fixed_ip['subnet_id']:
                return
            LOG.error("Centralized-SNAT port %(port)s on subnet "
                      "%(port_subnet)s already seen on a different "
                      "subnet %(orig_subnet)s", {
                          "port": port.vif_id,
                          "port_subnet": fixed_ip['subnet_id'],
                          "orig_subnet": subs[0],
                      })
            return
        subnet_uuid = fixed_ip['subnet_id']
        ldm = None
        subnet_info = None
        if subnet_uuid not in self.local_dvr_map:
            # no csnat ports seen on this subnet - create csnat state
            # for this subnet
            subnet_info = self.plugin_rpc.get_subnet_for_dvr(
                self.context, subnet_uuid, fixed_ips=None)
            if not subnet_info:
                LOG.warning("DVR: Unable to retrieve subnet information "
                            "for subnet_id %s. The subnet or the gateway "
                            "may have already been deleted", subnet_uuid)
                return
            LOG.debug("get_subnet_for_dvr for subnet %(uuid)s "
                      "returned with %(info)s",
                      {"uuid": subnet_uuid, "info": subnet_info})
            ldm = LocalDVRSubnetMapping(subnet_info, port.ofport)
            self.local_dvr_map[subnet_uuid] = ldm
        else:
            ldm = self.local_dvr_map[subnet_uuid]
            subnet_info = ldm.get_subnet_info()
            # Store csnat OF Port in the existing DVRSubnetMap
            ldm.set_csnat_ofport(port.ofport)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/587234

Changed in neutron:
assignee: nobody → Arjun Baindur (abaindur)
status: New → In Progress
Changed in neutron:
assignee: Arjun Baindur (abaindur) → Swaminathan Vasudevan (swaminathan-vasudevan)

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/587234
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c6de172e58ed4cbd157c2e560ffbbb4dc3a34730
Submitter: Zuul
Branch: master

commit c6de172e58ed4cbd157c2e560ffbbb4dc3a34730
Author: Arjun Baindur <email address hidden>
Date: Mon Jul 30 14:22:30 2018 -0700

    get_subnet_for_dvr returns SNAT mac instead of distributed gateway in subnet_info

    On hosts with dvr_snat agent mode, after restarting OVS agent,
    sometimes the SNAT port is processed first instead of the distributed port.
    The subnet_info is cached locally via get_subnet_for_dvr when either of these ports
    are processed. However, it returns the MAC address of the port used to query
    as the gateway for the subnet. Using the SNAT port, this puts the wrong
    MAC as the gateway, causing some flows such as the DVR flows on br-int
    for local src VMs to have the wrong MAC.

    This patch fixes the get_subnet_for_dvr with fixed_ips as None for the csnat port,
    as that causes the server side handler to fill in the subnet's actual gateway
    rather than using the port's MAC.

    Change-Id: If045851819fd53c3b9a1506cc52bc1757e6d6851
    Closes-Bug: #1783470

Changed in neutron:
status: In Progress → Fix Released
tags: added: neutron-proactive-backport-potential
tags: added: neutron-easy-proactive-backport-potential

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/593002

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/593013

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/593019

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/593026

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.openstack.org/593002
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=dd915e8238188ca8609292d50d31e1c840f0fdb5
Submitter: Zuul
Branch: stable/rocky

commit dd915e8238188ca8609292d50d31e1c840f0fdb5
Author: Arjun Baindur <email address hidden>
Date: Mon Jul 30 14:22:30 2018 -0700

    get_subnet_for_dvr returns SNAT mac instead of distributed gateway in subnet_info

    Change-Id: If045851819fd53c3b9a1506cc52bc1757e6d6851
    Closes-Bug: #1783470
    (cherry picked from commit c6de172e58ed4cbd157c2e560ffbbb4dc3a34730)

tags: added: in-stable-rocky

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.0.0rc2

This issue was fixed in the openstack/neutron 13.0.0.0rc2 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.openstack.org/593013
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b600368d8f35cad28ead132f44b171ed1387b20a
Submitter: Zuul
Branch: stable/queens

commit b600368d8f35cad28ead132f44b171ed1387b20a
Author: Arjun Baindur <email address hidden>
Date: Mon Jul 30 14:22:30 2018 -0700

    get_subnet_for_dvr returns SNAT mac instead of distributed gateway in subnet_info

    Change-Id: If045851819fd53c3b9a1506cc52bc1757e6d6851
    Closes-Bug: #1783470
    (cherry picked from commit c6de172e58ed4cbd157c2e560ffbbb4dc3a34730)

tags: added: in-stable-queens

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ocata)

Reviewed: https://review.openstack.org/593026
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9c1d2c68fbf038ed2746762893f7ab8cc23dee11
Submitter: Zuul
Branch: stable/ocata

commit 9c1d2c68fbf038ed2746762893f7ab8cc23dee11
Author: Arjun Baindur <email address hidden>
Date: Mon Jul 30 14:22:30 2018 -0700

    get_subnet_for_dvr returns SNAT mac instead of distributed gateway in subnet_info

    Change-Id: If045851819fd53c3b9a1506cc52bc1757e6d6851
    Closes-Bug: #1783470
    (cherry picked from commit c6de172e58ed4cbd157c2e560ffbbb4dc3a34730)

tags: added: in-stable-ocata

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/pike)

Reviewed: https://review.openstack.org/593019
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=cefafe3cc30daf116802e772f4162f2ac35d0c18
Submitter: Zuul
Branch: stable/pike

commit cefafe3cc30daf116802e772f4162f2ac35d0c18
Author: Arjun Baindur <email address hidden>
Date: Mon Jul 30 14:22:30 2018 -0700

    get_subnet_for_dvr returns SNAT mac instead of distributed gateway in subnet_info

    Change-Id: If045851819fd53c3b9a1506cc52bc1757e6d6851
    Closes-Bug: #1783470
    (cherry picked from commit c6de172e58ed4cbd157c2e560ffbbb4dc3a34730)

tags: added: in-stable-pike

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.0.4

This issue was fixed in the openstack/neutron 12.0.4 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 11.0.6

This issue was fixed in the openstack/neutron 11.0.6 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 14.0.0.0b1

This issue was fixed in the openstack/neutron 14.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron ocata-eol

This issue was fixed in the openstack/neutron ocata-eol release.
