Duplicate packets with two networks connected by router

Bug #1844915 reported by han
This bug affects 2 people
Affects: neutron
Status: Opinion
Importance: Undecided
Assigned to: Unassigned

Bug Description

The environment:

Rocky: 3 network nodes + 3 controller nodes + 2 compute nodes, DVR with L3_HA

After I add a port to the router and update the router in order to connect 2 subnets:

VM1 ping VM2:

    64 bytes from 172.16.1.10: seq=0 ttl=63 time=1.213 ms
    64 bytes from 172.16.1.10: seq=0 ttl=63 time=1.093 ms (DUP!)
    64 bytes from 172.16.1.10: seq=0 ttl=63 time=1.205 ms (DUP!)
    64 bytes from 172.16.1.10: seq=0 ttl=63 time=1.294 ms (DUP!)
    64 bytes from 172.16.1.10: seq=0 ttl=63 time=1.369 ms (DUP!)

Steps:

1. neutron port-create vlan954 (vlan954 can map to a physical network)

2. openstack router port add vpc_connect e3e33741-56e5-4fa0-8b39-d0215eb080c9 (10.135.130.23)

3. openstack router set --route destination=172.16.1.0/24,gateway=10.135.130.106 vpc_connect

The 10.135.130.106 port and its subnet (172.16.1.0/24) are in another OpenStack cluster (which does not use DVR).

4. In the other OpenStack cluster, do the same as above.

I guess the problem is the router port used for the network connection (10.135.130.23), because when I ping 10.135.130.23 using a floating IP, the same thing happens:

$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast qlen 1000
    link/ether fa:16:3e:19:78:61 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.10/24 brd 192.168.100.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe19:7861/64 scope link
       valid_lft forever preferred_lft forever

$
$ ping 10.135.130.23
PING 10.135.130.23 (10.135.130.23): 56 data bytes
64 bytes from 10.135.130.23: seq=0 ttl=63 time=0.690 ms
64 bytes from 10.135.130.23: seq=0 ttl=63 time=0.846 ms (DUP!)
64 bytes from 10.135.130.23: seq=0 ttl=63 time=0.944 ms (DUP!)
64 bytes from 10.135.130.23: seq=0 ttl=63 time=1.032 ms (DUP!)
64 bytes from 10.135.130.23: seq=0 ttl=63 time=1.096 ms (DUP!)
64 bytes from 10.135.130.23: seq=1 ttl=63 time=0.753 ms
64 bytes from 10.135.130.23: seq=1 ttl=63 time=0.865 ms (DUP!)
64 bytes from 10.135.130.23: seq=1 ttl=63 time=0.882 ms (DUP!)
64 bytes from 10.135.130.23: seq=1 ttl=63 time=0.894 ms (DUP!)
64 bytes from 10.135.130.23: seq=1 ttl=63 time=0.906 ms (DUP!)
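
To confirm that the duplicate replies really arrive as separate frames on the wire (and are not an artifact of ping itself), capturing ICMP on the VM's interface helps. A minimal sketch, using the eth0 device from the 'ip a' output above:

    $ sudo tcpdump -n -e -i eth0 icmp
    # -e prints the link-layer header of each frame; duplicate echo replies
    # arriving with different source MACs would point at multiple DVR
    # instances answering for the same router IP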

[root@network3 ~]# neutron router-port-list vpc_connect
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+--------------------------------------+-------------------------------------------------+----------------------------------+-------------------+---------------------------------------------------------------------------------------+
| id                                   | name                                            | tenant_id                        | mac_address       | fixed_ips                                                                             |
+--------------------------------------+-------------------------------------------------+----------------------------------+-------------------+---------------------------------------------------------------------------------------+
| 7877c303-96e1-4a1f-8175-36e70cab4490 |                                                 | e2f2bedbdf794ad0a45cf37edc72252e | fa:16:3e:fe:8b:58 | {"subnet_id": "7d2d38c8-e091-4c4d-b814-ce036a7082a9", "ip_address": "172.31.17.1"}    |
| 7ba36a6f-e6b8-427e-a8b8-2ea1cc6e6697 | HA port tenant e2f2bedbdf794ad0a45cf37edc72252e |                                  | fa:16:3e:df:99:b9 | {"subnet_id": "6e26e835-e602-40eb-9242-e9b17148b8ea", "ip_address": "169.254.192.19"} |
| e3e33741-56e5-4fa0-8b39-d0215eb080c9 | connect_port                                    | e2f2bedbdf794ad0a45cf37edc72252e | fa:16:3e:80:5b:bc | {"subnet_id": "d75231c6-7666-466b-960d-b6d8a9196648", "ip_address": "10.135.130.23"}  |
| e713f26a-ebf8-4110-b43e-f940b4a6f6a6 | HA port tenant e2f2bedbdf794ad0a45cf37edc72252e |                                  | fa:16:3e:95:ff:00 | {"subnet_id": "6e26e835-e602-40eb-9242-e9b17148b8ea", "ip_address": "169.254.192.3"}  |
| f2042d10-86c1-4b0f-8fb5-6b55efa63ea7 | HA port tenant e2f2bedbdf794ad0a45cf37edc72252e |                                  | fa:16:3e:1e:69:aa | {"subnet_id": "6e26e835-e602-40eb-9242-e9b17148b8ea", "ip_address": "169.254.192.2"}  |
+--------------------------------------+-------------------------------------------------+----------------------------------+-------------------+---------------------------------------------------------------------------------------+

[root@network3 ~]# neutron router-show vpc_connect
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+-------------------------+---------------------------------------------------------------+
| Field | Value |
+-------------------------+---------------------------------------------------------------+
| admin_state_up | True |
| availability_zone_hints | |
| availability_zones | nova |
| created_at | 2019-09-22T02:54:38Z |
| description | |
| distributed | True |
| external_gateway_info | |
| flavor_id | |
| ha | True |
| id | f8871714-8987-45e8-a564-ede356c2564a |
| name | vpc_connect |
| project_id | e2f2bedbdf794ad0a45cf37edc72252e |
| revision_number | 8 |
| routes | {"destination": "172.16.1.0/24", "nexthop": "10.135.130.106"} |
| status | ACTIVE |
| tags | |
| tenant_id | e2f2bedbdf794ad0a45cf37edc72252e |
| updated_at | 2019-09-22T02:58:47Z |
+-------------------------+---------------------------------------------------------------+

tags: added: l3-dvr-backlog
YAMAMOTO Takashi (yamamoto) wrote :

Is VM1 on the subnet d75231c6-7666-466b-960d-b6d8a9196648?

Who is 192.168.100.10?

Changed in neutron:
status: New → Incomplete
Bence Romsics (bence-romsics) wrote :

Thank you for your bug report!

Could you please try to break down the reproduction steps in more detail? I find it hard to interpret which IP belongs to whom and to guess which objects the various UUIDs in the commands refer to. It would help to list the exact commands issued, starting from a clean environment, preferably with their full output. Also, as Yamamoto asked: where are you running 'ip a' and 'ping'?

Can you reproduce the problem without adding extra routes like '--route destination=172.16.1.0/24,gateway=10.135.130.106'?

I also recommend copy-pasting the commands and their outputs exactly as they were (router port add vs router add port); otherwise we may be debugging the recall of a command instead of the real bug.

han (mr-bo) wrote :

Summary:

1. subnet create
2. openstack router add subnet
3. neutron port-create vlan954 (vlan954 can map to a physical network)
4. openstack router add port
5. openstack router set --route destination=172.16.1.0/24,gateway=10.135.130.106 router_name
6. vm ping 172.16.1.7

10.135.130.0/24 is external

han (mr-bo) wrote :

In the other OpenStack cluster, do something similar to the above. The two clusters share a common network (VLAN954).

han (mr-bo) wrote :

Yes, 192.168.100.10 is the VM's IP, in a subnet on vpc_connect (f8871714-8987-45e8-a564-ede356c2564a). d75231c6-7666-466b-960d-b6d8a9196648 is vlan954 (10.135.130.0/24).

han (mr-bo) wrote :

vm1 (192.168.100.10)
vm2 (172.16.1.7) is in the other OpenStack cluster; it's just a simple cluster, not using DVR + HA.

han (mr-bo) wrote :

192.168.100.* -> 192.168.100.1 -> 10.135.130.23 (vlan954_port) -> 10.135.130.106 -> 172.16.1.1 -> 172.16.1.*
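
A hop-by-hop check of this path from vm1 can make it concrete; a sketch, assuming traceroute is available in the guest image:

    $ traceroute -n 172.16.1.7
    # -n skips DNS lookups; the hops printed should match the chain above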

Bence Romsics (bence-romsics) wrote :

That information helped, but I am still not able to reproduce the duplicate responses.

In my test environment I have only one OpenStack cluster at the moment, so I tried to minimize your description to make it fit into one cluster. I don't see how using two clusters would be essential for reproducing this bug.

First I reconfigured my master devstack for DVR. The relevant local.conf parts:

[[post-config|/etc/neutron/neutron.conf]]
[DEFAULT]
router_distributed = True

[[post-config|/etc/neutron/plugins/ml2/ml2_conf.ini]]
[agent]
enable_distributed_routing = True
l2_population = True

[ml2]
mechanism_drivers = openvswitch,linuxbridge,sriovnicswitch,l2population
tenant_network_types = vxlan,vlan

[ovs]
bridge_mappings = public:br-ex,physnet0:br-physnet0

[[post-config|/etc/neutron/l3_agent.ini]]
[DEFAULT]
agent_mode = dvr_snat

Then I tried to reproduce your setup with the following (please tell me if this is the same or not; this was my best guess):

openstack network create private1
openstack subnet create private-subnet1 --network private1 --subnet-range 10.0.11.0/24

openstack network create private2
openstack subnet create private-subnet2 --network private2 --subnet-range 10.0.12.0/24

openstack network create public1 --provider-network-type vlan --provider-physical-network physnet0 --provider-segment 1000
openstack subnet create public-subnet1 --network public1 --subnet-range 10.0.20.0/24

# openstack router create router1 # already created by devstack
openstack router create router2

openstack router add subnet router1 private-subnet1
openstack router add subnet router2 private-subnet2

openstack port create port1 --network public1 --fixed-ip ip-address=10.0.20.11
openstack router add port router1 port1

openstack port create port2 --network public1 --fixed-ip ip-address=10.0.20.12
openstack router add port router2 port2

openstack router set router1 --route destination=10.0.12.0/24,gateway=10.0.20.12
openstack router set router2 --route destination=10.0.11.0/24,gateway=10.0.20.11

openstack server create vm1 --flavor cirros256 --image cirros-0.4.0-x86_64-disk --nic net-id=private1,v4-fixed-ip=10.0.11.10 --wait
openstack server create vm2 --flavor cirros256 --image cirros-0.4.0-x86_64-disk --nic net-id=private2,v4-fixed-ip=10.0.12.10 --wait

Then I logged in to vm1 (having address 10.0.11.10) and pinged every other address in this chain:

10.0.11.10 - 10.0.11.1 - 10.0.20.11 - 10.0.20.12 - 10.0.12.1 - 10.0.12.10
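
For reference, the checks can be run in one go from vm1's shell; a sketch using plain ping as in the outputs above:

    $ for ip in 10.0.11.1 10.0.20.11 10.0.20.12 10.0.12.1 10.0.12.10; do ping -c 4 $ip; done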

All of them responded, but for none did I receive duplicates.

han (mr-bo) wrote :

Your understanding of the operation is correct, but is your environment a single node? Preferably use three network nodes and two compute nodes.

I followed this link to configure Distributed Virtual Routing with VRRP:
https://docs.openstack.org/neutron/rocky/admin/config-dvr-ha-snat.html#config-dvr-snat-ha-ovs

Bence Romsics (bence-romsics) wrote :

Yes, I have a single node at the moment. It's unlikely I'll have 5 nodes to build a 3+2 environment for this reproduction.

Can you please try disabling your nodes one by one to find the minimal number of network and compute nodes at which you still see the duplicate answers? 1+1, 1+2, 2+1? Knowing that would give us a good idea of where the duplicate responses originate. Plus, having a minimal reproduction environment would help in working on this bug.

han (mr-bo)
Changed in neutron:
status: Incomplete → Opinion
Josselin Mouette (jmouette) wrote :

Hi, this bug is quite old but it is definitely the same thing we are able to reproduce in our production setup.

This happens when you route a provider network (mapped to a physical VLAN) with a DVR and connect it to a physical infrastructure with baremetal nodes (managed by Ironic). The DVR is the gateway for the baremetal nodes. This has a serious impact on performance and prevents the use of DVR for this use case.

My analysis follows:
 - on each node (net+compute) where the DVR has an instance installed, the install_dvr_process method installs an openvswitch rule to rewrite the outgoing MAC address in the physical bridge (br-ex)
 - the baremetal node sends an ARP request looking for the MAC of its gateway
 - *each DVR instance* receives it and replies with the virtual (common) MAC for the DVR
 - this reply is rewritten by the OVS rule but only the source MAC field is rewritten

After the rewrite, the ARP packet looks as follows:
 09:05:20.358550 fa:16:3f:53:f7:87 > 40:f2:e9:04:90:aa, ethertype ARP (0x0806), length 60: Reply 192.168.201.1 is-at fa:16:3e:fc:87:ff, length 46
(Here, fc87ff is the virtual MAC, 53f787 is the DVR MAC for that host.)

 - the baremetal node receives several ARP responses, all containing the same answer, and updates its ARP table to point to the virtual MAC
 - however, the *switches* on the path have no idea where that virtual MAC lies. All they see is the originating MAC from the DVR instance
 - the baremetal sends a packet to its gateway, using the virtual MAC as destination
 - the switches, not knowing where that MAC lies, *flood* all ports mapped to the VLAN with the reply
 - eventually, all DVR instances receive the packet and act accordingly. From there come our duplicate packets.
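
The rewrite rule in question can be observed directly on the nodes. A hedged sketch of how to look for it (bridge name br-ex as above; exact tables, cookies and priorities vary by Neutron release):

    # on a node hosting a DVR instance, look for the source-MAC rewrite
    # on the physical bridge
    $ sudo ovs-ofctl dump-flows br-ex | grep mod_dl_src
    # expected shape of the entry (MACs taken from the capture above):
    #   ... dl_src=fa:16:3e:fc:87:ff actions=mod_dl_src:fa:16:3f:53:f7:87,NORMAL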

I can think of several ways to fix this, but none is ideal.

1. Remove MAC rewriting for physical bridges altogether. In our setup, it works perfectly, but looking at the git history makes me fear that other setups heavily rely on this. Depending on the layer 2 configuration, this might be frowned upon because the switches will see MAC moves all day long.
2. Add to the MAC rewriting an ARP rewriting, having each DVR send its own DVR MAC for the gateway and letting the remote host choose (a rough sketch follows this list). This also requires a reverse rewriting rule for received packets, which would now have the DVR MAC as destination. I can’t help thinking this would break some other setups as well.
3. Make that MAC rewriting on physical bridges opt-out (or the additional ARP rewriting opt-in), with an option in neutron.conf. This is the only way that guarantees no setup will be broken upon upgrade.
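
To illustrate option 2, the additional ARP rewrite on the physical bridge could look roughly like the flow below. This is only a sketch: the priority, table placement, and match fields are assumptions, and the reverse rule for received packets is omitted.

    # rewrite the ARP source hardware address in outgoing replies so that
    # this DVR instance advertises its own per-host MAC (fa:16:3f:53:f7:87
    # here) instead of the shared virtual MAC
    $ sudo ovs-ofctl add-flow br-ex \
        "priority=5,arp,arp_op=2,dl_src=fa:16:3f:53:f7:87,actions=load:0xfa163f53f787->NXM_NX_ARP_SHA[],NORMAL"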

I’m not sure how DVR is supposed to work in such a setup: are hosts outside neutron agents supposed to see the individual DVR MACs? Looking at the blueprint, it seems to me that no rewriting was originally supposed to happen in the physical bridge, but it ended up being required. I would much appreciate some insight before proposing a patch.
