MLDv2 packets sent from L3 Agent managed networks cause backup routers to be preferred

Bug #2049909 reported by Andrew Bonney
This bug affects 1 person
Affects: neutron
Status: New
Importance: Medium
Assigned to: Unassigned

Bug Description

Neutron 2023.1 29cc1a634e530972614c09fbb212b5f63fd4c374
Ubuntu 20.04

This issue was identified in a Neutron deployment running Linux Bridge networking. That driver may no longer be supported, but I'm posting this in case the same issue is relevant for other drivers.

When running multiple network nodes, the tenant networks in the namespaces on each node share MAC addresses. We have noted that, particularly when rebooting a network node, traffic from the Internet to tenant networks can be disrupted when a node which was acting as the backup for a given tenant network comes back online. We have traced this to Linux sending out MLDv2 responses to the upstream switches when the tenant network processes (keepalived etc.) start up. As a result, the upstream switches update their MAC tables to prefer that host despite it not being the primary. If the tenant generates little outbound traffic (such as when running a web server), the network remains inaccessible from the outside until a request is made from the inside to the outside and the switches update their MAC tables again to reflect the correct state.
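To illustrate the failure mode described above, here is a toy model of switch MAC learning. It is only a sketch: the class, the port names and the sequence of frames are made up for illustration, and real switches also age entries out, which is not modelled here.

```python
class LearningSwitch:
    """Toy MAC table: the last port a source MAC was seen on wins."""

    def __init__(self):
        self.mac_table = {}

    def frame_seen(self, src_mac, port):
        # A switch learns (or re-learns) the port from any frame's source MAC,
        # regardless of what the frame carries.
        self.mac_table[src_mac] = port

    def port_for(self, dst_mac):
        return self.mac_table.get(dst_mac)


# Both network nodes use the same MAC for the router's external interface.
ROUTER_MAC = "fa:16:3e:23:0a:4d"

switch = LearningSwitch()
switch.frame_seen(ROUTER_MAC, "port-primary")   # normal traffic from the primary

# The rebooted backup node emits an MLDv2 report with the shared MAC.
switch.frame_seen(ROUTER_MAC, "port-backup")
misdirected = switch.port_for(ROUTER_MAC)       # inbound traffic now goes to the backup

# Only when the tenant sends something outbound via the primary does it recover.
switch.frame_seen(ROUTER_MAC, "port-primary")
recovered = switch.port_for(ROUTER_MAC)

print(misdirected, recovered)
```

With little outbound tenant traffic, the gap between the second and third `frame_seen` calls is the period during which the network is unreachable from outside.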

There is already handling to prevent some IPv6 packets being sent out in these cases here: https://opendev.org/openstack/neutron/src/branch/master/neutron/agent/l3/router_info.py#L808, and there is theoretically something explicitly referencing issues with MLDv2 in the same area: https://opendev.org/openstack/neutron/src/branch/master/neutron/agent/l3/router_info.py#L813. Unfortunately these don't appear to be sufficient.

There is no sysctl mechanism to prevent these gratuitous MLDv2 responses as far as I can tell, so we are working around this by using iptables rules inserted by the L3 agent into tenant networks. There may well be a better solution, but I will link our workaround to this bug report shortly.
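As a rough idea of what such a rule could look like, the sketch below builds an ip6tables command that drops outgoing MLDv2 reports (ICMPv6 type 143, per RFC 3810) on one interface inside a namespace. This is not the workaround from our patch: the function name, the namespace name and the choice of the OUTPUT chain are assumptions for illustration, and the command is only constructed here, not executed or verified against a live system.

```python
MLDV2_REPORT_ICMPV6_TYPE = "143"  # Version 2 Multicast Listener Report (RFC 3810)


def mldv2_drop_rule(namespace, interface):
    """Return an argv list for a rule dropping locally generated MLDv2 reports.

    Hypothetical helper: assumes the reports pass through the filter table's
    OUTPUT chain in the given namespace.
    """
    return [
        "ip", "netns", "exec", namespace,
        "ip6tables", "-A", "OUTPUT",
        "-o", interface,
        "-p", "ipv6-icmp",
        "--icmpv6-type", MLDV2_REPORT_ICMPV6_TYPE,
        "-j", "DROP",
    ]


# "qrouter-example" is a placeholder namespace name, not one from this report.
rule = mldv2_drop_rule("qrouter-example", "qg-4f76f2f2-d7")
print(" ".join(rule))
```

A rule like this would have to be in place before the interface comes up in the namespace, otherwise the first reports could still escape.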

Tags: l3-ha
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/906114

Brian Haley (brian-haley) wrote :

Copied from the review:

I think we tried using iptables in PS1 here: https://review.opendev.org/c/openstack/neutron/+/836198/1 but it didn't work as expected.

The problem is that the kernel sends these packets when forwarding changes on an interface, hence the comment above. We've had many bugs [0][1] and tried many fixes which seemed to work, but something is still being missed.

The change https://review.opendev.org/c/openstack/neutron/+/707406 really should have addressed this problem I believe, but I wonder if Linuxbridge is different here than ML2/OVS?

[0] https://bugs.launchpad.net/neutron/+bug/1787919
[1] https://bugs.launchpad.net/neutron/+bug/1859832

tags: added: l3-ha
Changed in neutron:
importance: Undecided → Medium
Andrew Bonney (andrewbonney) wrote :

Thanks for the comments. I think I'd be surprised if we were the first to encounter an issue here, so I appreciate the related links. I've commented further in the review.

Andrew Bonney (andrewbonney) wrote :

Following a request for more info during the last IRC meeting, here is some tcpdump output taken without the iptables patch present. It was captured on the backup node by running tcpdump in a sample tenant namespace against the qg interface shown below.

5: qg-4f76f2f2-d7@if156: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1550 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:23:0a:4d brd ff:ff:ff:ff:ff:ff link-netnsid 0

13:16:44.642662 fa:16:3e:23:0a:4d (oui Unknown) > 33:33:00:00:00:16 (oui Unknown), ethertype IPv6 (0x86dd), length 110: :: > ff02::16: HBH ICMP6, multicast listener report v2, 2 group record(s), length 48
13:16:45.614658 fa:16:3e:23:0a:4d (oui Unknown) > 33:33:00:00:00:16 (oui Unknown), ethertype IPv6 (0x86dd), length 110: :: > ff02::16: HBH ICMP6, multicast listener report v2, 2 group record(s), length 48

Whilst there is other IPv6 ICMP traffic seen, this only comes from the primary, and only from the addresses which live there. The standby interface shows no global or link local addressing.
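For anyone wanting to check their own captures for the same pattern, the sketch below scans tcpdump text output for MLDv2 listener reports sent from the unspecified address (::). The regular expression is written against the line format shown above and is an assumption about that format only; the function name is made up for illustration.

```python
import re

# Matches lines in the format of the capture above (tcpdump with link-level headers).
MLD_REPORT = re.compile(
    r"^(?P<time>\S+) (?P<src_mac>[0-9a-f:]{17}) .*? > (?P<dst_mac>[0-9a-f:]{17}) .*?: "
    r"(?P<src_ip>\S+) > (?P<dst_ip>\S+): HBH ICMP6, multicast listener report v2"
)


def unspecified_source_mld_reports(lines):
    """Return (timestamp, source MAC) for each MLDv2 report sent from '::'."""
    hits = []
    for line in lines:
        match = MLD_REPORT.match(line)
        if match and match.group("src_ip") == "::":
            hits.append((match.group("time"), match.group("src_mac")))
    return hits


SAMPLE = [
    "13:16:44.642662 fa:16:3e:23:0a:4d (oui Unknown) > 33:33:00:00:00:16 (oui Unknown), "
    "ethertype IPv6 (0x86dd), length 110: :: > ff02::16: HBH ICMP6, "
    "multicast listener report v2, 2 group record(s), length 48",
    "13:16:45.614658 fa:16:3e:23:0a:4d (oui Unknown) > 33:33:00:00:00:16 (oui Unknown), "
    "ethertype IPv6 (0x86dd), length 110: :: > ff02::16: HBH ICMP6, "
    "multicast listener report v2, 2 group record(s), length 48",
]

reports = unspecified_source_mld_reports(SAMPLE)
for timestamp, mac in reports:
    print(timestamp, mac)
```

Any hit on a backup node's qg interface means the shared MAC has been advertised upstream from the wrong host.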

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/906114
Reason: This review is > 4 weeks without comment, and failed Zuul jobs the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.
