Regression: ip6 ndp broken, host bridge doesn't add vlan guest entry to mdb

Bug #1959702 reported by Harry Coin
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: none

Bug Description

Starting at the end: as it stands, the bug requires each of the host's bridge ports to be ipv6 addressable for ipv6 to function in the guest. Most admins won't think to add special entries to their host's nftables.conf to allow for that ('because who knew?'), so I believe it represents what you might call a 'passive security vulnerability'.

A recent kernel upgrade has broken ipv6/ip6 ndp in a host/kvm setup using a bridge on the host and vlans for some guests. I've tracked the problem to a failure of the mcast code to add entries to the host's mdb table. Manually adding the entries to the mdb on the bridge corrects the problem.

It's very easy to demonstrate the bug in an all ubuntu setup.

1. On an ubuntu host, create two vms (I used libvirt), set up as below.

2. On the host, create a bridge and vlan with two ports, each with the chosen vlan as PVID and egress untagged. Assign those ports one each to the guests as the interface, use e1000. Be sure to NOT autoconfigure the host side of the bridge ports with any ip4 or ip6 address (including fe80::), as it's just an avoidable security risk. We don't want to allow the host any sort of ip access or exposure to the vlan. In other words, treat the host's bridge ports as if they were ports on a 'real off-host switch', with no expectation that each port is ip6 addressable on the bridge itself. (FWIW: it's worth checking whether the problem goes away if the vlan is left tagged rather than PVID and is decoded in the guest as a separate interface. That pushes the burden of vlan management awareness onto the guest, though, so it is not acceptable as a solution.)

3. On the host, assign a physical NIC to the bridge and the vlan to the nic. The egress is tagged for the chosen vlan and not PVID. Optionally set up an off-host gateway for the vlan, but it isn't necessary to show the bug.

4. On each guest, manually assign a unique ip4 and ip6 address on the same subnet. (As you'll see, dhcp4 could work if there were an off-host router providing the related services, but the bug prevents dhcp6 from working.)

5. On one vm, ping the other. Notice ip4 pings work, ip6 pings do not.

6. Manually add the fe02::ffxx:xxxx entry for each vm, on the vlan, to the host bridge's multicast table (mdb). Use 'temp' if you're quick enough, otherwise 'permanent'.

7. Notice pings between the guests now work on ip6 and ipv4.
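
Step 6 can be sketched as a small helper. This is an illustration under assumptions, not the exact procedure used here: it assumes the entries in question are the IPv6 solicited-node groups (ff02::1:ffXX:XXXX, built from the low 24 bits of each guest address), and the function names are made up for the example. It only prints the bridge mdb command so it can be reviewed before being run as root.

```shell
#!/bin/sh
# Print the solicited-node multicast group for an IPv6 address.
# Assumption: plain hex notation (no embedded dotted IPv4 suffix).
solicited_node() (
    addr=${1%%/*}                      # drop any /prefix length
    case $addr in
        *::*) right=${addr##*::} ;;    # groups after the '::' gap
        *)    right=$addr ;;           # fully written-out address
    esac
    h7=0 h8=0                          # last two 16-bit groups
    case $right in
        '')  ;;                        # address ends in '::'
        *:*) h8=${right##*:}; rest=${right%:*}; h7=${rest##*:} ;;
        *)   h8=$right ;;
    esac
    # Low 24 bits = low byte of h7 followed by all of h8.
    printf 'ff02::1:ff%02x:%x\n' "$(( 0x$h7 & 0xff ))" "$(( 0x$h8 ))"
)

# Print (do not run) the mdb entry for one guest.
# usage: mdb_add_cmd BRIDGE PORT VID GUEST_IPV6
mdb_add_cmd() {
    printf 'bridge mdb add dev %s port %s grp %s permanent vid %s\n' \
        "$1" "$2" "$(solicited_node "$4")" "$3"
}
```

For example, `mdb_add_cmd vmbridge dbl 101 fd00::2` prints a command for group ff02::1:ff00:2; the bridge, port, vlan and guest address there are placeholders to be replaced with real values.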

Using tcpdump and watching icmp6 traffic, you'll notice the packets making it across the various bridge ports the moment you manually add the appropriate fe02::ff... multicast address to the mdb table. Beware a false sense of security: Once the ndp completes and the link addresses are in the fdb, it can 'seem like' everything is fine until the fdb times out and the required mdb entry again must be used to allow ndp to refresh the address.
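
One way to narrow the tcpdump capture to ndp, as a sketch: the filter below matches ICMPv6 neighbor solicitations (type 135) and advertisements (type 136) by reading the type byte directly after the fixed 40-byte IPv6 header, so it assumes no extension headers and untagged frames on the port being captured. The function names are made up for the example.

```shell
#!/bin/sh
# pcap filter for neighbor solicitation (135) and advertisement (136).
ndp_filter() {
    printf '%s\n' 'icmp6 and (ip6[40] == 135 or ip6[40] == 136)'
}

# Print (do not run) a capture command for one bridge port.
# -n: no name resolution, -e: show link-level (MAC) headers.
# usage: ndp_capture_cmd PORT
ndp_capture_cmd() {
    printf "tcpdump -n -e -i %s '%s'\n" "$1" "$(ndp_filter)"
}
```

Running the printed command on each guest's port (the port name is whatever the host calls that tap device) shows whether a solicitation arriving on one port is forwarded out of the other.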

Setting mcast_querier doesn't help. Perhaps previous kernels turned off the multicast snooping by default and just flooded all the bridge ports with all multicast traffic so this bug was avoided.

My hunch is that there hasn't been more complaint about this because it takes an extra step to not autoconfigure the vm ports with fe80:: link local addresses on the host. I believe the existence of the fe80 address on the host ports engages ndp code on the host, which loads the mdb as if preparing for the host's side of the bridge to participate in ip4 and ip6 higher layer traffic. That's a 'bad hack that happens to work': it shouldn't be a requirement that each host vlan port have an ip6 address. After all, it didn't need an ip4 address.

I've attached a linux-bug for you, but it's probably mostly unrelated info.

Harry Coin (hcoin)
description: updated
Revision history for this message
Harry Coin (hcoin) wrote :

Temporary workaround:

On the vmhost:

echo 0 > /sys/devices/virtual/net/<your bridge device name>/bridge/multicast_snooping
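
The same workaround in script form, as a sketch: the helper names are made up for illustration. The sysfs value is runtime state, so it has to be reapplied after a reboot or whenever the bridge is recreated.

```shell
#!/bin/sh
# sysfs path of the snooping switch for a bridge device.
snooping_path() {
    printf '/sys/devices/virtual/net/%s/bridge/multicast_snooping\n' "$1"
}

# Show the current value (1 = snooping on, 0 = off).
snooping_get() {
    cat "$(snooping_path "$1")"
}

# Turn snooping off; needs root.
# iproute2 equivalent: ip link set dev BRIDGE type bridge mcast_snooping 0
snooping_off() {
    echo 0 > "$(snooping_path "$1")"
}
```

Usage would be `snooping_off <your bridge device name>` followed by `snooping_get` on the same name to confirm it reads 0.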

Revision history for this message
Seth Arnold (seth-arnold) wrote :

Hello Harry, nice bug report, thanks; do you mind if we set this public? As you said, "passive", I suspect it'd be more useful for it to be visible to all.

Thanks

Revision history for this message
Harry Coin (hcoin) wrote : Re: [Bug 1959702] Re: Regression: ip6 ndp broken, host bridge doesn't add vlan guest entry to mdb

If you delete the attachment, then ok.  You could add the uname and so
on if you like.

On 2/1/22 17:09, Seth Arnold wrote:
> Hello Harry, nice bug report, thanks; do you mind if we set this public?
> As you said, "passive", I suspect it'd be more useful for it to be
> visible to all.
>
> Thanks
>

Revision history for this message
Seth Arnold (seth-arnold) wrote :

Sounds good, thanks:

 [ 0.000000] Linux version 5.11.0-49-generic (buildd@lcy02-amd64-054) (gcc (Ubuntu 10.3.0-1ubuntu1) 10.3.0, GNU ld (GNU Binutils for Ubuntu) 2.36.1) #55-Ubuntu SMP Wed Jan 12 17:36:34 UTC 2022 (Ubuntu 5.11.0-49.55-generic 5.11.22)

btw, there were a bunch of memory allocation failures in the dmesg, in case they weren't obvious.

information type: Private Security → Public Security
Revision history for this message
Harry Coin (hcoin) wrote :

Yup, those failures were to do with an old radeon chipset on an ancient
server.

On 2/1/22 17:33, Seth Arnold wrote:
> Sounds good, thanks:
>
> [ 0.000000] Linux version 5.11.0-49-generic (buildd@lcy02-amd64-054)
> (gcc (Ubuntu 10.3.0-1ubuntu1) 10.3.0, GNU ld (GNU Binutils for Ubuntu)
> 2.36.1) #55-Ubuntu SMP Wed Jan 12 17:36:34 UTC 2022 (Ubuntu
> 5.11.0-49.55-generic 5.11.22)
>
> btw, there were a bunch of memory allocation failures in the dmesg, in
> case they weren't obvious.
>
> ** Attachment removed: "apport.linux-image-5.11.0-49-generic.rw5aqx7x.apport"
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1959702/+attachment/5558647/+files/apport.linux-image-5.11.0-49-generic.rw5aqx7x.apport
>
> ** Information type changed from Private Security to Public Security
>

Revision history for this message
Jay Vosburgh (jvosburgh) wrote :

Harry,

I am attempting to reproduce the behavior you describe, but have been
unable to do so. Could you clarify some of the configuration specifics,
as follows:

Starting with step 2,

"2. On the host, create a bridge and vlan with two ports, each with the
chosen vlan as PVID and egress untagged. Assign those ports one each to
the guests as the interface, use e1000. Be sure to NOT autoconfigure the
host side of the bridge ports with any ip4 or ip6 address (including
fe80::), [...]"

I have configured testbr0 with no addresses, i.e., "ip addr show":

15: testbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 8c:dc:d4:b3:cb:f1 brd ff:ff:ff:ff:ff:ff

and added vnet1 and vnet3 (the interfaces that connect to the VMs) to
testbr0, and removed their respective fe80: addresses. I set the bridge
vlan behavior via

bridge vlan add dev vnet1 vid 1234 pvid untagged
bridge vlan add dev vnet3 vid 1234 pvid untagged
bridge vlan del dev vnet1 vid 1
bridge vlan del dev vnet3 vid 1

then added a separate Ethernet device to the testbr0, removed its fe80:
address, and set its bridge vlan as

bridge vlan add dev eno50 vid 1234
bridge vlan del dev eno50 vid 1

Adding addresses to the interfaces within the VMs results in ping between
the two functioning (even in the face of "ip neigh flush dev enp7s0").

At no time does "bridge mdb" affect the behavior (it lists no entries),
and it is unnecessary to add ff02:ffxx entries as you describe in step 6
(I presume that your mention of "fe02::ff..." is a typo for "ff02").

I am testing with the Ubuntu 5.11.0-46 kernel, which differs slightly from
your 5.11.0-49. From which version did you upgrade (i.e., what was the
last known working version)? I'm preparing to test with 5.11.0-50 (I am
unable to locate -49), but would also like to know if the above
description matches your configuration.

Thanks.

Revision history for this message
Harry Coin (hcoin) wrote :

These are partial, showing the relevant bits.

root@noc2:~# uname -a
Linux noc2.1.quietfountain.com 5.11.0-49-generic #55-Ubuntu SMP Wed Jan 12 17:36:34 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
root@noc2:~# bridge vlan show
port              vlan-id
enp4s0f1          100 PVID Egress Untagged
                   101
                   102
                   104
gate              100
                   101
                   102
                   103
                   104
                   105
...

dbs               102 PVID Egress Untagged
                   121
dbl               101 PVID Egress Untagged
                   121

4: enp4s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master vmbridge state UP group default qlen 1000
    link/ether 00:15:17:5a:ea:a9 brd ff:ff:ff:ff:ff:ff
...

13: gate: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbridge state UNKNOWN group default qlen 1000
    link/ether fe:54:00:9d:61:fa brd ff:ff:ff:ff:ff:ff

15: dbs: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbridge state UNKNOWN group default qlen 1000
    link/ether fe:54:00:2a:0e:ac brd ff:ff...

Revision history for this message
Harry Coin (hcoin) wrote :

Also, I'd say if any of the interfaces *ever* had  v6 addresses, then
there would be entries in the fdb and mdb.

But on these systems we have:

net.ipv6.conf.all.autoconf = 0

before the interfaces are created.

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1959702

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Harry Coin (hcoin)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Harry Coin (hcoin) wrote :

I need to repeat: in sysctl.d put this line in a file, then reboot, then your test setup will show the failure:

net.ipv6.conf.all.autoconf = 0

Otherwise, in your test setup the tables are populated and then you delete the addresses. Even a little time with the fe80:... address engages the L3/4 code, and its effect lasts at least until the tables time out (longer than you wait for your test), which spoils the test.

Putting the above in then rebooting will avoid that, so you will see the bug when at no time was there an fe80 address on the interface.
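
A sketch of that sysctl.d step follows. The file name in the usage note is hypothetical, and the extra `default` line is an assumption (newly created interfaces generally take their initial settings from `default`), not part of the setup described above.

```shell
#!/bin/sh
# Emit the sysctl.d fragment; redirect it into a file as root.
sysctl_conf() {
    cat <<'EOF'
net.ipv6.conf.all.autoconf = 0
net.ipv6.conf.default.autoconf = 0
EOF
}
```

For example, `sysctl_conf > /etc/sysctl.d/90-ipv6-no-autoconf.conf` as root, then reboot so the setting is in place before the bridge ports are created.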

Revision history for this message
Harry Coin (hcoin) wrote :

P.S. The reason this is a security issue: there is now an address on the host that the guest also 'knows', and it sits on the bridge giving access to all the other guests on the bridge. Most admins will not 'just know' they need rules to block fe80 traffic generated by host interfaces, because of course at least one host interface needs an fe80 address for ipv6 neighbor protocols to work. It's a really big 'soft hole'.

Revision history for this message
Jay Vosburgh (jvosburgh) wrote :

Harry,

 I'm still working to reproduce this, without success. I have set
the .autoconf sysctl to 0 (which controls creation of local addresses in
response to received Router Advertisements), as well as setting
.addr_gen_mode to 1 (to disable SLAAC (fe80::) addresses).

 In any event, .autoconf=0 and .addr_gen_mode=1 still fails to
reproduce the issue on my test system.

 I find that if I disable mcast_flood on the relevant bridge ports
(i.e., bridge link set dev vnet1 mcast_flood off) I do see the behavior
you describe, but in that case no variant that I've tried (no vid, and all
vids in use) of "bridge mdb add ... grp ff02::1:ff00:2" appears to permit
ND traffic to pass to the VM destination.

 Can you provide more specifics of how exactly the bridge and ports
are configured? Ideally, both the method to set it up, as well as the
configuration details when failing (i.e., "ip -s -d link show" for the
bridge and relevant bridge ports, "bridge vlan show", "bridge mdb show",
"bridge fdb show br [bridgename]")
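
The requested details could be gathered in one pass with something like the following sketch. It only prints the commands named above for a given bridge and its ports; the function name is made up for the example.

```shell
#!/bin/sh
# Print the diagnostic commands for a bridge and its ports.
# usage: diag_cmds BRIDGE PORT...
diag_cmds() (
    br=$1
    shift
    # Detailed link state for the bridge itself and each listed port.
    for dev in "$br" "$@"; do
        printf 'ip -s -d link show %s\n' "$dev"
    done
    printf 'bridge vlan show\n'
    printf 'bridge mdb show\n'
    printf 'bridge fdb show br %s\n' "$br"
)
```

Piping the output to a root shell and redirecting to a file (for instance `diag_cmds BRIDGE PORT1 PORT2 | sh > diag.txt 2>&1`) would capture everything in one attachment.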

 Also, to answer a question from your original report, the default
setting in the kernel for multicast_snooping (enabled, i.e., 1) hasn't
changed recently (and quite possibly ever).

Revision history for this message
Harry Coin (hcoin) wrote :

It looks to be 'an interesting mystery' we're chasing. This system is in production, so the results below are with the whole 'snooping engine' off, since with it on the whole thing dies. As such, I don't think the contents of the fdb and mdb tables mean much. The setups below are otherwise unchanged: they fail if snooping is on and work when it's off.

root@noc1:~# ip -s -d link show vmbridge
6: vmbridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:33:bf:a8:78 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time 30000 stp_state 0 priority 21 vlan_filtering 1 vlan_protocol 802.1Q bridge_id 0015.52:54:33:bf:a8:78 designated_root 0015.52:54:33:bf:a8:78 root_port 0 root_path_cost 0 topology_change 0 topology_change_detected 0 hello_timer 0.00 tcn_timer 0.00 topology_change_timer 0.00 gc_timer 137.32 vlan_default_pvid 0 vlan_stats_enabled 0 vlan_stats_per_port 0 group_fwd_mask 0 group_address 01:80:c2:00:00:00 mcast_snooping 0 mcast_router 1 mcast_query_use_ifaddr 0 mcast_querier 0 mcast_hash_elasticity 16 mcast_hash_max 4096 mcast_last_member_count 2 mcast_startup_query_count 2 mcast_last_member_interval 100 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000 mcast_startup_query_interval 3124 mcast_stats_enabled 0 mcast_igmp_version 2 mcast_mld_version 1 nf_call_iptables 0 nf_call_ip6tables 0 nf_call_arptables 0 addrgenmode none numtxqueues 1 numrxqueues 1 gso_max_size 32000 gso_max_segs 24
    RX: bytes packets errors dropped missed mcast
    2013504987 17838769 0 0 0 5037107
    TX: bytes packets errors dropped carrier collsns
    846 11 0 0 0 0

The bridge itself has no address of any kind other than a mac.

Here's a detail from one of many identical vm setups:

ip -s -d link show dbl
16: dbl: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbridge state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether fe:54:00:60:c5:db brd ff:ff:ff:ff:ff:ff promiscuity 1 minmtu 68 maxmtu 65521
    tun type tap pi off vnet_hdr off persist off
    bridge_slave state forwarding priority 32 cost 100 hairpin off guard off root_block off fastleave off learning on flood on port_id 0x800a port_no 0xa designated_port 32778 designated_cost 0 designated_bridge 0015.52:54:33:bf:a8:78 designated_root 0015.52:54:33:bf:a8:78 hold_timer 0.00 message_age_timer 0.00 forward_delay_timer 0.00 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on mcast_to_unicast off neigh_suppress off group_fwd_mask 0 group_fwd_mask_str 0x0 vlan_tunnel off isolated off addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    RX: bytes packets errors dropped missed mcast
    18895018565 174327707 0 25070 0 0
    TX: bytes packets errors dropped carrier collsns
    30861506740 198028258 0 1427 0 0

There's some confidential...
