Wrong nexthop selection with two default routers where only one is REACHABLE

Bug #2071397 reported by Matt
Affects         Status  Importance  Assigned to  Milestone
linux (Ubuntu)  New     Undecided   Unassigned

Bug Description

Hi,

NOTE: I was unsure where to report this bug. Some guidance says to report it against the distro, other guidance says to use the mailing list, so I have done both. (Also see: https://marc.info/?l=linux-netdev&m=171953240705042&w=2)

This appears to be a bug in Linux kernel networking. This was observed on a fresh install of Ubuntu 24.04, with Linux 6.8.0-36-generic.

PROBLEM
In the network diagram below, the HUT has two default routers (TR1 and TR2) and two neighbor cache entries: TR1=REACHABLE and TR2=INCOMPLETE. When I ping the HUT from a remote test node (TN2) via TR1, the HUT sends an NS for TR2 when it should have replied directly via TR1. The echo reply is never sent, which breaks communication and violates IPv6 Logo compliance.

              TN2
               |
      +--------+--------+
      |                 |
     TR1               TR2
 (REACHABLE)      (INCOMPLETE)
      |                 |
      +--------+--------+
               |
              HUT

The RFC for Neighbor Discovery describes the policy for selecting routers from the Default Router List. The relevant bullet is extracted below…

RFC4861 6.3.6. Default Router Selection
 The policy for selecting routers from the Default Router List is as
 follows:

 1) Routers that are reachable or probably reachable (i.e., in any
    state other than INCOMPLETE) SHOULD be preferred over routers
    whose reachability is unknown or suspect (i.e., in the
    INCOMPLETE state, or for which no Neighbor Cache entry exists).
    Further implementation hints on default router selection when
    multiple equivalent routers are available are discussed in
    [LD-SHRE].
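
To make the expected behavior concrete, the policy quoted above can be sketched in a few lines of Python. This is only an illustration of the RFC text, not the kernel's implementation; the `Router` class and `select_default_router` function are hypothetical names introduced for this sketch, and the fallback simply returns the first router instead of rotating round-robin.

```python
from dataclasses import dataclass
from typing import Optional

# Neighbor Cache states that count as "reachable or probably reachable",
# i.e. any RFC 4861 state other than INCOMPLETE.
PROBABLY_REACHABLE = {"REACHABLE", "STALE", "DELAY", "PROBE"}


@dataclass
class Router:
    address: str
    nud_state: Optional[str]  # None models "no Neighbor Cache entry exists"


def select_default_router(routers):
    """Prefer the first router that is not INCOMPLETE and has a cache entry.

    If no router qualifies, fall back to the first one in the list
    (RFC 4861 recommends round-robin here; rotation is not modelled).
    """
    for router in routers:
        if router.nud_state in PROBABLY_REACHABLE:
            return router
    return routers[0] if routers else None


# The situation in this report: TR2 is listed first but is INCOMPLETE.
routers = [
    Router("fe80::200:10ff:fe10:1061", "INCOMPLETE"),  # TR2
    Router("fe80::200:10ff:fe10:1060", "REACHABLE"),   # TR1
]
print(select_default_router(routers).address)  # fe80::200:10ff:fe10:1060
```

Under this reading of the RFC, the HUT would be expected to pick TR1 regardless of the order of the two nexthops.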

REPRODUCER
This condition is created by configuring two routers under systemd-networkd, either by having each router send an RA, or statically configuring one router at a time. I show the steps for the static configuration below.

Assuming you have an interface named “enp0s9” and you’re using systemd-networkd as the network manager:

1. Configure the Host (HUT) with one router (TR1)
$ networkctl cat 10-enp0s9.network
# /etc/systemd/network/10-enp0s9.network
[Match]
Name=enp0s9

[Link]
RequiredForOnline=no

[Network]
Description="Internal Network: Private VM-to-VM IPv6 interface"
DHCP=no
LLDP=no
EmitLLDP=no

# /etc/systemd/network/10-enp0s9.network.d/address.conf
[Network]
Address=2001:2:0:1000:a00:27ff:fe5f:f72d/64

# /etc/systemd/network/10-enp0s9.network.d/route-1060.conf
[Route]
Gateway=fe80::200:10ff:fe10:1060
GatewayOnLink=true

2. Start or reload the configuration
$ sudo networkctl reload
$ sudo networkctl reconfigure enp0s9
$ ip -6 r
2001:2:0:1000::/64 dev enp0s9 proto kernel metric 256 pref medium
fe80::/64 dev enp0s3 proto kernel metric 256 pref medium
fe80::/64 dev enp0s9 proto kernel metric 256 pref medium
default via fe80::200:10ff:fe10:1060 dev enp0s9 proto static metric 1024 onlink pref medium

3. Flush and Monitor the neighbor cache
$ sudo ip -6 neigh flush all; ip -6 -ts monitor neigh

4. From TN1, ping HUT via TR1 – the HUT’s NCE is updated to REACHABLE
[2024-06-28T08:13:27.617674] fe80::200:10ff:fe10:1060 dev enp0s9 lladdr 00:00:10:10:10:60 router REACHABLE

NOTE: tcpdump shows the expected protocol exchange.

5. Configure the Host (HUT) with a 2nd router (TR2)
$ cat /etc/systemd/network/10-enp0s9.network.d/route-1061.conf
[Route]
Gateway=fe80::200:10ff:fe10:1061
GatewayOnLink=true
$ sudo networkctl reload
$ sudo networkctl reconfigure enp0s9
$ ip -6 r
2001:2:0:1000::/64 dev enp0s9 proto kernel metric 256 pref medium
fe80::/64 dev enp0s3 proto kernel metric 256 pref medium
fe80::/64 dev enp0s9 proto kernel metric 256 pref medium
default proto static metric 1024 pref medium
     nexthop via fe80::200:10ff:fe10:1061 dev enp0s9 weight 1
     nexthop via fe80::200:10ff:fe10:1060 dev enp0s9 weight 1

6. Start monitoring traffic with tcpdump/Wireshark

7. From TN1, ping HUT via TR1
a. An echo reply is never received
b. The protocol exchange shows the HUT sends an NS for TR2 (which is NOT REACHABLE) when it should have sent an echo-reply via TR1 (which is REACHABLE).
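
When repeating step 7, a small helper can flag which configured nexthops have no usable neighbor cache entry. The following is a rough sketch that assumes the state is the last field of each `ip -6 neigh` line and that `ip -ts monitor` prefixes lines with a bracketed timestamp, as in step 4. The function names are hypothetical, and the INCOMPLETE sample line is an assumed example, not captured output.

```python
import re


def parse_neigh_line(line):
    """Return (address, state) for one `ip -6 neigh` or monitor line."""
    # Drop an optional leading "[timestamp]" as printed by `ip -ts monitor`.
    line = re.sub(r"^\[[^\]]*\]\s*", "", line.strip())
    fields = line.split()
    if len(fields) < 2:
        return None
    return fields[0], fields[-1]


def unresolved(neigh_lines, nexthops):
    """List the nexthops that are missing, INCOMPLETE or FAILED."""
    states = dict(filter(None, map(parse_neigh_line, neigh_lines)))
    bad = {"NONE", "INCOMPLETE", "FAILED"}
    return [nh for nh in nexthops if states.get(nh, "NONE") in bad]


neigh_lines = [
    # Taken from step 4 of this report.
    "[2024-06-28T08:13:27.617674] fe80::200:10ff:fe10:1060 dev enp0s9 "
    "lladdr 00:00:10:10:10:60 router REACHABLE",
    # Assumed example of an unresolved entry for TR2.
    "fe80::200:10ff:fe10:1061 dev enp0s9 router INCOMPLETE",
]
nexthops = ["fe80::200:10ff:fe10:1061", "fe80::200:10ff:fe10:1060"]
print(unresolved(neigh_lines, nexthops))  # ['fe80::200:10ff:fe10:1061']
```

The sketch does not handle "Deleted" lines from `ip monitor`; it is meant only for a quick check of the two router entries.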

OBSERVATIONS
1. When NOT using systemd-networkd and each router sends an RA, the kernel behaves correctly.
2. The routing table looks different, depending on whether the kernel adds the route or systemd-networkd adds the route. E.g.
a. Kernel adds two separate “default route” entries (systemd-networkd is stopped)
$ ip -6 route
<deleted lines>
default via fe80::200:10ff:fe10:1060 proto ra metric 1024 expires 39sec hoplimit 64 pref medium
default via fe80::200:10ff:fe10:1061 proto ra metric 1024 expires 44sec hoplimit 64 pref medium
b. Systemd-networkd adds one “default route” with two nexthop options (systemd-networkd is running)
$ ip -6 route
<deleted lines>
default proto ra metric 1024 expires 589sec pref medium
 nexthop via fe80::200:10ff:fe10:1060 dev enp0s9 weight 1
 nexthop via fe80::200:10ff:fe10:1061 dev enp0s9 weight 1
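
The two routing-table shapes in observation 2 can be told apart mechanically. The sketch below parses `ip -6 route` text and groups gateways per default-route entry, so two separate entries and one multipath entry give different results. The function name `default_gateways` is hypothetical, and the parser assumes the output layout shown above.

```python
def default_gateways(route_output):
    """Return one list of gateway addresses per default-route entry."""
    entries = []
    in_default = False
    for raw in route_output.splitlines():
        fields = raw.split()
        if not fields:
            continue
        if fields[0] == "default":
            # A new default entry; it may carry "via" itself or be
            # followed by indented "nexthop via ..." lines.
            in_default = True
            entries.append([])
        elif fields[0] != "nexthop":
            # Any other route ends the current default entry.
            in_default = False
            continue
        if in_default and "via" in fields:
            entries[-1].append(fields[fields.index("via") + 1])
    return entries


KERNEL_ADDED = """\
default via fe80::200:10ff:fe10:1060 proto ra metric 1024 expires 39sec hoplimit 64 pref medium
default via fe80::200:10ff:fe10:1061 proto ra metric 1024 expires 44sec hoplimit 64 pref medium
"""

NETWORKD_ADDED = """\
default proto ra metric 1024 expires 589sec pref medium
 nexthop via fe80::200:10ff:fe10:1060 dev enp0s9 weight 1
 nexthop via fe80::200:10ff:fe10:1061 dev enp0s9 weight 1
"""

print(len(default_gateways(KERNEL_ADDED)))    # 2 entries, one gateway each
print(len(default_gateways(NETWORKD_ADDED)))  # 1 entry with two nexthops
```

A single entry with more than one gateway corresponds to the multipath form that systemd-networkd installs, which is the case where the failure is observed.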

TCPDUMP
For completeness, here is the annotated output from tcpdump…

$ tcpdump -r ~/v6LC_2_2_11-bug-report-summary.pcapng -t -n --number -e
reading from file /home/matt/v6LC_2_2_11-bug-report-summary.pcapng, link-type EN10MB (Ethernet), snapshot length 262144

    # Step 4: TN1(1181) pings HUT(f72d) via TR1(1060)
    1 00:00:10:10:10:60 > 08:00:27:5f:f7:2d, ethertype IPv6 (0x86dd), length 70: 2001:2:0:1001:200:10ff:fe10:1181 > 2001:2:0:1000:a00:27ff:fe5f:f72d: ICMP6, echo request, id 0, seq 0, length 16
    2 08:00:27:5f:f7:2d > 33:33:ff:10:10:60, ethertype IPv6 (0x86dd), length 86: 2001:2:0:1000:a00:27ff:fe5f:f72d > ff02::1:ff10:1060: ICMP6, neighbor solicitation, who has fe80::200:10ff:fe10:1060, length 32
    3 00:00:10:10:10:60 > 08:00:27:5f:f7:2d, ethertype IPv6 (0x86dd), length 86: fe80::200:10ff:fe10:1060 > fe80::a00:27ff:fe5f:f72d: ICMP6, neighbor advertisement, tgt is fe80::200:10ff:fe10:1060, length 32
    4 08:00:27:5f:f7:2d > 00:00:10:10:10:60, ethertype IPv6 (0x86dd), length 70: 2001:2:0:1000:a00:27ff:fe5f:f72d > 2001:2:0:1001:200:10ff:fe10:1181: ICMP6, echo reply, id 0, seq 0, length 16

    # HUT has replied to TN1 via TR1. NCE for TR1=REACHABLE

    # Step 5: Now configure TR2
    # Step 7: TN1(1181) pings HUT(f72d) via TR1(1060)
    5 00:00:10:10:10:60 > 08:00:27:5f:f7:2d, ethertype IPv6 (0x86dd), length 70: 2001:2:0:1001:200:10ff:fe10:1181 > 2001:2:0:1000:a00:27ff:fe5f:f72d: ICMP6, echo request, id 0, seq 0, length 16

    # HUT creates an NCE for TR2=INCOMPLETE

    # HUT incorrectly sends NS for TR2(1061) when it should have sent echo-reply via TR1(1060)
    6 08:00:27:5f:f7:2d > 33:33:ff:10:10:61, ethertype IPv6 (0x86dd), length 86: 2001:2:0:1000:a00:27ff:fe5f:f72d > ff02::1:ff10:1061: ICMP6, neighbor solicitation, who has fe80::200:10ff:fe10:1061, length 32
    7 08:00:27:5f:f7:2d > 33:33:ff:10:10:61, ethertype IPv6 (0x86dd), length 86: 2001:2:0:1000:a00:27ff:fe5f:f72d > ff02::1:ff10:1061: ICMP6, neighbor solicitation, who has fe80::200:10ff:fe10:1061, length 32
    8 08:00:27:5f:f7:2d > 33:33:ff:10:10:61, ethertype IPv6 (0x86dd), length 86: 2001:2:0:1000:a00:27ff:fe5f:f72d > ff02::1:ff10:1061: ICMP6, neighbor solicitation, who has fe80::200:10ff:fe10:1061, length 32
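
For readers checking the capture, the destination addresses in packets 6 to 8 follow directly from TR2's link-local address: the solicited-node multicast group is formed from its low 24 bits (RFC 4291), and the Ethernet destination is 33:33 followed by the low 32 bits of that group (RFC 2464). A short sketch using Python's standard `ipaddress` module; the function names are hypothetical.

```python
import ipaddress


def solicited_node_multicast(addr):
    """Build ff02::1:ffXX:XXXX from the low 24 bits of addr."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)


def ipv6_multicast_mac(group):
    """Map an IPv6 multicast group to its 33:33:xx:xx:xx:xx Ethernet address."""
    low32 = int(ipaddress.IPv6Address(str(group))) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return ":".join(f"{o:02x}" for o in octets)


group = solicited_node_multicast("fe80::200:10ff:fe10:1061")  # TR2
print(group)                      # ff02::1:ff10:1061
print(ipv6_multicast_mac(group))  # 33:33:ff:10:10:61
```

Both values match the NS packets above, which confirms that the HUT is attempting to resolve TR2 (1061) and not TR1 (1060).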

Regards,
Matt.

ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: linux-image-6.8.0-36-generic 6.8.0-36.36
ProcVersionSignature: Ubuntu 6.8.0-36.36-generic 6.8.4
Uname: Linux 6.8.0-36-generic x86_64
ApportVersion: 2.28.1-0ubuntu3
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/seq: matt 2599 F.... pipewire
 /dev/snd/controlC0: matt 2603 F.... wireplumber
CRDA: N/A
CasperMD5CheckResult: pass
CurrentDesktop: ubuntu:GNOME
Date: Fri Jun 28 10:52:11 2024
InstallationDate: Installed on 2024-06-24 (4 days ago)
InstallationMedia: Ubuntu 24.04 LTS "Noble Numbat" - Release amd64 (20240424)
Lsusb:
 Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 001 Device 002: ID 80ee:0021 VirtualBox USB Tablet
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Lsusb-t:
 /: Bus 001.Port 001: Dev 001, Class=root_hub, Driver=ohci-pci/12p, 12M
     |__ Port 001: Dev 002, If 0, Class=Human Interface Device, Driver=usbhid, 12M
 /: Bus 002.Port 001: Dev 001, Class=root_hub, Driver=ehci-pci/12p, 480M
MachineType: innotek GmbH VirtualBox
ProcEnviron:
 LANG=en_US.UTF-8
 PATH=(custom, no user)
 SHELL=/bin/bash
 TERM=xterm-256color
 XDG_RUNTIME_DIR=<set>
ProcFB: 0 vmwgfxdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.8.0-36-generic root=UUID=d3096757-b767-4cf4-8b9c-c65a87bd4f4e ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-6.8.0-36-generic N/A
 linux-backports-modules-6.8.0-36-generic N/A
 linux-firmware 20240318.git3b128b60-0ubuntu2.1
RfKill:

SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 12/01/2006
dmi.bios.vendor: innotek GmbH
dmi.bios.version: VirtualBox
dmi.board.name: VirtualBox
dmi.board.vendor: Oracle Corporation
dmi.board.version: 1.2
dmi.chassis.type: 1
dmi.chassis.vendor: Oracle Corporation
dmi.modalias: dmi:bvninnotekGmbH:bvrVirtualBox:bd12/01/2006:svninnotekGmbH:pnVirtualBox:pvr1.2:rvnOracleCorporation:rnVirtualBox:rvr1.2:cvnOracleCorporation:ct1:cvr:sku:
dmi.product.family: Virtual Machine
dmi.product.name: VirtualBox
dmi.product.version: 1.2
dmi.sys.vendor: innotek GmbH

Matt (livefreeandroam) wrote :