Wrong nexthop selection with two default routers where only one is REACHABLE
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | New | Undecided | Unassigned |
Bug Description
Hi,
NOTE: I was unsure where to report this bug: some sources suggested reporting it against the distro, others suggested using the mailing list. (Also see: https:/
This appears to be a bug in the Linux kernel networking stack. It was observed on a fresh install of Ubuntu 24.04, with Linux 6.8.0-36-generic.
PROBLEM
In the network diagram below, I have two default routers (TR1 and TR2). The host (HUT) has two neighbor cache entries: TR1=REACHABLE and TR2=INCOMPLETE. When I ping the HUT from a remote test node (TN2) via TR1, the HUT sends an NS for TR2 when it should have replied directly via TR1. This breaks communication and violates IPv6 Logo compliance.
         TN2
          |
    +-----+-----+
    |           |
   TR1         TR2
(REACHABLE) (INCOMPLETE)
    |           |
    +-----+-----+
          |
         HUT
The RFC for Neighbor Discovery describes the policy for selecting routes from the Default Router List. The relevant bullet is extracted below…
RFC4861 6.3.6. Default Router Selection
The policy for selecting routers from the Default Router List is as
follows:
1) Routers that are reachable or probably reachable (i.e., in any
state other than INCOMPLETE) SHOULD be preferred over routers
whose reachability is unknown or suspect (i.e., in the
INCOMPLETE state, or for which no Neighbor Cache entry exists).
Further implementation hints on default router selection when
multiple equivalent routers are available are discussed in
[[LD-SHRE](https:/
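The policy quoted above can be sketched in a few lines. This is a simplified illustration of rule 1 only, not the kernel's implementation: the function name, the router labels and the dictionary layout are assumptions made for the example, and the round-robin fallback that the RFC describes for the "no router is reachable" case is reduced to picking the first entry.

```python
from typing import Dict, List, Optional

# Neighbor cache states that count as "reachable or probably reachable",
# i.e. any RFC 4861 state other than INCOMPLETE.
REACHABLE_OR_PROBABLY = {"REACHABLE", "STALE", "DELAY", "PROBE"}


def select_default_router(
    default_routers: List[str],
    neighbor_state: Dict[str, Optional[str]],
) -> Optional[str]:
    """Pick a router per rule 1 of RFC 4861 section 6.3.6 (simplified)."""
    # Prefer any router whose neighbor cache state is not INCOMPLETE.
    # A missing entry (no Neighbor Cache entry) is treated as unknown.
    for router in default_routers:
        if neighbor_state.get(router) in REACHABLE_OR_PROBABLY:
            return router
    # No router is known to be reachable: fall back to the first one.
    # (The RFC recommends round-robin here; omitted for brevity.)
    return default_routers[0] if default_routers else None


# Hypothetical labels matching the diagram: TR1 is REACHABLE, TR2 is not.
states = {"TR1": "REACHABLE", "TR2": "INCOMPLETE"}
chosen = select_default_router(["TR2", "TR1"], states)
print(chosen)
```

With the states from this report, the sketch returns TR1 even when TR2 is listed first, which is the behaviour the HUT is expected to show.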
REPRODUCER
This condition is created by configuring two routers under systemd-networkd, either by having each router send an RA, or statically configuring one router at a time. I show the steps for the static configuration below.
Assuming you have an interface named “enp0s9” and you’re using systemd-networkd as the network manager:
1. Configure the Host (HUT) with one router (TR1)
$ networkctl cat 10-enp0s9.network
# /etc/systemd/
[Match]
Name=enp0s9
[Link]
RequiredForOnli
[Network]
Description=
DHCP=no
LLDP=no
EmitLLDP=no
# /etc/systemd/
[Network]
Address=
# /etc/systemd/
[Route]
Gateway=
GatewayOnLink=true
2. Start or reload the configuration
$ sudo networkctl reload
$ sudo networkctl reconfigure enp0s9
$ ip -6 r
2001:2:0:1000::/64 dev enp0s9 proto kernel metric 256 pref medium
fe80::/64 dev enp0s3 proto kernel metric 256 pref medium
fe80::/64 dev enp0s9 proto kernel metric 256 pref medium
default via fe80::200:
3. Flush and Monitor the neighbor cache
$ sudo ip -6 neigh flush all; ip -6 -ts monitor neigh
4. From TN1, ping HUT via TR1 – the HUT’s NCE is updated to REACHABLE
[2024-06-
NOTE: tcpdump shows the expected protocol exchange.
5. Configure the Host (HUT) with a 2nd router (TR2)
$ cat /etc/systemd/
[Route]
Gateway=
GatewayOnLink=true
$ sudo networkctl reload
$ sudo networkctl reconfigure enp0s9
$ ip -6 r
2001:2:0:1000::/64 dev enp0s9 proto kernel metric 256 pref medium
fe80::/64 dev enp0s3 proto kernel metric 256 pref medium
fe80::/64 dev enp0s9 proto kernel metric 256 pref medium
default proto static metric 1024 pref medium
nexthop via fe80::200:
nexthop via fe80::200:
6. Start monitoring traffic with tcpdump/Wireshark
7. From TN1, ping HUT via TR1
a. An echo reply is never received
b. The protocol exchange shows the HUT sends an NS for TR2 (which is NOT REACHABLE) when it should have sent an echo reply via TR1 (which is REACHABLE).
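To confirm the neighbor states while running the steps above, the output of `ip -6 neigh show` can be reduced to an address-to-state map. The sketch below is only a helper for reading that output; it assumes the usual layout where the address is the first field and the NUD state is the last. The sample lines are hypothetical, with fe80::1 and fe80::2 standing in for TR1 and TR2.

```python
from typing import Dict


def parse_neigh(output: str) -> Dict[str, str]:
    """Map each neighbor address to its NUD state (assumed to be the last field)."""
    states: Dict[str, str] = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        states[fields[0]] = fields[-1]
    return states


# Hypothetical sample; these are not the addresses from this report.
sample = """\
fe80::1 dev enp0s9 lladdr 00:00:10:10:10:60 router REACHABLE
fe80::2 dev enp0s9 INCOMPLETE
"""
neigh = parse_neigh(sample)
print(neigh)
```

At step 7 the map should show one REACHABLE and one INCOMPLETE entry, which is the precondition for the failure described in this report.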
OBSERVATIONS
1. When NOT using systemd-networkd and each router sends an RA, the kernel behaves correctly.
2. The routing table looks different, depending on whether the kernel adds the route or systemd-networkd adds the route. E.g.
a. Kernel adds two separate “default route” entries (systemd-networkd is stopped)
$ ip -6 route
<deleted lines>
default via fe80::200:
default via fe80::200:
b. systemd-networkd adds one “default route” with two nexthop options (systemd-networkd is running)
$ ip -6 route
<deleted lines>
default proto ra metric 1024 expires 589sec pref medium
nexthop via fe80::200:
nexthop via fe80::200:
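The two routing table shapes in observation 2 can be told apart mechanically. The following sketch groups the gateways of each `default` entry from `ip -6 route` text output: two groups of one gateway correspond to case (a), one group of two gateways to case (b). The function names are made up for this example, and the sample addresses and trailing fields are placeholders, since the real lines are truncated in this report.

```python
from typing import List, Optional


def _gateway(fields: List[str]) -> Optional[str]:
    """Return the token following 'via', if there is one."""
    if "via" in fields[:-1]:
        return fields[fields.index("via") + 1]
    return None


def default_route_groups(output: str) -> List[List[str]]:
    """Return one list of gateways per 'default' route entry."""
    groups: List[List[str]] = []
    in_default = False
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "default":
            # A new default entry starts; it may carry its own gateway.
            in_default = True
            groups.append([])
        elif not (fields[0] == "nexthop" and in_default):
            # Any other route ends the current default entry.
            in_default = False
            continue
        gw = _gateway(fields)
        if gw is not None:
            groups[-1].append(gw)
    return groups


# Placeholder samples (fe80::1 / fe80::2 are not the report's addresses).
separate = (
    "default via fe80::1 dev enp0s9 proto ra metric 1024 pref medium\n"
    "default via fe80::2 dev enp0s9 proto ra metric 1024 pref medium\n"
)
multipath = (
    "default proto ra metric 1024 expires 589sec pref medium\n"
    "\tnexthop via fe80::1 dev enp0s9 weight 1\n"
    "\tnexthop via fe80::2 dev enp0s9 weight 1\n"
)
print(default_route_groups(separate))
print(default_route_groups(multipath))
```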
TCPDUMP
For completeness, here is the annotated output from tcpdump…
$ tcpdump -r ~/v6LC_
reading from file /home/matt/
# Step 4: TN1(1181) pings HUT(f72d) via TR1(1060)
1 00:00:10:10:10:60 > 08:00:27:5f:f7:2d, ethertype IPv6 (0x86dd), length 70: 2001:2:
2 08:00:27:5f:f7:2d > 33:33:ff:10:10:60, ethertype IPv6 (0x86dd), length 86: 2001:2:
3 00:00:10:10:10:60 > 08:00:27:5f:f7:2d, ethertype IPv6 (0x86dd), length 86: fe80::200:
4 08:00:27:5f:f7:2d > 00:00:10:10:10:60, ethertype IPv6 (0x86dd), length 70: 2001:2:
# HUT has replied to TN1 via TR1. NCE for TR1=REACHABLE
# Step 5: Now configure TR2
# Step 7: TN1(1181) pings HUT(f72d) via TR1(1060)
5 00:00:10:10:10:60 > 08:00:27:5f:f7:2d, ethertype IPv6 (0x86dd), length 70: 2001:2:
# HUT creates an NCE for TR2=INCOMPLETE
# HUT incorrectly sends NS for TR2(1061) when it should have sent echo-reply via TR1(1060)
6 08:00:27:5f:f7:2d > 33:33:ff:10:10:61, ethertype IPv6 (0x86dd), length 86: 2001:2:
7 08:00:27:5f:f7:2d > 33:33:ff:10:10:61, ethertype IPv6 (0x86dd), length 86: 2001:2:
8 08:00:27:5f:f7:2d > 33:33:ff:10:10:61, ethertype IPv6 (0x86dd), length 86: 2001:2:
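For readers checking the capture: the destination MAC 33:33:ff:10:10:61 in packets 6 to 8 is the Ethernet mapping of the solicited-node multicast group for TR2's address. The sketch below shows that derivation (RFC 4291 for the group, RFC 2464 for the MAC mapping). The addresses used are assumptions, because the addresses in the capture lines are truncated; they were chosen only so that their low 24 bits match the MACs in the capture.

```python
import ipaddress


def solicited_node_mac(target: str) -> str:
    """Ethernet destination of an NS sent to the target's solicited-node group."""
    low24 = int(ipaddress.IPv6Address(target)) & 0xFFFFFF
    # Solicited-node multicast address: ff02::1:ffXX:XXXX, where XX:XXXX
    # are the low 24 bits of the target address (RFC 4291).
    group = int(ipaddress.IPv6Address("ff02::1:ff00:0")) | low24
    # IPv6 multicast maps to 33:33 followed by the low 32 bits
    # of the group address (RFC 2464).
    low32 = group & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return ":".join(f"{o:02x}" for o in octets)


# Assumed link-local addresses for TR1 and TR2 (illustrative only).
tr1_mac = solicited_node_mac("fe80::200:10ff:fe10:1060")
tr2_mac = solicited_node_mac("fe80::200:10ff:fe10:1061")
print(tr1_mac, tr2_mac)
```

Under that assumption, the result for TR2 matches the destination MAC of packets 6 to 8, and the result for TR1 matches packet 2.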
Regards,
Matt.
ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: linux-image-
ProcVersionSign
Uname: Linux 6.8.0-36-generic x86_64
ApportVersion: 2.28.1-0ubuntu3
Architecture: amd64
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/seq: matt 2599 F.... pipewire
/dev/snd/
CRDA: N/A
CasperMD5CheckR
CurrentDesktop: ubuntu:GNOME
Date: Fri Jun 28 10:52:11 2024
InstallationDate: Installed on 2024-06-24 (4 days ago)
InstallationMedia: Ubuntu 24.04 LTS "Noble Numbat" - Release amd64 (20240424)
Lsusb:
Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 002: ID 80ee:0021 VirtualBox USB Tablet
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Lsusb-t:
/: Bus 001.Port 001: Dev 001, Class=root_hub, Driver=
|__ Port 001: Dev 002, If 0, Class=Human Interface Device, Driver=usbhid, 12M
/: Bus 002.Port 001: Dev 001, Class=root_hub, Driver=
MachineType: innotek GmbH VirtualBox
ProcEnviron:
LANG=en_US.UTF-8
PATH=(custom, no user)
SHELL=/bin/bash
TERM=xterm-
XDG_RUNTIME_
ProcFB: 0 vmwgfxdrmfb
ProcKernelCmdLine: BOOT_IMAGE=
RelatedPackageV
linux-
linux-
linux-firmware 20240318.
RfKill:
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 12/01/2006
dmi.bios.vendor: innotek GmbH
dmi.bios.version: VirtualBox
dmi.board.name: VirtualBox
dmi.board.vendor: Oracle Corporation
dmi.board.version: 1.2
dmi.chassis.type: 1
dmi.chassis.vendor: Oracle Corporation
dmi.modalias: dmi:bvninnotekG
dmi.product.family: Virtual Machine
dmi.product.name: VirtualBox
dmi.product.
dmi.sys.vendor: innotek GmbH