systemd-networkd doesn't process IPv6 RA properly

Bug #1800836 reported by Simon Déziel
This bug affects 9 people
Affects                  Status        Importance   Assigned to   Milestone
systemd (Ubuntu)         Fix Released  Undecided    Unassigned
  Bionic                 Won't Fix     Undecided    Unassigned

Bug Description

The gateways/firewalls in our DC are highly available: when there is a failover, their IPv6 VIP (fe80::1) moves from the master to the backup.

We found that only our Bionic VMs behind those gateways had issues after a failover. Those Bionic VMs were all running systemd-networkd (configured through netplan), and before the failover they had:

$ ip -6 route
...
default via fe80::1 dev eth0 proto ra metric 1024 pref medium

But after a failover:

$ ip -6 route
...
default proto ra metric 1024
        nexthop via fe80::1 dev eth0 weight 1
        nexthop via fe80::210:18ff:febe:6750 dev eth0 weight 1

And after another failover:

$ ip -6 route
...
default proto ra metric 1024
        nexthop via fe80::1 dev eth0 weight 1
        nexthop via fe80::210:18ff:febe:6750 dev eth0 weight 1
        nexthop via fe80::210:18ff:fe77:b558 dev eth0 weight 1

This is problematic because those VMs then use fe80::210:18ff:fe77:b558%$IFACE as their default gateway even when that gateway is unavailable:

$ ip -6 route get ::
:: from :: via fe80::210:18ff:fe77:b558 dev eth0 proto ra src fe80::a800:ff:fe51:8c37 metric 1024 pref medium
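To spot this state automatically, a small parser over the `ip -6 route` text output can list the gateways of the default route and flag when there is more than one. This is a minimal sketch, assuming the plain-text format shown above (a `default proto ra` line followed by indented `nexthop via` continuation lines); `default_nexthops` is a hypothetical helper, not part of iproute2 or systemd.

```python
import re
from typing import List

VIA = re.compile(r"\bvia (\S+)")


def default_nexthops(route_output: str) -> List[str]:
    """Return every gateway of the default route(s) in `ip -6 route` text output.

    Handles the single-path form ("default via X dev eth0 ...") and the
    multipath form, where "default proto ra ..." is followed by indented
    "nexthop via X dev eth0 weight 1" continuation lines.
    """
    gateways: List[str] = []
    in_default = False
    for line in route_output.splitlines():
        stripped = line.strip()
        if line[:1].isspace() and stripped.startswith("nexthop"):
            # Continuation of the previous route: keep it only if that was a default route.
            match = VIA.search(stripped) if in_default else None
        else:
            in_default = stripped.startswith("default")
            match = VIA.search(stripped) if in_default else None
        if match:
            gateways.append(match.group(1))
    return gateways


# Output copied from the first failover above.
output = """default proto ra metric 1024
        nexthop via fe80::1 dev eth0 weight 1
        nexthop via fe80::210:18ff:febe:6750 dev eth0 weight 1"""
gateways = default_nexthops(output)
if len(gateways) > 1:
    print("multipath default route:", ", ".join(gateways))
```

A multipath default route can be intentional on other networks, so this is only a heuristic for a segment like this one, where a single VIP gateway is expected.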

We concluded it was a systemd-networkd bug after checking that the following combinations were NOT affected:

1) Xenial+4.4 kernel
2) Xenial+4.15 kernel
3) Bionic+ifupdown

Additional information:

$ apt-cache policy systemd
systemd:
  Installed: 237-3ubuntu10.3
  Candidate: 237-3ubuntu10.3
  Version table:
 *** 237-3ubuntu10.3 500
        500 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     237-3ubuntu10 500
        500 http://archive.ubuntu.com/ubuntu bionic/main amd64 Packages

$ lsb_release -rd
Description: Ubuntu 18.04.1 LTS
Release: 18.04

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: systemd 237-3ubuntu10.3
ProcVersionSignature: Ubuntu 4.15.0-38.41-generic 4.15.18
Uname: Linux 4.15.0-38-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.4
Architecture: amd64
Date: Wed Oct 31 08:47:28 2018
Lspci: Error: [Errno 2] No such file or directory: 'lspci': 'lspci'
Lsusb: Error: [Errno 2] No such file or directory: 'lsusb': 'lsusb'
MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-38-generic root=UUID=43b7ee2e-2ab1-4505-8e0b-d9fe0563a034 ro console=ttyS0 net.ifnames=0 vsyscall=none kaslr nmi_watchdog=0 possible_cpus=1
SourcePackage: systemd
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/01/2014
dmi.bios.vendor: SeaBIOS
dmi.bios.version: Ubuntu-1.8.2-1ubuntu1
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-xenial
dmi.modalias: dmi:bvnSeaBIOS:bvrUbuntu-1.8.2-1ubuntu1:bd04/01/2014:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-xenial:cvnQEMU:ct1:cvrpc-i440fx-xenial:
dmi.product.name: Standard PC (i440FX + PIIX, 1996)
dmi.product.version: pc-i440fx-xenial
dmi.sys.vendor: QEMU

Simon Déziel (sdeziel) wrote :

systemd from Cosmic is not affected by this bug:

# apt-cache policy systemd
systemd:
  Installed: 239-7ubuntu10.3
  Candidate: 239-7ubuntu10.3
  Version table:
 *** 239-7ubuntu10.3 500
        500 http://archive.ubuntu.com/ubuntu cosmic-updates/main amd64 Packages
        500 http://archive.ubuntu.com/ubuntu cosmic-security/main amd64 Packages
        100 /var/lib/dpkg/status
     239-7ubuntu10 500
        500 http://archive.ubuntu.com/ubuntu cosmic/main amd64 Packages

summary: - systemd-networkd doesn't IPv6 RA properly
+ systemd-networkd doesn't process IPv6 RA properly
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed
Johan Wassberg (theseal) wrote :

I can confirm the same issue with `systemd-networkd` on Xenial (and Debian Stretch). Do you know if there is a patch in a more recent `systemd` that could be backported to fix older versions?

When not using `systemd-networkd` (as on older releases or "vanilla" Xenial), `ip -6 route` shows when a default route expires, but that information seems to be lost with `systemd-networkd`.

Trusty without `systemd-networkd`:
```
$ ip -6 r
[...]
default via fe80::209:fff:fe09:5 dev eth0 proto kernel metric 1024 expires 1566sec
```

Xenial without `systemd-networkd`:
```
$ ip -6 r
[...]
default via fe80::209:fff:fe09:5 dev ens192 proto ra metric 1024 expires 1545sec pref medium
```

Xenial WITH `systemd-networkd`:
```
$ ip -6 r
[...]
default via fe80::209:fff:fe09:5 dev ens192 proto ra metric 1024 pref medium
```
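The comparison above can be checked mechanically by looking for the `expires` field on each default route. A minimal sketch, assuming the single-path text format shown in these examples (`expires <N>sec` on the route line); `default_route_expiry` is a hypothetical helper name.

```python
import re
from typing import Dict, Optional

EXPIRES = re.compile(r"\bexpires (\d+)sec\b")


def default_route_expiry(route_output: str) -> Dict[str, Optional[int]]:
    """Map each single-path default gateway to its remaining lifetime in seconds.

    None means the route line shows no "expires" field, i.e. the kernel is not
    counting this route down.
    """
    result: Dict[str, Optional[int]] = {}
    for line in route_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != "default" or "via" not in fields:
            continue
        via_index = fields.index("via")
        if via_index + 1 >= len(fields):
            continue
        match = EXPIRES.search(line)
        result[fields[via_index + 1]] = int(match.group(1)) if match else None
    return result


# Outputs copied from the Xenial examples above.
without_networkd = "default via fe80::209:fff:fe09:5 dev ens192 proto ra metric 1024 expires 1545sec pref medium"
with_networkd = "default via fe80::209:fff:fe09:5 dev ens192 proto ra metric 1024 pref medium"
print(default_route_expiry(without_networkd))  # {'fe80::209:fff:fe09:5': 1545}
print(default_route_expiry(with_networkd))     # {'fe80::209:fff:fe09:5': None}
```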

Anton (user1553) wrote :

We are also affected by this issue. networkd does not expire routes, which causes problems with active-passive router configurations.

It looks to me like the issue is fixed in this pull request: https://github.com/systemd/systemd/pull/3242

Dan Streetman (ddstreet) wrote :

> For me it looks like the issue is fixed in this pull request:
> https://github.com/systemd/systemd/pull/3242

That's already included in Bionic.

Dan Streetman (ddstreet) wrote :

@sdeziel, can you provide more specific, simplified steps to reproduce this? For example, the specific networkd (or netplan) config file(s) and the exact steps?

Changed in systemd (Ubuntu):
status: Confirmed → Incomplete
Simon Déziel (sdeziel) wrote :

fw01/02 each have a bond0.21 interface set up to carry fe80::1 as the VIP used as the network gateway:

root@fw01:~# ip -6 a show bond0.21
8: bond0.21@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 2620:a:b:21::2/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::210:18ff:fe77:b558/64 scope link
       valid_lft forever preferred_lft forever

# fw02 is currently primary/master
root@fw02:~# ip -6 a show bond0.21
8: bond0.21@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 2620:a:b:21::3/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::1/64 scope link nodad
       valid_lft forever preferred_lft forever
    inet6 fe80::210:18ff:febe:6750/64 scope link
       valid_lft forever preferred_lft forever

fw01/02 /etc/radvd.conf looks like this:

interface bond0.21
{
  AdvSendAdvert on;
  MaxRtrAdvInterval 30;

  prefix 2620:a:b:21::/64
  {
  };
};

Note that radvd only runs on the primary/master firewall (fw02 at the moment).
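What the clients should do with these RAs can be modelled as a default router list. This is a minimal sketch, assuming the RFC 4861 rules (each RA re-arms the router's timeout; a router lifetime of 0 removes it immediately) and, since the radvd.conf above does not set AdvDefaultLifetime, radvd's documented default of 3 * MaxRtrAdvInterval, i.e. 90 s here. The failover timeline and the function names are hypothetical and only mirror fw01/fw02.

```python
from typing import Dict, Iterable, List, Set, Tuple

# One received RA: (arrival time in s, source link-local address, router lifetime in s).
RA = Tuple[int, str, int]


def routers_with_expiry(ras: Iterable[RA], now: int) -> Set[str]:
    """Default routers still valid at `now` when every RA re-arms a lifetime timer.

    Models RFC 4861 default router list handling: a non-zero router lifetime
    (re)sets the timeout, a zero lifetime removes the router at once.
    """
    deadline: Dict[str, int] = {}
    for t, src, lifetime in sorted(ras):
        if t > now:
            break
        if lifetime == 0:
            deadline.pop(src, None)
        else:
            deadline[src] = t + lifetime
    return {src for src, end in deadline.items() if end > now}


def routers_never_removed(ras: Iterable[RA], now: int) -> Set[str]:
    """The behaviour reported in this bug: routers are added but never dropped."""
    return {src for t, src, lifetime in ras if t <= now and lifetime > 0}


# Hypothetical timeline: fw02 advertises until ~t=100 s, then fw01 takes over.
FW01 = "fe80::210:18ff:fe77:b558"
FW02 = "fe80::210:18ff:febe:6750"
LIFETIME = 90  # assumed radvd default: 3 * MaxRtrAdvInterval (30 s)
timeline: List[RA] = [(t, FW02, LIFETIME) for t in range(0, 100, 30)]
timeline += [(t, FW01, LIFETIME) for t in range(100, 300, 30)]

print(sorted(routers_with_expiry(timeline, now=300)))    # only fw01 is left
print(sorted(routers_never_removed(timeline, now=300)))  # fw01 and fw02 both remain
```

With expiry, the old master drops out about 90 s after its last RA; without it, every router that ever advertised stays a nexthop, which is the growing multipath default route described in this report.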

After a failover, Bionic clients using netplan/systemd-networkd will have a bogus default nexthop like this:

$ ip -6 ro
2620:a:b:21::/64 dev eth0 proto ra metric 1024 pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
default proto ra metric 1024
 nexthop via fe80::1 dev eth0 weight 1
 nexthop via fe80::210:18ff:fe77:b558 dev eth0 weight 1
 nexthop via fe80::210:18ff:febe:6750 dev eth0 weight 1

This prevents them from communicating properly. To fix it, one has to manually run:

sudo ip -6 ro del default proto ra && sudo netplan apply
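If this has to be done on many VMs, the two commands can be wrapped for use in a script. A minimal sketch, assuming it runs as root (hence no `sudo`); `reset_ra_default_route` is a hypothetical name, and `ip -6 route` is simply the long form of `ip -6 ro`. It only automates the workaround and does not change the underlying networkd behaviour.

```python
import subprocess
from typing import List

# The workaround above as argv lists; must run as root.
WORKAROUND: List[List[str]] = [
    ["ip", "-6", "route", "del", "default", "proto", "ra"],
    ["netplan", "apply"],
]


def reset_ra_default_route(dry_run: bool = True) -> List[str]:
    """Delete the RA-learned default route, then let netplan/networkd rebuild it.

    With dry_run=True nothing is executed; the command lines are only returned.
    """
    commands: List[str] = []
    for argv in WORKAROUND:
        commands.append(" ".join(argv))
        if not dry_run:
            # check=True raises on failure, so `netplan apply` only runs if the
            # delete succeeded, like the `&&` in the original one-liner.
            subprocess.run(argv, check=True)
    return commands


print(reset_ra_default_route())  # dry run: only shows what would be executed
```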

This then gives the expected route entries:

$ ip -6 ro
2620:a:b:21::/64 dev eth0 proto ra metric 1024 pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
default via fe80::1 dev eth0 proto ra metric 1024 pref medium

On the other hand, machines using ifupdown (Xenial or Bionic) in the same network segment have no problem keeping only fe80::1 as the default nexthop.

Changed in systemd (Ubuntu):
status: Incomplete → Confirmed
Dan Streetman (ddstreet) wrote :

> After a failover

Can you explain what exactly happens during the failover? What specific commands are executed on the old system to remove the VIP and/or adjust anything else?

Changed in systemd (Ubuntu):
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for systemd (Ubuntu) because there has been no activity for 60 days.]

Changed in systemd (Ubuntu):
status: Incomplete → Expired
Kjetil Torgrim Homme (kjetilho) wrote :

This is still happening with 237-3ubuntu10.42 in Bionic. Routes learned from RAs announced with a router lifetime of 0 s are not removed. The firewall that used to be the default router (continuously) announces these 0 s lifetimes to make sure there are no issues with failover.

$ ip -6 route show default
default proto ra metric 100
 nexthop via fe80::21b:21ff:febc:97e6 dev eth0 weight 1
 nexthop via fe80::21b:21ff:febc:9416 dev eth0 weight 1

$ sudo rdisc6 eth0
Soliciting ff02::2 (ff02::2) on eth0...

Hop limit : 64 ( 0x40)
Stateful address conf. : No
Stateful other conf. : No
Mobile home agent : No
Router preference : medium
Neighbor discovery proxy : No
Router lifetime : 0 (0x00000000) seconds
Reachable time : unspecified (0x00000000)
Retransmit time : unspecified (0x00000000)
 Recursive DNS server : 2a02:c0::1
  DNS server lifetime : 2419200 (0x0024ea00) seconds
 from fe80::21b:21ff:febc:9416

Hop limit : 64 ( 0x40)
Stateful address conf. : No
Stateful other conf. : No
Mobile home agent : No
Router preference : medium
Neighbor discovery proxy : No
Router lifetime : 0 (0x00000000) seconds
Reachable time : unspecified (0x00000000)
Retransmit time : unspecified (0x00000000)
 Recursive DNS server : 2a02:c0::1
  DNS server lifetime : 2419200 (0x0024ea00) seconds
 from fe80::21b:21ff:febc:9416

Hop limit : 64 ( 0x40)
Stateful address conf. : No
Stateful other conf. : No
Mobile home agent : No
Router preference : medium
Neighbor discovery proxy : No
Router lifetime : 30 (0x0000001e) seconds
Reachable time : unspecified (0x00000000)
Retransmit time : unspecified (0x00000000)
 Prefix : 2a02:c0:900:113::/64
  On-link : Yes
  Autonomous address conf.: Yes
  Valid time : 86400 (0x00015180) seconds
  Pref. time : 14400 (0x00003840) seconds
 Recursive DNS server : 2a02:c0::1
  DNS server lifetime : 2419200 (0x0024ea00) seconds
 from fe80::21b:21ff:febc:97e6

Hop limit : 64 ( 0x40)
Stateful address conf. : No
Stateful other conf. : No
Mobile home agent : No
Router preference : medium
Neighbor discovery proxy : No
Router lifetime : 30 (0x0000001e) seconds
Reachable time : unspecified (0x00000000)
Retransmit time : unspecified (0x00000000)
 Prefix : 2a02:c0:900:113::/64
  On-link : Yes
  Autonomous address conf.: Yes
  Valid time : 86400 (0x00015180) seconds
  Pref. time : 14400 (0x00003840) seconds
 Recursive DNS server : 2a02:c0::1
  DNS server lifetime : 2419...
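In the rdisc6 output above, fe80::21b:21ff:febc:9416 advertises a router lifetime of 0, which RFC 4861 defines as "not a default router", yet `ip -6 route show default` still lists it as a nexthop. A minimal sketch for making that comparison mechanically, assuming the rdisc6 text layout shown (a `Router lifetime` line per RA, closed by a `from <address>` line); both function names are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

LIFETIME = re.compile(r"\s*Router lifetime\s*:\s*(\d+)")
SOURCE = re.compile(r"\s*from\s+(\S+)")


def parse_rdisc6(output: str) -> List[Tuple[str, int]]:
    """Return (source address, router lifetime in s) for each RA printed by rdisc6."""
    adverts: List[Tuple[str, int]] = []
    lifetime: Optional[int] = None
    for line in output.splitlines():
        m = LIFETIME.match(line)
        if m:
            lifetime = int(m.group(1))
            continue
        m = SOURCE.match(line)
        if m and lifetime is not None:
            # Each RA block ends with its " from <address>" line.
            adverts.append((m.group(1), lifetime))
            lifetime = None
    return adverts


def expected_default_routers(adverts: List[Tuple[str, int]]) -> List[str]:
    """Routers whose most recent RA still offers them as a default router."""
    latest: Dict[str, int] = {}
    for src, lifetime in adverts:
        latest[src] = lifetime  # a later RA from the same router overrides
    return sorted(src for src, lifetime in latest.items() if lifetime > 0)


# Abridged from the rdisc6 output above.
sample = """
Router lifetime : 0 (0x00000000) seconds
 from fe80::21b:21ff:febc:9416
Router lifetime : 30 (0x0000001e) seconds
  DNS server lifetime : 2419200 (0x0024ea00) seconds
 from fe80::21b:21ff:febc:97e6
"""
print(expected_default_routers(parse_rdisc6(sample)))  # ['fe80::21b:21ff:febc:97e6']
```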


Changed in systemd (Ubuntu):
status: Expired → Confirmed
Dan Streetman (ddstreet) wrote :

Please reopen if this is still an issue.

Changed in systemd (Ubuntu):
status: Confirmed → Invalid
Tore Anderson (toreanderson) wrote :

This bug is still present on Ubuntu 18.04.5 LTS with systemd 237-3ubuntu10.50. Please reopen.

Dan Streetman (ddstreet)
Changed in systemd (Ubuntu):
status: Invalid → New
Nick Rosbrook (enr0n) wrote :

Bionic is out of standard support, so any fix for this would need to go through ESM.

Changed in systemd (Ubuntu Bionic):
status: New → Won't Fix
Changed in systemd (Ubuntu):
status: New → Fix Released