Regression on Jammy's kernel 5.15 when creating ip6gre and vti6 tunnels

Bug #2037667 reported by Danilo Egea Gondolfo
This bug affects 3 people
Affects             Status     Importance  Assigned to                    Milestone
linux (Ubuntu)      Confirmed  Undecided   Unassigned
  Jammy             Triaged    High        Thadeu Lima de Souza Cascardo
systemd (Ubuntu)    Confirmed  Undecided   Unassigned
  Jammy             Confirmed  High        Nick Rosbrook

Bug Description

We noticed that some of Netplan's integration tests started to fail on Jammy. These tests create ip6gre and vti6 virtual interfaces, and systemd-networkd fails to create them starting with kernel 5.15.0-83.92. As far as I can tell, kernel 5.15.0-82.91 is the last revision where this works, so some change between 5.15.0-82.91 and 5.15.0-83.92 is causing the regression.
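To tell quickly whether a machine runs a kernel in the range described above, something like the following sketch could be used. The function names are hypothetical, `sort -V` only approximates Debian version ordering, and the check says nothing about whether a later revision carries a fix; it only compares against the first revision reported here as failing.

```shell
#!/bin/sh
# First kernel package revision reported as failing in this bug.
FIRST_BAD="5.15.0-83.92"

# Hypothetical helper: succeeds if version $1 sorts at or after FIRST_BAD.
# sort -V is an approximation of Debian version comparison.
at_or_after_first_bad() {
    lowest=$(printf '%s\n%s\n' "$FIRST_BAD" "$1" | sort -V | head -n 1)
    [ "$lowest" = "$FIRST_BAD" ]
}

# Hypothetical helper: extract "5.15.0-84.93" from a signature line such as
# "Ubuntu 5.15.0-84.93-generic 5.15.116" (the ProcVersionSignature format).
signature_to_version() {
    printf '%s\n' "$1" | sed -n 's/^Ubuntu \([0-9.]*-[0-9]*\.[0-9]*\)-.*/\1/p'
}
```

On an Ubuntu system the signature line can be read from /proc/version_signature, for example `at_or_after_first_bad "$(signature_to_version "$(cat /proc/version_signature)")" && echo "kernel is in the reported range"`.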

How to reproduce the issue:

# Launch a jammy cloud VM:

lxc launch images:ubuntu/jammy/cloud jammy --vm
lxc shell jammy

# Create a netplan file that creates 2 tunnels:

cat > /etc/netplan/10-tun.yaml <<EOF
network:
  renderer: networkd
  version: 2
  tunnels:
    tun0:
      mode: ip6gre
      local: fe80::1
      remote: 2001:dead:beef::2
    tun1:
      mode: vti6
      local: fe80::2
      remote: 2001:dead:beef::3
EOF

# Apply the configuration

netplan apply

# Check with "ip link" that both tun0 and tun1 *were not created* and check networkd for errors:

journalctl -u systemd-networkd

Sep 28 17:04:40 jammy systemd-networkd[360]: tun0: netdev could not be created: Invalid argument
Sep 28 17:04:40 jammy systemd-networkd[360]: tun1: netdev could not be created: Invalid argument

# Download, install and boot into kernel 5.15.0-82.91

wget http://ie.archive.ubuntu.com/ubuntu/pool/main/l/linux-signed/linux-image-5.15.0-82-generic_5.15.0-82.91_amd64.deb http://ie.archive.ubuntu.com/ubuntu/pool/main/l/linux/linux-modules-5.15.0-82-generic_5.15.0-82.91_amd64.deb

dpkg -i *.deb

grub-reboot '1>2' && reboot

# Check with "ip link" again that both tun0 and tun1 were created

# Reboot again to go back to the most recent kernel and check with "ip link" that both tun0 and tun1 were not created.
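The two manual checks in the steps above (looking at "ip link" and at the networkd journal) could be scripted roughly as follows. This is only a sketch: both function names are hypothetical, and the parsing assumes the journal message format quoted above and the one-line-per-interface format of `ip -o link show`.

```shell
#!/bin/sh
# Hypothetical helper: read journal text on stdin and print the name of each
# netdev that systemd-networkd reported it could not create.
failed_netdevs() {
    sed -n 's/.*systemd-networkd\[[0-9]*]: \([^:]*\): netdev could not be created.*/\1/p'
}

# Hypothetical helper: read `ip -o link show` output on stdin and print each
# interface name given as an argument that does not appear in that output.
missing_links() {
    # Field 2 is the interface name; drop a suffix such as "@NONE".
    present=$(awk -F': ' '{ sub(/@.*/, "", $2); print $2 }')
    for name in "$@"; do
        printf '%s\n' "$present" | grep -qxF -- "$name" || printf '%s\n' "$name"
    done
}
```

Typical use would be `journalctl -u systemd-networkd -b --no-pager | failed_netdevs` and `ip -o link show | missing_links tun0 tun1`. On an affected kernel both are expected to print tun0 and tun1.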
---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Sep 29 12:52 seq
 crw-rw---- 1 root audio 116, 33 Sep 29 12:52 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: [Errno 2] No such file or directory: 'fuser'
CRDA: N/A
CasperMD5CheckResult: unknown
CloudArchitecture: x86_64
CloudID: lxd
CloudName: lxd
CloudPlatform: lxd
CloudSubPlatform: LXD socket API v. 1.0 (/dev/lxd/sock)
DistroRelease: Ubuntu 22.04
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lspci: Error: [Errno 2] No such file or directory: 'lspci'
Lspci-vt: Error: [Errno 2] No such file or directory: 'lspci'
Lsusb: Error: [Errno 2] No such file or directory: 'lsusb'
Lsusb-t: Error: [Errno 2] No such file or directory: 'lsusb'
Lsusb-v: Error: [Errno 2] No such file or directory: 'lsusb'
MachineType: QEMU Standard PC (Q35 + ICH9, 2009)
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=screen-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 virtio_gpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-84-generic root=/dev/sda2 ro quiet splash console=tty1 console=ttyS0 vt.handoff=7
ProcVersionSignature: Ubuntu 5.15.0-84.93-generic 5.15.116
RelatedPackageVersions:
 linux-restricted-modules-5.15.0-84-generic N/A
 linux-backports-modules-5.15.0-84-generic N/A
 linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: jammy uec-images
Uname: Linux 5.15.0-84-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: True
dmi.bios.date: 2/2/2022
dmi.bios.release: 0.0
dmi.bios.vendor: EDK II
dmi.bios.version: unknown
dmi.board.name: LXD
dmi.board.vendor: Canonical Ltd.
dmi.board.version: pc-q35-8.0
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-q35-8.0
dmi.modalias: dmi:bvnEDKII:bvrunknown:bd2/2/2022:br0.0:svnQEMU:pnStandardPC(Q35+ICH9,2009):pvrpc-q35-8.0:rvnCanonicalLtd.:rnLXD:rvrpc-q35-8.0:cvnQEMU:ct1:cvrpc-q35-8.0:sku:
dmi.product.name: Standard PC (Q35 + ICH9, 2009)
dmi.product.version: pc-q35-8.0
dmi.sys.vendor: QEMU

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 2037667

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Jammy):
status: New → Incomplete
Danilo Egea Gondolfo (danilogondolfo) wrote : apport information

Attached files: CurrentDmesg.txt, ProcCpuinfo.txt, ProcCpuinfoMinimal.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, WifiSyslog.txt, acpidump.txt

tags: added: apport-collected jammy uec-images
description: updated

Changed in linux (Ubuntu):
status: Incomplete → New
Changed in linux (Ubuntu Jammy):
status: Incomplete → New
Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Jammy):
status: New → Incomplete
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu Jammy):
status: Incomplete → Confirmed
tags: added: kernel-bug regression-update
Thadeu Lima de Souza Cascardo (cascardo) wrote :

Upstream commit b0ad3c179059089d809b477a1d445c1183a7b8fe ("rtnetlink: call validate_linkmsg in rtnl_create_link") sounds like a good candidate as the culprit here. I see no fixes referencing it upstream, though.

Thadeu Lima de Souza Cascardo (cascardo) wrote :

With that commit reverted, tunnel creation works. A 6.5 kernel running on Jammy fails. Mantic itself works, but a Jammy LXC container on that Mantic host, and therefore on the same kernel, fails inside the container.

So, it looks like systemd-networkd passes the tunnel's Local address as the IFLA_ADDRESS netlink attribute and the Remote address as the IFLA_BROADCAST attribute. These are the attributes that the validation added by commit b0ad3c179059 checks. They are supposed to carry hardware addresses, not tunnel addresses.
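One way to separate the kernel from systemd-networkd would be to create the same tunnels directly with iproute2. The sketch below only prints the commands; the helper name is hypothetical. It rests on the assumption that `ip link add` sends no link-layer address or broadcast attribute unless `address` or `broadcast` is given on the command line. Whether these commands succeed on an affected kernel is not stated in this report.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) an iproute2 command that creates a
# tunnel with the same mode and endpoints as a Netplan tunnel entry.
# Assumption: without "address"/"broadcast" arguments, iproute2 does not set
# the link-layer address attributes that the new validation checks.
tunnel_cmd() {
    # $1 = name, $2 = mode (ip6gre or vti6), $3 = local, $4 = remote
    printf 'ip link add name %s type %s local %s remote %s\n' "$1" "$2" "$3" "$4"
}
```

For example, `tunnel_cmd tun0 ip6gre fe80::1 2001:dead:beef::2` prints the command for the first tunnel from the reproducer; it can then be reviewed and run as root.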

This needs to be investigated on the systemd side. Reading the systemd code from the Jammy version, I didn't find where this would be incorrectly set. It may be necessary to revert this kernel change, fix systemd, and wait for some time before reintroducing the kernel fix. That all depends on why exactly systemd is doing this. Depending on the outcome of that investigation, we will need to reassess whether this can reasonably be used to justify reverting that commit upstream. But given that newer systemd does the right thing, that might be hard to do.

Cascardo.

Changed in systemd (Ubuntu):
status: New → Invalid
Changed in linux (Ubuntu Jammy):
importance: Undecided → High
Changed in systemd (Ubuntu Jammy):
importance: Undecided → High
Changed in linux (Ubuntu Jammy):
status: Confirmed → Triaged
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
Lukas Märdian (slyon)
tags: added: rls-jj-incoming
Lukas Märdian (slyon)
Changed in systemd (Ubuntu):
status: Invalid → Confirmed
tags: added: foundations-todo
removed: rls-jj-incoming
Changed in systemd (Ubuntu Jammy):
assignee: nobody → Nick Rosbrook (enr0n)
Lukas Märdian (slyon) wrote :

We talked to our systemd maintainer in Foundations and also reached out to upstream systemd [1], which didn't lead anywhere. We shouldn't need to bisect systemd-networkd and do a high-impact systemd SRU to fix a regression in a kernel update when we can pinpoint the exact (small) git commit.

We'd like to ask to have the kernel commit b0ad3c179059 reverted in the Jammy kernels, to mitigate this regression.

[1] https://lists.freedesktop.org/archives/systemd-devel/2023-November/049693.html

Lukas Märdian (slyon) wrote :

So after bisecting systemd (v249..v251) on Jammy, this seems to be the commit that changed behavior for the better:

https://github.com/systemd/systemd/commit/9f0cf80dd007491698978dbfe38158d74c1c9526

It probably needs some backporting for v249.11.

Lukas Märdian (slyon) wrote :

This seems to do the trick using the following Netplan config as a testcase.

All 3 interfaces (tun[0-2]) are correctly created with their local & remote properties set:

root@jj-abi:~/systemd# ip link show dev tun0
19: tun0@NONE: <NOARP,UP,LOWER_UP> mtu 1448 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/gre6 fe80::1 brd 2001:dead:beef::2 permaddr e262:75ae:2aa0::
root@jj-abi:~/systemd# ip link show dev tun1
20: tun1@NONE: <NOARP,UP,LOWER_UP> mtu 1332 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/tunnel6 fe80::2 brd 2001:dead:beef::3 permaddr c2aa:7446:cc29::
root@jj-abi:~/systemd# ip link show dev tun2
21: tun2@NONE: <NOARP,UP,LOWER_UP> mtu 1444 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/gre6 fe80::3 brd 2001:dead:beef::4 permaddr 3ef2:274c:a921::

Netplan test config:
```
network:
  version: 2
  tunnels:
    tun0:
      mode: ip6gre
      local: fe80::1
      remote: 2001:dead:beef::2
    tun1:
      mode: vti6
      local: fe80::2
      remote: 2001:dead:beef::3
    tun2:
      mode: ip6gre
      key: 1234
      local: fe80::3
      remote: 2001:dead:beef::4
```
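To compare the `ip link show` output in this comment against the local and remote values in the test config, the endpoints could be pulled out with a small filter like this one. It is a sketch with a hypothetical function name, and it assumes the "link/TYPE LOCAL brd REMOTE" layout seen in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read `ip link show dev NAME` output on stdin and print
# "LOCAL REMOTE", taken from the "link/<type> LOCAL brd REMOTE ..." line.
link_endpoints() {
    awk '$1 ~ /^link\// && $3 == "brd" { print $2, $4 }'
}
```

Usage would be along the lines of `ip link show dev tun0 | link_endpoints`, with the result compared to the `local` and `remote` keys of the matching tunnel.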

Lukas Märdian (slyon) wrote :

Actually, we can further reduce the patch to the removal of `.generate_mac = true`. This already fixes the issue we observe here.

Nick Rosbrook (enr0n)
tags: added: systemd-sru-next
tags: added: patch
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu Jammy):
status: New → Confirmed