netplan: routing autopkgtest fails on armhf

Bug #2073997 reported by Nick Rosbrook
This bug affects 1 person
Affects               Status         Importance  Assigned to     Milestone
Auto Package Testing  Fix Released   Undecided   Skia
Netplan               Fix Committed  Undecided   Unassigned
netplan.io (Ubuntu)   Triaged        Low         Lukas Märdian

Bug Description

The netplan armhf autopkgtest is currently blocking systemd migration. I see the same test failure across different triggers, so I suspect a migration-reference test will fail (which I will trigger to unblock systemd).

The failure is the following:

1904s ======================================================================
1904s FAIL: test_vrf_basic (__main__.TestNetworkManager.test_vrf_basic)
1904s ----------------------------------------------------------------------
1904s Traceback (most recent call last):
1904s File "/tmp/autopkgtest.Cisn8W/build.sPu/src/tests/integration/routing.py", line 307, in test_vrf_basic
1904s self.generate_and_settle([self.dev_e_client])
1904s File "/tmp/autopkgtest.Cisn8W/build.sPu/src/tests/integration/base.py", line 373, in generate_and_settle
1904s self.nm_wait_connected(iface, 60)
1904s File "/tmp/autopkgtest.Cisn8W/build.sPu/src/tests/integration/base.py", line 512, in nm_wait_connected
1904s self.wait_output(['nmcli', 'dev', 'show', iface], '(connected', timeout)
1904s File "/tmp/autopkgtest.Cisn8W/build.sPu/src/tests/integration/base.py", line 509, in wait_output
1904s self.fail('timed out waiting for "{}" to appear in {}'.format(expected_output, cmd))
1904s AssertionError: timed out waiting for "(connected" to appear in ['nmcli', 'dev', 'show', 'eth42']
1904s
1904s ======================================================================
1904s FAIL: test_vrf_basic (__main__.TestNetworkd.test_vrf_basic)
1904s ----------------------------------------------------------------------
1904s Traceback (most recent call last):
1904s File "/tmp/autopkgtest.Cisn8W/build.sPu/src/tests/integration/routing.py", line 307, in test_vrf_basic
1904s self.generate_and_settle([self.dev_e_client])
1904s File "/tmp/autopkgtest.Cisn8W/build.sPu/src/tests/integration/base.py", line 375, in generate_and_settle
1904s self.networkd_wait_connected(iface, 60)
1904s File "/tmp/autopkgtest.Cisn8W/build.sPu/src/tests/integration/base.py", line 516, in networkd_wait_connected
1904s self.wait_output(['networkctl', 'status', iface], '(configured', timeout)
1904s File "/tmp/autopkgtest.Cisn8W/build.sPu/src/tests/integration/base.py", line 509, in wait_output
1904s self.fail('timed out waiting for "{}" to appear in {}'.format(expected_output, cmd))
1904s AssertionError: timed out waiting for "(configured" to appear in ['networkctl', 'status', 'eth42']
1904s
1904s ----------------------------------------------------------------------
1904s Ran 28 tests in 285.909s
1904s
1904s FAILED (failures=2, skipped=3)
1904s autopkgtest [11:27:54]: test routing: -----------------------]

I have only seen this so far on armhf, but I see it in both [1] and [2], with python3-defaults/3.12.4-1 and systemd/256.2-1ubuntu1 as triggers, respectively.

[1] https://autopkgtest.ubuntu.com/results/autopkgtest-oracular/oracular/armhf/n/netplan.io/20240724_120458_00bda@/log.gz
[2] https://autopkgtest.ubuntu.com/results/autopkgtest-oracular/oracular/armhf/n/netplan.io/20240723_152614_c1bcc@/log.gz

Lukas Märdian (slyon) wrote:

I thought we had fixed that by making the 'vrf' module available on the autopkgtest hosts, but it seems to be flaky.

Also, an upstream fix is pending: https://github.com/canonical/netplan/commit/586694b6beb32e36be868dff0c88cc116bf2ed8b

Changed in netplan.io (Ubuntu):
assignee: nobody → Lukas Märdian (slyon)
tags: removed: rls-oo-incoming
Changed in netplan.io (Ubuntu):
status: New → Triaged
importance: Undecided → Low
Lukas Märdian (slyon)
Changed in netplan:
status: New → Fix Committed
Changed in auto-package-testing:
assignee: nobody → Skia (hyask)
Skia (hyask) wrote:

This module had already been added to the armhf machines, but unfortunately things seem to have broken. It is now fixed and should hold much better, but here is the story.

Because of bug 2067633, we have to install a newer kernel on the Jammy arm64 VMs that run the LXD armhf containers. This was done with `apt install --install-suggests linux-virtual-hwe-22.04`, so that a chain of dependencies would bring in `linux-modules-extra`. For some reason, that dependency chain doesn't seem to survive upgrades (this would need to be reproduced and verified; it is just a hypothesis at this point), so I looked for a more explicit install. After a chat with a kernel guy (thanks @smb), I learned that, even though the name is confusing, the package I needed was `linux-image-extra-virtual-hwe-22.04`. I've put that in our cloud-config file [1] as the long-term fix, and installed it manually as the short-term one.

Another long-term fix that had been overlooked is making sure the module is loaded on boot [2]. That had been done manually, so on workers that were trashed and rebuilt it had obviously disappeared.

[1]: https://git.launchpad.net/autopkgtest-cloud/commit/?id=16c5efebd46eeacb804d3703e5e9fc1e095868fe
[2]: https://git.launchpad.net/autopkgtest-cloud/commit/?id=42c9eb93cf9c5634f6211923ee7847fc7142ef61

Skia (hyask) wrote:

With the current arm64 outage also impacting the armhf workers, some VMs are pending a reboot to apply the latest kernel upgrades and thus get the vrf module. This situation should self-heal with the reboots, but in the meantime two VMs (out of 16) don't have the module loaded yet. I'm setting this task to "Fix Committed", and will set it to "Fix Released" once I see proof that every machine has the module after its reboot.

Changed in auto-package-testing:
status: New → Fix Committed
Skia (hyask) wrote:

The autopkgtest side of this issue has now been observed to work properly.

Changed in auto-package-testing:
status: Fix Committed → Fix Released
Lukas Märdian (slyon) wrote:

Looks like the "modprobe" condition does not fully work in containers: it tries to load the module from within the container, but the modules are installed and loaded on the host.

Maybe we should check /proc/modules instead.
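
A minimal sketch of what such a check could look like, assuming the test only needs to know whether the host kernel already has the module loaded. The helper name `kernel_module_loaded` and its parameters are hypothetical and not taken from the netplan test suite. The `/sys/module` fallback is an extra assumption meant to cover kernels where the driver is built in and therefore never shows up in /proc/modules.

```python
import os


def kernel_module_loaded(name, proc_modules="/proc/modules", sys_module="/sys/module"):
    """Return True if kernel module `name` appears to be loaded.

    Hypothetical helper: it only reads state exposed by the kernel and
    never tries to load anything, so it also works inside a container.
    """
    # The kernel reports module names with underscores, never dashes.
    wanted = name.replace("-", "_")
    try:
        with open(proc_modules) as f:
            for line in f:
                fields = line.split()
                # The first field of each /proc/modules line is the module name.
                if fields and fields[0] == wanted:
                    return True
    except OSError:
        # /proc/modules may be missing or unreadable; fall through.
        pass
    # Assumption: a directory under /sys/module indicates a loaded or
    # built-in module. This is a best-effort fallback, not a guarantee.
    return os.path.isdir(os.path.join(sys_module, wanted))
```

A test could then be guarded with something like `@unittest.skipUnless(kernel_module_loaded("vrf"), "vrf kernel module not loaded")` instead of calling modprobe from inside the container.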
