Upgrade from 245.4-4ubuntu3.2 to 245.4-4ubuntu3.3 appears to break DNS resolution in some cases

Bug #1902960 reported by David Lawson
This bug affects 10 people
Affects               Status        Importance  Assigned to    Milestone
cloud-images          New           Undecided   Unassigned
systemd               New           Unknown
cloud-init (Ubuntu)   Invalid       Undecided   Unassigned
  Focal               Invalid       Undecided   Unassigned
  Groovy              Won't Fix     Undecided   Unassigned
systemd (Ubuntu)      Fix Released  Undecided   Unassigned
  Focal               Fix Released  Medium      Dan Streetman
  Groovy              Fix Released  Medium      Dan Streetman

Bug Description

[impact]

on boot of a specific azure instance, the ID_NET_DRIVER parameter of the instance's eth0 interface is not set. That leads to a failure of systemd-networkd to take control of the interface after a restart of systemd-networkd, which results in DNS failures (at first) and eventually complete loss of networking (once the DHCP lease expires).

[test case]

this occurs on first boot of an instance using the specific image; it is not reproducible using the latest ubuntu image nor any reboot of the affected image, and it has not been reproducible (for me) when using debug-enabled images based on the affected image.

So, while the problem is reproducible using the specific image in question, it's not possible to verify the fix since any change to the image removes reproducibility.

however, while the problem itself can't be reproduced and then verified, if the assumption is correct (that the 'add' uevent is being missed on boot), that is possible to test and verify:

$ udevadm info /sys/class/net/eth0 | grep ID_NET_DRIVER
E: ID_NET_DRIVER=hv_netvsc
$ sudo rm /run/udev/data/n2

(note, change 'n2' to whichever network interface index is correct)

$ udevadm info /sys/class/net/eth0 | grep ID_NET_DRIVER
$ sudo udevadm trigger -c change /sys/class/net/eth0
$ udevadm info /sys/class/net/eth0 | grep ID_NET_DRIVER

(note the 'change' uevent did not populate ID_NET_DRIVER property)

$ sudo udevadm trigger -c add /sys/class/net/eth0
$ udevadm info /sys/class/net/eth0 | grep ID_NET_DRIVER
E: ID_NET_DRIVER=hv_netvsc

(note the 'add' uevent did populate ID_NET_DRIVER)

the test verification should result in ID_NET_DRIVER being populated for a 'change' uevent.
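
The manual steps above can be wrapped in two small helpers. This is only a sketch and not part of the SRU test plan: the function names `udev_db_file` and `id_net_driver` are hypothetical, and it assumes udev stores a network interface's database entry as `/run/udev/data/n<ifindex>`, which is consistent with the `n2` path used above.

```shell
#!/bin/sh
# Hypothetical helpers for the manual test case; the names are illustrative.

# Print the udev database file for a network interface.
# Assumes udev names the entry n<ifindex> (e.g. n2 for ifindex 2).
# $1: interface name; $2: sysfs net directory (default /sys/class/net)
udev_db_file() (
    iface=$1
    sysnet=${2:-/sys/class/net}
    ifindex=$(cat "$sysnet/$iface/ifindex") || exit 1
    printf '/run/udev/data/n%s\n' "$ifindex"
)

# Read `udevadm info` output on stdin and print the ID_NET_DRIVER value.
# Returns non-zero when the property is missing or empty.
id_net_driver() {
    sed -n 's/^E: ID_NET_DRIVER=//p' | grep .
}
```

With these, `udev_db_file eth0` prints the file to remove before triggering the uevent, and `udevadm info /sys/class/net/eth0 | id_net_driver` exits non-zero whenever the property has not been populated.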

[regression potential]

any regression would likely involve problems with systemd-udevd processing 'change' events from network devices, and/or incorrect udevd device properties.

[scope]

this is needed only for focal and groovy.

this is fixed by upstream commit e0e789c1e97 which is first included in v247, so this is fixed already in hirsute.

while this commit is not included in bionic, due to the difficult nature of reproducing (and verifying) this, and the fact it has only been seen once on a focal image, I don't think it's appropriate to SRU to bionic at this point; possibly it may be appropriate if this is ever reproduced with a bionic image.

[other info]

note that this bug's subject and description, as well as the upstream systemd bug subject and description, talk about the problem being DNS resolution. However that is strictly a side-effect of the real problem and is not the actual issue.

[original description]

The systemd upgrade from 245.4-4ubuntu3.2 to 245.4-4ubuntu3.3 appears to have broken DNS resolution across much of our Azure fleet earlier today. We ended up mitigating this by forcing reboots on the associated instances; no combination of networkctl reload, reconfigure, systemctl daemon-reexec, systemctl daemon-reload, netplan generate, netplan apply would get resolvectl to have a DNS server again. The main symptom appears to have been systemd-networkd believing it wasn't managing the eth0 interfaces:

ubuntu@machine-1:~$ sudo networkctl
IDX LINK TYPE OPERATIONAL SETUP
  1 lo loopback carrier unmanaged
  2 eth0 ether routable unmanaged

2 links listed.

Which eventually made them lose their DNS resolvers:

ubuntu@machine-1:~$ sudo resolvectl dns
Global:
Link 2 (eth0):

After rebooting, we see this behaving properly:

ubuntu@machine-1:~$ sudo networkctl list
IDX LINK TYPE OPERATIONAL SETUP
  1 lo loopback carrier unmanaged
  2 eth0 ether routable configured

2 links listed.
ubuntu@machine-1:~$ sudo resolvectl dns
Global:
Link 2 (eth0): 168.63.129.16

This appears to be specifically linked to the upgrade, i.e. we were able to provoke the issue by upgrading the systemd package, so I suspect it's part of the packaging in the upgrade process.
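
For checking a fleet for the symptom shown above, the `networkctl` listing can be filtered for non-loopback links that are left unmanaged. This is a rough sketch: the helper name `unmanaged_links` is hypothetical, and it assumes the five-column layout (IDX LINK TYPE OPERATIONAL SETUP) seen in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read `networkctl list` output on stdin and print the
# name of every non-loopback link whose SETUP column is "unmanaged".
# Assumes the column layout shown above: IDX LINK TYPE OPERATIONAL SETUP.
unmanaged_links() {
    awk '$1 ~ /^[0-9]+$/ && $3 != "loopback" && $5 == "unmanaged" { print $2 }'
}
```

On an affected instance, `networkctl list --no-pager | unmanaged_links` would be expected to print `eth0`; on a healthy one it prints nothing.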
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu27.10
Architecture: amd64
CasperMD5CheckResult: skip
DistroRelease: Ubuntu 20.04
Lspci-vt:
 -[0000:00]-+-00.0 Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (AGP disabled)
            +-07.0 Intel Corporation 82371AB/EB/MB PIIX4 ISA
            +-07.1 Intel Corporation 82371AB/EB/MB PIIX4 IDE
            +-07.3 Intel Corporation 82371AB/EB/MB PIIX4 ACPI
            \-08.0 Microsoft Corporation Hyper-V virtual VGA
Lsusb: Error: command ['lsusb'] failed with exit code 1:
Lsusb-t:

Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
MachineType: Microsoft Corporation Virtual Machine
Package: systemd 245.4-4ubuntu3.3
PackageArchitecture: amd64
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=C.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-1031-azure root=PARTUUID=2e08bba3-68b4-4a16-af3b-47b73bd138a9 ro console=tty1 console=ttyS0 earlyprintk=ttyS0 panic=-1
ProcVersionSignature: Ubuntu 5.4.0-1031.32-azure 5.4.65
Tags: focal uec-images
Uname: Linux 5.4.0-1031-azure x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: True
dmi.bios.date: 12/07/2018
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 090008
dmi.board.name: Virtual Machine
dmi.board.vendor: Microsoft Corporation
dmi.board.version: 7.0
dmi.chassis.asset.tag: 7783-7084-3265-9085-8269-3286-77
dmi.chassis.type: 3
dmi.chassis.vendor: Microsoft Corporation
dmi.chassis.version: 7.0
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr090008:bd12/07/2018:svnMicrosoftCorporation:pnVirtualMachine:pvr7.0:rvnMicrosoftCorporation:rnVirtualMachine:rvr7.0:cvnMicrosoftCorporation:ct3:cvr7.0:
dmi.product.name: Virtual Machine
dmi.product.uuid: 4412ad79-83fa-f845-b7c2-6f30dd4f1950
dmi.product.version: 7.0
dmi.sys.vendor: Microsoft Corporation

David Lawson (deej) wrote : CurrentDmesg.txt

apport information

tags: added: apport-collected focal uec-images
description: updated
David Lawson (deej) wrote : Dependencies.txt

apport information

David Lawson (deej) wrote : Lspci.txt

apport information

David Lawson (deej) wrote : ProcCpuinfo.txt

apport information

David Lawson (deej) wrote : ProcCpuinfoMinimal.txt

apport information

David Lawson (deej) wrote : ProcInterrupts.txt

apport information

David Lawson (deej) wrote : ProcModules.txt

apport information

David Lawson (deej) wrote : SystemdDelta.txt

apport information

David Lawson (deej) wrote : UdevDb.txt

apport information

David Lawson (deej) wrote : acpidump.txt

apport information

description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed
Haw Loeung (hloeung) wrote :

Spun up a new unit in Azure and also ran into this:

| https://paste.ubuntu.com/p/zd6z8dZ5Zr/

Haw Loeung (hloeung) wrote :

Attached /var/log/syslog.

Benjamin Allot (ballot) wrote :

Here is a pastebin of the situation and how I tried to resolve this : https://pastebin.ubuntu.com/p/c6cfKqvBmN/

Unfortunately, the interface stays "unmanaged".

When I check the netplan source (https://github.com/CanonicalLtd/netplan/blob/master/netplan/cli/commands/apply.py#L128), it just stops the systemd-networkd service, then starts it after generating the file.

Dan Streetman (ddstreet) wrote :

This isn't a problem with 3.2 to 3.3 upgrade, this is a problem where on first boot of the azure instance any restart of networkd fails to associate the .network file with the interface, e.g.:

root@lp1902960-3:~# networkctl status eth0
● 2: eth0
             Link File: n/a
          Network File: /run/systemd/network/10-netplan-eth0.network
                  Type: ether
                 State: routable (configured)
                  Path: acpi-VMBUS:01
            HW Address: 00:0d:3a:4e:ec:8c (Microsoft Corp.)
                   MTU: 1500 (min: 68, max: 65521)
  Queue Length (Tx/Rx): 64/64
      Auto negotiation: no
                 Speed: 40Gbps
                Duplex: full
               Address: 10.0.1.4 (DHCP4)
                        fe80::20d:3aff:fe4e:ec8c
               Gateway: 10.0.1.1
                   DNS: 168.63.129.16
        Search Domains: wh32vgcjtxsend1z44t3vj4ibg.bx.internal.cloudapp.net

root@lp1902960-3:~# systemctl restart systemd-networkd
root@lp1902960-3:~# networkctl status eth0
● 2: eth0
             Link File: n/a
          Network File: n/a
                  Type: ether
                 State: routable (unmanaged)
                  Path: acpi-VMBUS:01
            HW Address: 00:0d:3a:4e:ec:8c (Microsoft Corp.)
                   MTU: 1500 (min: 68, max: 65521)
  Queue Length (Tx/Rx): 64/64
      Auto negotiation: no
                 Speed: 40Gbps
                Duplex: full
               Address: 10.0.1.4
                        fe80::20d:3aff:fe4e:ec8c
               Gateway: 10.0.1.1

This is because udev doesn't know about the eth0 'Driver', because the systemd-udev-trigger service didn't correctly run at the proper time to generate uevents for existing devices.

Simply re-running systemd-udev-trigger will update udevd with the correct info (including the Driver value), and then restarting systemd-networkd will again work correctly.
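
The workaround described above amounts to two restarts in a fixed order. A minimal sketch follows; the `run` wrapper and the `DRY_RUN` variable are hypothetical conveniences added for this example so the sequence can be previewed without touching the system, and the real commands need root.

```shell
#!/bin/sh
# Sketch of the workaround: replay the coldplug uevents, then restart networkd.
# Set DRY_RUN=1 to print the commands instead of running them (hypothetical
# convenience added for this example).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Restart systemd-udev-trigger first so udevd relearns the device properties
# (including the driver), and only then restart systemd-networkd.
refresh_udev_and_networkd() {
    run systemctl restart systemd-udev-trigger.service &&
    run systemctl restart systemd-networkd.service
}
```

The order matters: restarting networkd before udevd has the driver property again would leave the interface unmanaged.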

Dan Streetman (ddstreet) wrote :

This problem is somehow created only on the first boot, most likely by some magic being performed by cloud-init. If the created instance is rebooted, there is no problem and systemd-networkd can be restarted with no problems.

Marking this as invalid for systemd as this isn't a systemd issue; this is a cloud problem, probably with cloud-init, or maybe with the cloud image.

Changed in systemd (Ubuntu):
status: Confirmed → Invalid
Dan Streetman (ddstreet) wrote :

> systemd-udev-trigger service didn't correctly run at the proper time to generate uevents for existing devices

note: as systemd-networkd did correctly associate the .network file at first, my guess would be that during boot the .network file was different than it is when boot is finished, which is why a restart causes networkd to remove the match. Only a guess, however - I haven't looked any deeper at what magic is being done by cloud-init.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cloud-init (Ubuntu):
status: New → Confirmed
Dan Watkins (oddbloke) wrote :

Hey folks,

Thanks for the report! If someone could run `cloud-init collect-logs` on an affected instance, and upload the produced tarball to this bug, we can dig into it further. The contents of /etc/netplan would also be very handy.

(Once attached, please move this back to New.)

Cheers,

Dan

Changed in cloud-init (Ubuntu):
status: Confirmed → Incomplete
David Lawson (deej) wrote :

Sure, not a problem. Here's the contents of /etc/netplan/*

ubuntu@machine-2:~$ cat /etc/netplan/*
# This file is generated from information provided by the datasource. Changes
# to it will not persist across an instance reboot. To disable cloud-init's
# network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
    ethernets:
        eth0:
            dhcp4: true
            dhcp4-overrides:
                route-metric: 100
            dhcp6: false
            match:
                driver: hv_netvsc
                macaddress: 00:22:48:0a:47:80
            set-name: eth0
    version: 2
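
The `match: driver:` key in this netplan file is what ties the interface to the udev driver property: as far as I know, netplan's networkd backend renders it as a `Driver=` line in the `[Match]` section of the generated .network file, and per the bug description that match cannot succeed when ID_NET_DRIVER is missing. The helper below is a hypothetical sketch (the name `matches_on_driver` is illustrative) for checking whether a given .network file depends on the driver in this way.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a systemd .network file matches on Driver=
# inside its [Match] section, i.e. if it depends on udev's ID_NET_DRIVER.
# $1: path to the .network file
matches_on_driver() {
    awk '
        /^\[/                  { in_match = ($0 == "[Match]") }
        in_match && /^Driver=/ { found = 1 }
        END                    { exit !found }
    ' "$1"
}
```

Running it against a generated file such as `/run/systemd/network/10-netplan-eth0.network` shows whether that file is exposed to this failure mode.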

Changed in cloud-init (Ubuntu):
status: Incomplete → New
Dan Watkins (oddbloke) wrote :

OK, I've managed to reproduce this (in a non-Juju launched VM). The ordering of these journal lines looks suspicious to me:

Nov 09 17:41:51.091033 ubuntu systemd[1]: Starting udev Coldplug all Devices...
Nov 09 17:41:51.236309 ubuntu systemd[1]: Finished Load Kernel Modules.
Nov 09 17:41:51.363482 ubuntu systemd[1]: Finished udev Coldplug all Devices.

Because, you guessed it, hv_netvsc is shipped as a kernel module:

$ lsmod | grep hv_netvsc
hv_netvsc 81920 0

So my assumption is that udev coldplugging of the network device is happening before the driver is loaded, and so (unsurprisingly :) it doesn't find the driver.

I suspect that adding an `After=systemd-modules-load.service` to systemd-udev-trigger.service addresses the issue, but as this only reproduces on first boot this is difficult to test.

Given the above (and the fact that cloud-init doesn't run for another ~5s after these lines), I _think_ this is a systemd/kernel interface issue, not a cloud-init issue.

Dan Watkins (oddbloke)
Changed in cloud-init (Ubuntu):
status: New → Incomplete
Changed in systemd (Ubuntu):
status: Invalid → New
Dan Streetman (ddstreet) wrote :

> coldplugging of the network device is happening before the driver is loaded,
> and so (unsurprisingly :) it doesn't find the driver.

so, 'coldplugging' (meaning systemd-udev-trigger service) just runs through all devices in the system and asks them to (re)issue 'add' uevents. if a device doesn't exist yet, it can't be asked to issue an 'add' uevent.

additionally, hotplugging the driver later would generate its own uevents, thus notifying udevd about it. That's the whole point of systemd-udev-trigger, it's only intended to provide uevent notification to udevd for devices that *already* have their drivers loaded (e.g. built-in drivers or ones loaded from the initramfs), and already sent out their 'add' uevents. Any drivers loaded later are hotplug events and send out their own uevents to notify udevd, which is why systemd-udev-trigger doesn't need to run other than once at boot.

additionally it doesn't explain how systemd-networkd *did* match the .network file, even though it doesn't know anything about the interface Driver.

> I suspect that adding an `After=systemd-modules-load.service` to
> systemd-udev-trigger.service addresses the issue,

that shouldn't be needed since systemd-udev-trigger only generates uevents for all pre-existing devices. any devices hotplugged later, including those whose drivers are loaded by systemd-modules-load, will generate their own uevents as their driver enumerates the devices.

I certainly may be wrong about cloud-init, however; that was just my guess on what code might be doing non-standard first-boot magic.

I'm fairly certain it isn't anything in systemd misbehaving, however.
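
The coldplug mechanism described in this comment can be sketched in a few lines. This is a rough illustration restricted to network devices, not the actual systemd-udev-trigger implementation: the function name `coldplug_net` is hypothetical, and it relies on the kernel re-emitting a uevent when an action name is written to a device's `uevent` file in sysfs.

```shell
#!/bin/sh
# Rough sketch of coldplug for network devices: ask the kernel to re-emit an
# 'add' uevent for each existing interface by writing to its uevent file.
# Only devices that already exist are visited; anything that appears later
# has to announce itself with its own hotplug uevent.
# $1: sysfs net directory (default /sys/class/net); needs root on a real system.
coldplug_net() (
    sysnet=${1:-/sys/class/net}
    for dev in "$sysnet"/*; do
        [ -w "$dev/uevent" ] || continue
        printf 'add\n' > "$dev/uevent"
        printf 'requested add uevent for %s\n' "${dev##*/}"
    done
)
```

The loop only covers interfaces present at the moment it runs, which is why a single pass at boot is enough for pre-existing devices and why later devices are not its concern.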

Benjamin Allot (ballot) wrote :

Thanks for the explanation.

I confirm that the workaround using "systemctl restart systemd-udev-trigger && systemctl restart systemd-networkd" does the trick.

@Dan Watkins : did you do some specific thing to reproduce the issue on your local VM ? It would be interesting to see the whole logs happening there.

We could possibly hijack the image to add a
 | udevadm control --log-priority=debug

and see what happens.

Dan Watkins (oddbloke) wrote :

Thanks for the explanation, Dan! I was off down a wrong path, I appreciate the correction.

I've just downloaded the Azure image from cloud-images.u.c and it includes this in `/etc/netplan/90-hotplug-azure.yaml`:

# This netplan yaml is delivered in Azure cloud images to support
# attaching and detaching nics after the instance first boot.
# Cloud-init otherwise handles initial boot network configuration in
# /etc/netplan/50-cloud-init.yaml
network:
    version: 2
    ethernets:
        ephemeral:
            dhcp4: true
            match:
                driver: hv_netvsc
                name: '!eth0'
            optional: true
        hotpluggedeth0:
            dhcp4: true
            match:
                driver: hv_netvsc
                name: 'eth0'

This file is not present in a booted system, because cloud-init removes it during boot:

2020-11-09 18:12:09,306 - handlers.py[DEBUG]: start: azure-ds/maybe_remove_ubuntu_network_config_scripts: maybe_remove_ubuntu_network_config_scripts
2020-11-09 18:12:09,307 - DataSourceAzure.py[INFO]: Removing Ubuntu extended network scripts because cloud-init updates Azure network configuration on the following event: System boot.
2020-11-09 18:12:09,307 - util.py[DEBUG]: Attempting to remove /etc/netplan/90-hotplug-azure.yaml
2020-11-09 18:12:09,307 - handlers.py[DEBUG]: finish: azure-ds/maybe_remove_ubuntu_network_config_scripts: SUCCESS: maybe_remove_ubuntu_network_config_scripts

It does this before the regular cloud-init network configuration is written, or `netplan generate` is called:

2020-11-09 18:12:09,465 - util.py[DEBUG]: Writing to /etc/netplan/50-cloud-init.yaml - wb: [644] 603 bytes
2020-11-09 18:12:09,466 - subp.py[DEBUG]: Running command ['netplan', 'generate'] with allowed return codes [0] (shell=False, capture=True)

cloud-init also runs a couple of udevadm commands right after `netplan generate`:

2020-11-09 18:12:09,813 - subp.py[DEBUG]: Running command ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/eth0'] with allowed return codes [0] (shell=False, capture=True)
2020-11-09 18:12:09,828 - subp.py[DEBUG]: Running command ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/lo'] with allowed return codes [0] (shell=False, capture=True)

This all happens before systemd-networkd starts:

Nov 09 18:12:09.956027 focal-1604945439 systemd[1]: Starting Network Service...

So: I'm not really sure what's going on here. I've tried restoring `90-hotplug-azure.yaml` and removing `50-cloud-init.yaml`; that doesn't cause the issue to reproduce on a subsequent boot.

One thing worth noting, that could lead to unexpected state: cloud-init performs a DHCP on this interface (in order to be able to fetch the network configuration it is going to apply). It does this in a sandbox (i.e. it doesn't use system configuration for it), but potentially that could mean that there's (kernel?) state for that interface which {udev,network}d interpret in a way that leads to this issue?

Changed in cloud-init (Ubuntu):
status: Incomplete → New
Changed in systemd (Ubuntu):
status: New → Incomplete
Dan Watkins (oddbloke) wrote :

(Added cloud-images for visibility.)

Dan Watkins (oddbloke) wrote :

I've just tested, and this doesn't seem to reproduce when launching from a captured image (with 90-hotplug-azure.yaml restored and `cloud-init clean` executed). So I think I've exhausted the ways in which I can attempt to gain more insight into what's happening during the part of boot where this reproduces.

I think we're going to need an image published with some debugging built into it (which, hopefully, will continue to reproduce the issue). If https://paste.ubuntu.com/p/qkwmDvRRrB/ is installed and enabled, we should get a whole bunch of udev information, which may shed some light onto what's going on from an ordering POV.

I'm not sure if there's any more networking/networkd-specific debugging we should also add.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cloud-init (Ubuntu):
status: New → Confirmed
Dan Streetman (ddstreet) wrote :

Unfortunately (or maybe fortunately?) I'm no longer able to reproduce this when deploying Ubuntu 20.04 instances in azure.

@deej are you able to reproduce this with newly deployed 20.04 instances?

Dan Streetman (ddstreet) wrote :

As I mentioned in the upstream systemd bug, it's still not clear what exactly is causing this, and the inability to reproduce it after the image first boot, or with another image with debug added, or with the latest image, means it will be impossible to verify any fix to this problem.

However, my best guess is udevd is somehow missing the 'add' uevent for the network device, as I described in the upstream bug. While I don't know why it would miss the event, the upstream commit e0e789c1e97e2cdf1cafe0c6b7d7e43fa054f151 should fix the problem, as it will set up the proper udevd device parameters not only for 'add' but also for 'change' uevents.

So, based on those assumptions/guesses and on the relative safety of the upstream commit, I plan to SRU that commit for this bug.

Dan Streetman (ddstreet) wrote :

based on comment 29, and the assumption that the referenced upstream commit does actually 'fix' (or at least work around) this, marking as Fix Released for h and later.

description: updated
Changed in systemd (Ubuntu):
status: Incomplete → Opinion
status: Opinion → New
status: New → Fix Released
Changed in systemd (Ubuntu Focal):
assignee: nobody → Dan Streetman (ddstreet)
Changed in systemd (Ubuntu Groovy):
importance: Undecided → Medium
assignee: nobody → Dan Streetman (ddstreet)
Changed in systemd (Ubuntu Focal):
status: New → In Progress
importance: Undecided → Medium
Changed in systemd (Ubuntu Groovy):
status: New → In Progress
David Lawson (deej) wrote :

I don't actually know that we've deployed any new instances into Azure in the recent past, so I'm not sure we can confirm, but that all sounds reasonable.

Changed in systemd:
status: Unknown → New
Dan Watkins (oddbloke) wrote :

It sounds to me like there's no cloud-init aspect here, so I'm going to move our tasks to Incomplete (so they'll expire out eventually). Please do set them back if I've missed something!

Changed in cloud-init (Ubuntu):
status: Confirmed → Incomplete
Changed in cloud-init (Ubuntu Focal):
status: New → Incomplete
Changed in cloud-init (Ubuntu Groovy):
status: New → Incomplete
Chris Halse Rogers (raof) wrote : Please test proposed package

Hello David, or anyone else affected,

Accepted systemd into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/245.4-4ubuntu3.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in systemd (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-focal
Dan Streetman (ddstreet)
description: updated
Chris Halse Rogers (raof) wrote :

Hello David, or anyone else affected,

Accepted systemd into groovy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/246.6-1ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-groovy to verification-done-groovy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-groovy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in systemd (Ubuntu Groovy):
status: In Progress → Fix Committed
tags: added: verification-needed-groovy
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (systemd/245.4-4ubuntu3.4)

All autopkgtests for the newly accepted systemd (245.4-4ubuntu3.4) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

linux-hwe-5.8/5.8.0-37.42~20.04.1 (arm64)
netplan.io/0.100-0ubuntu4~20.04.3 (amd64)
apt/2.0.2ubuntu0.2 (armhf)
munin/2.0.56-1ubuntu1 (ppc64el)
gvfs/1.44.1-1ubuntu1 (amd64, arm64)
prometheus-alertmanager/0.15.3+ds-3ubuntu1 (armhf)
lxc/1:4.0.2-0ubuntu1 (amd64)
indicator-session/17.3.20+19.10.20190921-0ubuntu1 (ppc64el)
pyudev/0.21.0-3ubuntu1 (s390x)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#systemd

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (systemd/246.6-1ubuntu1.1)

All autopkgtests for the newly accepted systemd (246.6-1ubuntu1.1) for groovy have finished running.
The following regressions have been reported in tests triggered by the package:

gvfs/1.46.1-1ubuntu1 (arm64, amd64)
prometheus/2.20.0+ds-1 (s390x)
netplan.io/0.100-0ubuntu5 (arm64)
flatpak/1.8.2-1 (arm64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/groovy/update_excuses.html#systemd

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Benjamin Allot (ballot) wrote :

I confirm I got it working at first boot on azure with systemd-245.4-4ubuntu3.4

```
ubuntu@machine-3:~$ sudo networkctl
IDX LINK TYPE OPERATIONAL SETUP
  1 lo loopback carrier unmanaged
  2 eth0 ether routable configured

2 links listed.
ubuntu@machine-3:~$ sudo apt update
Hit:1 http://ppa.launchpad.net/telegraf-devs/ppa/ubuntu focal InRelease
Hit:2 http://us.archive.ubuntu.com/ubuntu focal InRelease
Get:3 http://us.archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Get:4 http://us.archive.ubuntu.com/ubuntu focal-backports InRelease [101 kB]
Get:5 http://us.archive.ubuntu.com/ubuntu focal-security InRelease [109 kB]
Get:6 http://us.archive.ubuntu.com/ubuntu focal-proposed InRelease [267 kB]
Fetched 591 kB in 3s (225 kB/s)
Reading package lists... Done
Building dependency tree
Reading state information... Done
All packages are up to date.
ubuntu@machine-3:~$ dpkg -l systemd
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==============-================-============-=================================
ii systemd 245.4-4ubuntu3.4 amd64 system and service manager

```

Dan Streetman (ddstreet) wrote :

groovy:

root@lp1902960-g:~# dpkg -l systemd|grep systemd
ii systemd 246.6-1ubuntu1 amd64 system and service manager
root@lp1902960-g:~# udevadm info /sys/class/net/ens3 | grep ID_NET_DRIVER
E: ID_NET_DRIVER=virtio_net
root@lp1902960-g:~# rm /run/udev/data/n2
root@lp1902960-g:~# udevadm info /sys/class/net/ens3 | grep ID_NET_DRIVER
root@lp1902960-g:~# udevadm trigger -c change /sys/class/net/ens3
root@lp1902960-g:~# udevadm info /sys/class/net/ens3 | grep ID_NET_DRIVER
root@lp1902960-g:~# udevadm trigger -c add /sys/class/net/ens3
root@lp1902960-g:~# udevadm info /sys/class/net/ens3 | grep ID_NET_DRIVER
E: ID_NET_DRIVER=virtio_net

root@lp1902960-g:~# dpkg -l systemd|grep systemd
ii systemd 246.6-1ubuntu1.1 amd64 system and service manager
root@lp1902960-g:~# udevadm info /sys/class/net/ens3 | grep ID_NET_DRIVER
E: ID_NET_DRIVER=virtio_net
root@lp1902960-g:~# rm /run/udev/data/n2
root@lp1902960-g:~# udevadm info /sys/class/net/ens3 | grep ID_NET_DRIVER
root@lp1902960-g:~# udevadm trigger -c change /sys/class/net/ens3
root@lp1902960-g:~# udevadm info /sys/class/net/ens3 | grep ID_NET_DRIVER
E: ID_NET_DRIVER=virtio_net

Dan Streetman (ddstreet) wrote :

focal:

root@lp1902960-f:~# dpkg -l systemd|grep systemd
ii systemd 245.4-4ubuntu3.3 amd64 system and service manager
root@lp1902960-f:~# udevadm info /sys/class/net/ens3 | grep ID_NET_DRIVER
E: ID_NET_DRIVER=virtio_net
root@lp1902960-f:~# rm /run/udev/data/n2
root@lp1902960-f:~# udevadm info /sys/class/net/ens3 | grep ID_NET_DRIVER
root@lp1902960-f:~# udevadm trigger -c change /sys/class/net/ens3
root@lp1902960-f:~# udevadm info /sys/class/net/ens3 | grep ID_NET_DRIVER
root@lp1902960-f:~# udevadm trigger -c add /sys/class/net/ens3
root@lp1902960-f:~# udevadm info /sys/class/net/ens3 | grep ID_NET_DRIVER
E: ID_NET_DRIVER=virtio_net

root@lp1902960-f:~# dpkg -l systemd|grep systemd
ii systemd 245.4-4ubuntu3.4 amd64 system and service manager
root@lp1902960-f:~# udevadm info /sys/class/net/ens3 | grep ID_NET_DRIVER
E: ID_NET_DRIVER=virtio_net
root@lp1902960-f:~# rm /run/udev/data/n2
root@lp1902960-f:~# udevadm info /sys/class/net/ens3 | grep ID_NET_DRIVER
root@lp1902960-f:~# udevadm trigger -c change /sys/class/net/ens3
root@lp1902960-f:~# udevadm info /sys/class/net/ens3 | grep ID_NET_DRIVER
E: ID_NET_DRIVER=virtio_net

tags: added: verification-done verification-done-focal verification-done-groovy
removed: verification-needed verification-needed-focal verification-needed-groovy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 246.6-1ubuntu1.1

---------------
systemd (246.6-1ubuntu1.1) groovy; urgency=medium

  [ Dan Streetman ]
  * d/t/boot-smoke: update test to avoid false negatives
    (LP: #1892358)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=205c30ca53b0e421db28bb56afaf5f88650ce592
  * d/t/boot-and-services: remove unneeded test lines
    (LP: #1892358)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=71853082af4e668996db574915c5a156f9897fd3
  * d/t/systemd-fsckd: rewrite test to try to fix false negatives
    (LP: #1892358)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=6ae6be039ec582410769d2d6d131e12bdcd19a68
  * d/p/lp1905044-test-use-cap_last_cap-for-max-supported-cap-number-n.patch:
    test: use cap_last_cap() instead of capability_list_length()
    (LP: #1905044)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=84a4832f5f7d4f939c1c78c6be4c3f9e05cd7f59
  * d/p/lp1907306/0001-sd-dhcp-client-don-t-log-timeouts-if-already-expired.patch,
    d/p/lp1907306/0002-sd-dhcp-client-track-dhcp4-t1-t2-expire-times.patch,
    d/p/lp1907306/0003-sd-dhcp-client-add-RFC2131-retransmission-details.patch,
    d/p/lp1907306/0004-sd-dhcp-client-simplify-dhcp4-t1-t2-parsing.patch,
    d/p/lp1907306/0005-sd-dhcp-client-correct-dhcpv4-renew-rebind-retransmi.patch,
    d/p/lp1907306/0006-sd-dhcp-client-correct-retransmission-timeout-to-mat.patch,
    d/p/lp1907306/0007-test-network-increase-wait_online-timeout-to-handle-.patch,
    d/p/lp1907306/0008-sd-dhcp-client-fix-renew-rebind-timeout-calculation-.patch:
    Send correct number of dhcpv4 renew and rebind requests
    (LP: #1907306)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=0a96dc16ac00e90cd3904e6d490d676b9bb98f1f
  * d/p/lp1902960-udev-re-assign-ID_NET_DRIVER-ID_NET_LINK_FILE-ID_NET.patch:
    Run net_setup_link on 'change' uevents (LP: #1902960)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=7183e2ef4758ce47b152dec735e7d213d6003e37
  * d/t/root-unittests:
    Remove any corrupt journal files (LP: #1881947)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=3d0ea66f0db4a204759fa0005f6f27579ee4195a

  [ Balint Reczey ]
  * d/t/systemd-fsckd: Plymouth-start stays active in 20.10 and later
    (LP: #1908067)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=e3ddd09301c8bdaa59b4fe54d7906f609552370d

 -- Dan Streetman <email address hidden> Wed, 06 Jan 2021 15:40:39 -0500

Changed in systemd (Ubuntu Groovy):
status: Fix Committed → Fix Released
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for systemd has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 245.4-4ubuntu3.4

---------------
systemd (245.4-4ubuntu3.4) focal; urgency=medium

  * d/p/lp1905245/0001-basic-cap-list-parse-print-numerical-capabilities.patch,
    d/p/lp1905245/0002-basic-capability-util-let-cap_last_cap-return-unsign.patch,
    d/p/lp1905245/0003-basic-cap-list-reduce-scope-of-variables.patch:
    - print number of unknown capabilities instead of failing
      (LP: #1905245)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=5cd98102e16a6e4acc1444b10db3308d87930933
  * d/p/lp1890448-hwdb-Add-EliteBook-to-use-micmute-hotkey.patch:
    Add EliteBook to use micmute hotkey (LP: #1890448)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=238c8c1a7b9d75f69bdeafb1d55f1faf00acb063
  * d/extra/dhclient-enter-resolved-hook:
    suppress output of cmp command in dhclient hook (LP: #1878955)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=83df4fc182f8ffe87256f5d7c4b49cee5192529a
  * d/p/lp1905044-test-use-cap_last_cap-for-max-supported-cap-number-n.patch:
    test: use cap_last_cap() instead of capability_list_length()
    (LP: #1905044)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=ff21f41e624d9e603f3be463846ce981a433842a
  * d/p/lp1903300/0001-network-VXLan-fix-adding-Group-address.patch,
    d/p/lp1903300/0002-network-VXLan-Add-support-for-remote-address.patch,
    d/p/lp1903300/0003-networkctl-Add-support-to-display-VXLan-remote-addre.patch:
    set vxlan multicast group when specified (LP: #1903300)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=9deff4b7c5495dbe738561ca47daf3756df9fcde
  * d/p/lp1907306/0001-sd-dhcp-client-don-t-log-timeouts-if-already-expired.patch,
    d/p/lp1907306/0002-sd-dhcp-client-track-dhcp4-t1-t2-expire-times.patch,
    d/p/lp1907306/0003-sd-dhcp-client-add-RFC2131-retransmission-details.patch,
    d/p/lp1907306/0004-sd-dhcp-client-simplify-dhcp4-t1-t2-parsing.patch,
    d/p/lp1907306/0005-sd-dhcp-client-correct-dhcpv4-renew-rebind-retransmi.patch,
    d/p/lp1907306/0006-sd-dhcp-client-correct-retransmission-timeout-to-mat.patch,
    d/p/lp1907306/0007-test-network-increase-wait_online-timeout-to-handle-.patch,
    d/p/lp1907306/0008-sd-dhcp-client-fix-renew-rebind-timeout-calculation-.patch:
    Send correct number of dhcpv4 renew and rebind requests
    (LP: #1907306)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=a73c51d0df284dcc38e6924d40eed810554bab2e
  * d/p/lp1902960-udev-re-assign-ID_NET_DRIVER-ID_NET_LINK_FILE-ID_NET.patch:
    Run net_setup_link on 'change' uevents (LP: #1902960)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=ec7ba2358aa68d8d6276ed56ef91caafc287cecf
  * d/t/root-unittests:
    Remove any corrupt journal files (LP: #1881947)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=5481fececdb3cb35ca7118598cad537681b5ff14

 -- Dan Streetman <email address hidden> Wed, 06 Jan 2021 15:47:39 -0500

Changed in systemd (Ubuntu Focal):
status: Fix Committed → Fix Released
Revision history for this message
Brian Murray (brian-murray) wrote :

The Groovy Gorilla has reached end of life, so this bug will not be fixed for that release.

Changed in cloud-init (Ubuntu Groovy):
status: Incomplete → Won't Fix
Revision history for this message
Yoann Dubreuil (ydubreuil) wrote :

Hi,

We are affected by this issue on Bionic. Upgrading the systemd package breaks DNS resolution in VMs that didn't have the workaround above applied. Would it be possible to backport the systemd fix to Bionic?

We are currently relying on the following workaround to recover from the loss of DNS resolution:

udevadm trigger -c add /sys/class/net/eth0
systemctl restart systemd-networkd
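
A guarded variant of this workaround, as a minimal sketch: it only re-triggers the 'add' uevent and restarts systemd-networkd when ID_NET_DRIVER is actually missing. The function names `has_net_driver` and `recover_iface` are hypothetical, and the guard assumes the missing property is what keeps networkd from managing the link; if the property is present but networking is still broken, the plain two-command workaround is the safer choice.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; must be run as root.

# Succeeds when the `udevadm info` output on stdin carries a non-empty
# ID_NET_DRIVER property.
has_net_driver() {
    grep -q '^E: ID_NET_DRIVER=.'
}

# Apply the workaround to one interface (default: eth0) only if needed.
recover_iface() {
    iface=${1:-eth0}
    syspath="/sys/class/net/$iface"
    if udevadm info "$syspath" | has_net_driver; then
        echo "$iface: ID_NET_DRIVER already set, nothing to do"
        return 0
    fi
    # Re-run the 'add' rules so udev repopulates the property,
    # then let networkd pick the interface up again.
    udevadm trigger -c add "$syspath"
    udevadm settle
    systemctl restart systemd-networkd
}
```

Calling `recover_iface eth0` from a boot-time or post-upgrade hook would leave healthy VMs untouched.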

We seem to be able to reproduce this problem consistently in our VM images; multiple images have been affected. We'd be happy to provide debug information if you want to investigate the root cause further. Of course, I can't tell for sure whether the next VM image we build will be affected.

Thanks,

Revision history for this message
James Falcon (falcojr) wrote :

"It sounds to me like there's no cloud-init aspect here, so I'm going to move our tasks to Incomplete (so they'll expire out eventually). Please do set them back if I've missed something!"

This expiration never happened, and no additional comments indicate that cloud-init has a problem, so I'm going to change cloud-init to Invalid.

Changed in cloud-init (Ubuntu):
status: Incomplete → Invalid
Changed in cloud-init (Ubuntu Focal):
status: Incomplete → Invalid