systemd service dependency loop between cloud-init, NetworkManager and dbus

Bug #2081124 reported by Yao Wei
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OEM Priority Project
Incomplete
Critical
Unassigned
cloud-init (Ubuntu)
Fix Released
Critical
Alberto Contreras
Noble
Fix Released
Undecided
Unassigned
livecd-rootfs (Ubuntu)
Invalid
Undecided
Unassigned
Noble
Invalid
Undecided
Unassigned

Bug Description

[ Impact ]

cloud-init 24.2 shifted the systemd configuration of cloud-init-hotplugd.socket to earlier in boot, before sysinit.target, but retained the unit's implicit DefaultDependencies. This led to a systemd ordering cycle which affects only the Ubuntu Live Desktop images on 24.04 (Noble) and 24.10 (Oracular), due to a custom systemd drop-in for cloud-init.service provided by livecd-rootfs which orders cloud-init.service After=NetworkManager.service NetworkManager-wait-online.service.
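The livecd-rootfs drop-in described above takes roughly this shape (a hypothetical reconstruction from the ordering described in this bug, not the literal file shipped by livecd-rootfs):

```ini
# Hypothetical drop-in path: /etc/systemd/system/cloud-init.service.d/override.conf
[Unit]
# Desktop images want NetworkManager up before cloud-init runs, which
# closes the loop once cloud-init-hotplugd.socket is pulled in before
# sysinit.target via its implicit default dependencies.
After=NetworkManager.service NetworkManager-wait-online.service
```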

The affected systemd ordering cycle messages are visible in journalctl -b 0 in either Desktop ephemeral boot or first boot post-installation.

It may result in either cloud-init-hotplugd.service, NetworkManager.service or dbus.socket being deleted from the systemd boot goals, resulting in an unresponsive system at first boot.

Without this changeset, Ubuntu Live Desktop launches of ephemeral boot (or the first boot after install) can see "ordering cycle" messages in journalctl -b 0, which lead systemd to kick out any of the following potentially conflicting services:
- cloud-init-hotplugd.service
- NetworkManager.service
- dbus.service
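A quick way to check a boot for this failure mode is to grep the journal for the cycle-breaking messages. As a self-contained sketch, using sample text that mimics the journal lines quoted later in this bug (on a real system you would pipe journalctl -b 0 instead):

```shell
# Sample lines standing in for `journalctl -b 0` output.
sample='systemd[1]: cloud-init.service: Found ordering cycle on NetworkManager-wait-online.service/start
systemd[1]: NetworkManager.service: Job dbus.service/start deleted to break ordering cycle starting with NetworkManager.service/start'

# Count ordering-cycle evidence; a healthy boot yields 0 matches.
printf '%s\n' "$sample" | grep -c 'ordering cycle'
```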

[ Test Plan ]
Validate both desktop and server images do not expose systemd ordering cycle issues related to hotplug

== Test case 1 (desktop) ==
Download daily noble desktop live image from https://cdimage.ubuntu.com/daily-live/20240421/

1. Launch in virt-manager or qemu-kvm.
2. Bring up a GNOME terminal (Ctrl-Alt-T) during ephemeral boot, before responding to any configuration prompts.
3. Confirm ordering cycle issues: journalctl -b 0 | grep "ordering cycle"
4. Shutdown daily failing image
5. Follow https://help.ubuntu.com/community/LiveCDCustomization#Amending_the_LiveCD_Squash_Files_System to update cloud-init from -proposed in this daily Live Desktop ISO, creating a new desktop-noble-cloud-init-proposed.iso
6. Launch in virt-manager or qemu-kvm
7. Confirm ordering cycle is resolved: journalctl -b 0 | grep "ordering cycle"
8. Confirm all affected services are healthy
for service_name in NetworkManager.service dbus.service cloud-init-hotplugd.socket cloud-init-hotplugd.service; do
 systemctl status $service_name
done
9. Complete live installer prompts and reboot into "first boot"
10. Login and confirm no ordering cycles on first boot (Ctrl-Alt-T): journalctl -b 0 | grep "ordering cycle"
11. Assert previously affected services are healthy:
for service_name in NetworkManager.service dbus.service cloud-init-hotplugd.socket cloud-init-hotplugd.service; do
 systemctl status $service_name
done
12. Assert cloud-init is healthy: cloud-init status --format=yaml
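Steps 11-12 above can be wrapped into a small aggregate health check. This is a sketch: sample_status is a hypothetical stand-in so the logic is demonstrable without systemd; on a real system it would call `systemctl is-active "$1"` instead:

```shell
# Stand-in for `systemctl is-active`; returns a canned status table.
sample_status() {
  case "$1" in
    NetworkManager.service|dbus.service|cloud-init-hotplugd.socket) echo active ;;
    cloud-init-hotplugd.service) echo inactive ;;  # socket-activated: inactive until triggered
  esac
}

failed=0
for unit in NetworkManager.service dbus.service \
            cloud-init-hotplugd.socket cloud-init-hotplugd.service; do
  state=$(sample_status "$unit")
  # A socket-activated service legitimately sits inactive until an event fires.
  case "$unit:$state" in
    *:active|cloud-init-hotplugd.service:inactive) : ;;
    *) failed=1; echo "unhealthy: $unit ($state)" ;;
  esac
done
echo "failed=$failed"
```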

== Test case 2 (server) broad integration test coverage ==
1. Run the full suite of cloud-init integration tests using the ppa:cloud-init--proposed PPA against lxd_container and lxd_vm:
CLOUD_INIT_PLATFORM=lxd_vm CLOUD_INIT_CLOUD_INIT_SOURCE=PROPOSED CLOUD_INIT_OS_IMAGE=noble tox -e integration-tests

CLOUD_INIT_PLATFORM=lxd_container CLOUD_INIT_CLOUD_INIT_SOURCE=PROPOSED CLOUD_INIT_OS_IMAGE=noble tox -e integration-tests

2. Run hotplug specific integration tests against ec2 and azure
CLOUD_INIT_PLATFORM=ec2 CLOUD_INIT_CLOUD_INIT_SOURCE=PROPOSED CLOUD_INIT_OS_IMAGE=noble tox -e integration-tests -- tests/integration_tests/modules/test_hotplug.py

CLOUD_INIT_PLATFORM=azure CLOUD_INIT_CLOUD_INIT_SOURCE=PROPOSED CLOUD_INIT_OS_IMAGE=noble tox -e integration-tests -- tests/integration_tests/modules/test_hotplug.py

3. Validate no negative impact on boot speed
Leverage https://github.com/canonical/server-test-scripts/pull/201 to get qemu-kvm samples of before/after this changeset to ensure boot speed is not negatively impacted.

[ Where problems can occur ]

 * This upload is a direct resolution of where problems could occur. If new systemd units or services introduce systemd ordering cycles, systemd may punt conflicting services out of the system's boot goals. If critical services are deleted from the boot goals, the system and affected services will not be brought up and configured as anticipated, leading to misconfigured, unconfigured or inaccessible systems. The good news is that systemd ordering cycles are easily detected during the systemd generator timeframe, and systemd leaves logs in journalctl about any affected services when this occurs.

[ Other Info ]

This bug in systemd ordering was not seen in Oracular Live images originally because of a separate bug: https://bugs.launchpad.net/ubuntu/+source/livecd-rootfs/+bug/2081325, where Desktop image overrides were not being applied to cloud-init-network.service (Oracular only). So Oracular did not surface this systemd ordering cycle issue. The livecd-rootfs fix was accepted into Oracular on Sept 23rd, so that release would also have exhibited this broken behavior had the corresponding cloud-init fix not been accepted into Oracular on Sept 23rd as well.

[ Original Description ]
We got errors that some services like snapd and NetworkManager are not started when running cloud-init on desktop; excerpt from the journal below:

Sep 13 12:37:41 localhost.localdomain systemd[1]: cloud-init.service: Found ordering cycle on NetworkManager-wait-online.service/start
Sep 13 12:37:41 localhost.localdomain systemd[1]: cloud-init.service: Found dependency on basic.target/start
Sep 13 12:37:41 localhost.localdomain systemd[1]: cloud-init.service: Found dependency on sockets.target/start
Sep 13 12:37:41 localhost.localdomain systemd[1]: cloud-init.service: Found dependency on cloud-init-hotplugd.socket/start
Sep 13 12:37:41 localhost.localdomain systemd[1]: cloud-init.service: Found dependency on cloud-config.target/start
Sep 13 12:37:41 localhost.localdomain systemd[1]: cloud-init.service: Found dependency on cloud-init.service/start
Sep 13 12:37:41 localhost.localdomain systemd[1]: cloud-init.service: Job NetworkManager-wait-online.service/start deleted to break ordering cycle starting with cloud-init.service/start
Sep 13 12:37:41 localhost.localdomain systemd[1]: NetworkManager.service: Found ordering cycle on dbus.service/start
Sep 13 12:37:41 localhost.localdomain systemd[1]: NetworkManager.service: Found dependency on basic.target/start
Sep 13 12:37:41 localhost.localdomain systemd[1]: NetworkManager.service: Found dependency on sockets.target/start
Sep 13 12:37:41 localhost.localdomain systemd[1]: NetworkManager.service: Found dependency on cloud-init-hotplugd.socket/start
Sep 13 12:37:41 localhost.localdomain systemd[1]: NetworkManager.service: Found dependency on cloud-config.target/start
Sep 13 12:37:41 localhost.localdomain systemd[1]: NetworkManager.service: Found dependency on cloud-init.service/start
Sep 13 12:37:41 localhost.localdomain systemd[1]: NetworkManager.service: Found dependency on NetworkManager.service/start
Sep 13 12:37:41 localhost.localdomain systemd[1]: NetworkManager.service: Job dbus.service/start deleted to break ordering cycle starting with NetworkManager.service/start

Related logs and service files are attached in sosreport.

Internal reference: NANTOU-473

Revision history for this message
Yao Wei (medicalwei) wrote :
Changed in oem-priority:
importance: Undecided → Critical
tags: added: oem-priority
description: updated
summary: - systemd service dependency between cloud-init, NetworkManager an dbus
+ systemd service dependency loop between cloud-init, NetworkManager an
+ dbus
Yao Wei (medicalwei)
summary: - systemd service dependency loop between cloud-init, NetworkManager an
+ systemd service dependency loop between cloud-init, NetworkManager and
dbus
tags: added: jira-somerville-1010
Revision history for this message
Yao Wei (medicalwei) wrote :

The cloud-init.service file is provided by livecd-rootfs, but we aren't sure what triggered this issue.

Rex Tsai (chihchun)
tags: added: jira-nantou-473
Bill Yu (billchyu)
tags: added: jira-stella-191
Revision history for this message
Chad Smith (chad.smith) wrote (last edit ):

Thank you for filing this bug. If possible, please perform the following to aid in debugging:
<snip> Striking the former requests in this comment, as we can get the journal.log, pkg versions and cloud-init logs from the sos report. Will add a separate request if anything else is missing.

Revision history for this message
Chad Smith (chad.smith) wrote :

From the SOS report, the issue presents on an image with this build-info, with whatever additional OEM config is presented as autoinstall-user-data:

Ubuntu OEM 24.04.1 LTS "Noble Numbat" - Release amd64 (20240911)

I'm trying to download latest Ubuntu Desktop noble to reproduce this problem from live Desktop installer images dated Sept 19th and not seeing the ordering cycle issues on stock daily Desktop noble images.

How is this reproducible?

In the meantime, from sos logs I see the following:

 1. The Desktop installer ephemeral boot stage successfully ran all 4 boot stages (init-local, init, config-modules and config-final) of cloud-init, as seen in the sos report's var/log/installer/cloud-init.log and cloud-init-output.log. This means no ordering cycle was present in the initial unaltered Desktop images, as that would have kicked out NetworkManager-wait-online.service or dbus.service.

2. I see ./sos_commands/logs/journalctl_--no-pager which shows the ordering cycle issues in the first of two boots, ejecting NetworkManager-wait-online.service and dbus.service from boot goals on the first boot.

3. I'm not seeing those journal entries mentioned in this bug related to ordering cycles in var/log/installer/installer-journal.txt in the installer ephemeral boot stage which means this problem doesn't seem to affect the unaltered installer environment before "first boot" occurs.

4. Unrelated to this specific bug, we will need a separate bug, as I see an undetected systemd ordering cycle in the Desktop ephemeral environment related only to cloud-init-hotplugd.socket in desktop images:

From var/log/installer/installer-journal.txt: sockets.target: Job cloud-init-hotplugd.socket/start deleted to break ordering cycle starting with sockets.target/start

5. I see some APT package installs in the sos report's var/log/apt/history.log for pkgs which may or may not have added additional systemd units and ordering dependencies that could contribute to ordering cycle issues:
- Commandline: apt-get install --assume-yes --install-suggests oem-nantou-meta desktop-provision-hp
- Commandline: apt install nvidia-driver-550

Do we know whether oem-nantou-meta, desktop-provision-hp or nvidia-driver-550 deliver systemd units or services?

#4 warrants a separate bug that I shall file after finishing triage on this issue. cloud-init hotplug support is optional, opt-in, and not generally involved in the default install and configuration of Desktop or server images, so this wouldn't be what's breaking OEM installs (but it's a symptom of other ordering problems that need attention in Desktop images).

Revision history for this message
Chad Smith (chad.smith) wrote :

I have also confirmed that livecd-rootfs doesn't appear to have changed its "drop-in" of cloud-init.service files https://git.launchpad.net/ubuntu/+source/livecd-rootfs/tree/live-build/functions#n1060 so it's unlikely a new cloud-init issue introduced across an SRU boundary.

This makes me think something has changed in systemd or NetworkManager units and ordering on Noble recently to cause this recent issue or something that the OEM environment is installing that is causing this issue:

From the looks of the var/log/apt/term.log nvidia-kernel-common-550 package seems to be installing a few systemd units

Created symlink /etc/systemd/system/systemd-hibernate.service.wants/nvidia-hibernate.service → /usr/lib/systemd/system/nvidia-hibernate.service.
Created symlink /etc/systemd/system/systemd-suspend.service.wants/nvidia-resume.service → /usr/lib/systemd/system/nvidia-resume.service.
Created symlink /etc/systemd/system/systemd-hibernate.service.wants/nvidia-resume.service → /usr/lib/systemd/system/nvidia-resume.service.
Created symlink /etc/systemd/system/systemd-suspend.service.wants/nvidia-suspend.service → /usr/lib/systemd/system/nvidia-suspend.service.

I'm guessing we want to look at those and their ordering (systemctl show -p Before,After nvidia-resume.service and nvidia-suspend.service), as that seems to be the only thing outside of stock Desktop images that is altering systemd unit boot order.

Revision history for this message
Chad Smith (chad.smith) wrote :

ok, I'm able to reproduce the ordering cycle in the ephemeral boot stage for cloud-init-hotplugd.service on noble images dated 20240916

cloud-init-hotplugd.service/socket may be tied into this ordering issue too. Digging a bit more today, as they may be a symptom of the same problem: systemd might just be deleting NetworkManager-wait-online.service on 'first boot', while it has a slightly different symptom during the ephemeral boot stage of the desktop installer.

Revision history for this message
Chad Smith (chad.smith) wrote :

The difference we are seeing between a working Desktop ubuntu image (/var/log/installer/media-info 202408271) and the newer Noble image is that cloud-init published 24.2 via SRU to the newer image.

In that SRU was a shift in when the cloud-init-hotplugd.socket and cloud-init-hotplugd.service units are ordered [1].

The change added:
- After=cloud-config.target to cloud-init-hotplugd.socket to allow starting hotplugd.socket earlier
and
- After=cloud-init.target to cloud-init-hotplugd.service to avoid attempting udev NIC hotplug events while cloud-init is still configuring the base instance.
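In unit-file terms, the 24.2 change described above amounts to drop-ins along these lines (a sketch reconstructed from the description; the actual diff is in [1]):

```ini
# cloud-init-hotplugd.socket -- start the socket earlier in boot
[Unit]
After=cloud-config.target

# cloud-init-hotplugd.service -- defer udev NIC hotplug handling until
# cloud-init has finished configuring the base instance
[Unit]
After=cloud-init.target
```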

We'll need further investigation tomorrow to determine the ordering cycles we are seeing in Desktop images w/ NetworkManager-wait-online.service here, and why on first boot the systemd unit being deleted is dbus.service, yet in the ephemeral environment (before first boot) it happens to be cloud-init-hotplugd.service.

[1] https://github.com/canonical/cloud-init/pull/5058

Revision history for this message
Alberto Contreras (aciba) wrote :

I have proposed https://github.com/canonical/cloud-init/pull/5722 as a solution for this issue.

Revision history for this message
Chad Smith (chad.smith) wrote (last edit ):

Tagging this rls-oo-incoming as this ordering issue affects Oracular Desktop images as well.

Changed in cloud-init (Ubuntu):
importance: Undecided → Critical
tags: added: rls-oo-incoming
Revision history for this message
Chad Smith (chad.smith) wrote :

Thanks Alberto!
Upstream cloud-init fix landed per https://github.com/canonical/cloud-init/pull/5722.

Related to this bug is an Oracular Desktop livecd-rootfs update which is needed due to cloud-init.service being renamed to cloud-init-network.service.

https://bugs.launchpad.net/ubuntu/+source/livecd-rootfs/+bug/2081325.

This livecd-rootfs fix in 2081325 was merged today and I presume uploaded to Oracular as well.

Revision history for this message
Chad Smith (chad.smith) wrote :

Fix uploaded to Ubuntu Oracular as 24.4~3+really24.3.1-0ubuntu4

(blocked per Beta freeze) https://launchpad.net/ubuntu/oracular/+queue?queue_state=1&queue_text=cloud-init

Fix uploaded as well to the unapproved queue for SRU as version 24.3.1-0ubuntu0~24.04.2; it is queued behind the current SRU of cloud-init 24.3.1-0ubuntu0~24.04.1 to Noble, which should have verification logs complete Monday of next week.
https://launchpad.net/ubuntu/noble/+queue?queue_state=1&queue_text=cloud-init

Changed in cloud-init (Ubuntu):
status: New → In Progress
Changed in livecd-rootfs (Ubuntu):
status: New → Invalid
Revision history for this message
Chad Smith (chad.smith) wrote :

This particular bug is not applicable for livecd-rootfs, as this is a cloud-init systemd ordering issue.
There is a separate livecd-rootfs bug and fix for Oracular only https://bugs.launchpad.net/ubuntu/+source/livecd-rootfs/+bug/2081325 which is already queued for oracular-proposed

Chad Smith (chad.smith)
Changed in cloud-init (Ubuntu):
status: In Progress → Fix Committed
Changed in cloud-init (Ubuntu):
assignee: nobody → Alberto Contreras (aciba)
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 24.4~3+really24.3.1-0ubuntu4

---------------
cloud-init (24.4~3+really24.3.1-0ubuntu4) oracular; urgency=medium

  * Bug fix release (LP: #2081124):
    d/p/cpick-hotplugd-systemd-ordering-fix.patch: fix systemd ordering cycle
    issues with network cloud-init-hotplugd.socket, NetworkManager and
    dbus.socket by adding DefaultDependencies=no to cloud-init-hotplugd.socket.

 -- Chad Smith <email address hidden> Fri, 20 Sep 2024 15:31:13 -0600
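The fix named in the changelog boils down to one added directive in the socket unit, roughly (a sketch; the cpick patch is the authoritative change):

```ini
# cloud-init-hotplugd.socket
[Unit]
# Opt out of the implicit default dependencies (e.g. After=sysinit.target)
# that tied the socket into basic.target's ordering, breaking the cycle.
DefaultDependencies=no
```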

Changed in cloud-init (Ubuntu):
status: Fix Committed → Fix Released
Revision history for this message
Chad Smith (chad.smith) wrote (last edit ):

Note that this upload for Noble is queued https://launchpad.net/ubuntu/noble/+queue?queue_state=1&queue_text=cloud-init and sitting behind the active SRU 24.3.1 to Noble.
The SRU process bug https://bugs.launchpad.net/bugs/2079224 is just awaiting partner verification feedback on that SRU to publish to Focal, Jammy and Noble. All verification performed by canonical is done on that SRU. Once published, this fix can be reviewed and allowed into -proposed

Changed in livecd-rootfs (Ubuntu Noble):
status: New → Invalid
Changed in cloud-init (Ubuntu Noble):
status: New → Fix Committed
Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

Moved Noble from Fix Committed to In Progress as this is not yet in noble-proposed, rather uploaded to noble-unapproved.

Changed in cloud-init (Ubuntu Noble):
status: Fix Committed → In Progress
Chad Smith (chad.smith)
description: updated
description: updated
Revision history for this message
Chad Smith (chad.smith) wrote :

Note that https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/2079224 just cleared SRU. So, this bug should be able to progress for review into noble-proposed for verification.

Chad Smith (chad.smith)
description: updated
Chad Smith (chad.smith)
description: updated
Chad Smith (chad.smith)
description: updated
Revision history for this message
Andreas Hasenack (ahasenack) wrote : Please test proposed package

Hello Yao, or anyone else affected,

Accepted cloud-init into noble-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/24.3.1-0ubuntu0~24.04.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-noble to verification-done-noble. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-noble. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in cloud-init (Ubuntu Noble):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-noble
Revision history for this message
Chad Smith (chad.smith) wrote :

Thank you Andreas.

Attached are the logs for all testcases in test case 2. The TLDR is happy hotplug tests on ec2, azure, gce and a solid performance improvement in bootspeed.log time measurements (across 3 samples):
   - Avg time to ssh: 1.12 seconds faster
   - Avg time spent in systemd userspace: 0.5 seconds less
   - Avg time spent on all cloud-init boot stages: 0.17 seconds less

Revision history for this message
Chad Smith (chad.smith) wrote :

test-case-1: confirmed ordering cycle issues on daily Noble desktop, confirmed fix with noble-proposed cloud-init

tags: added: verification-done verification-done-noble
removed: verification-needed verification-needed-noble
Revision history for this message
Chad Smith (chad.smith) wrote :

updated test-case-1 desktop logs showing all formerly affected systemd services are healthy on first boot

Revision history for this message
Chad Smith (chad.smith) wrote (last edit ):

OEM team would like to expedite SRU aging on this bug now that it is verified because the affected offerings for noble use daily image builds. So, the sooner this fix releases the sooner builds are unblocked.

Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

Hi Chad,

Thanks for clarifying the rationale for expediting the verification aging, and for the detailed and comprehensive testing details (always high quality).

The specific testing for the regression/change seems well narrowed and verified in the Desktop ISOs (test case 1), thanks.

The regression testing on Server (test case 2) seems broad and extensive, which is great, and I could confirm that the "boot" test with the new cloud-init looks OK (i.e., an image snapshot is built with cloud-init from -proposed, and a new instance is created/booted from that snapshot image), so this looks good too.

I have a pending question regarding the hotplug specific tests (test case 2, item 2), as the service in question is cloud-init-hotplugd.service (even though I guess we'd expect the most regression potential in boot, covered in paragraph above).

It looks like most tests in the requested test_hotplug.py are SKIPPED (summary below), so I just wanted to confirm:
1) Is that expected? And
2) Is the level of confidence sufficient, based on `test_hotplug_enable_cmd`, which is the test that PASSED in all of azure/ec2/gce and lxd_container/lxd_vm?

Thanks again,
Mauricio

Summary:

$ grep -rc PASSED | sort
azure-noble-hotplug.log:1
bootspeed.log:0
ec2-noble-hotplug.log:5
gce-noble-hotplug.log:1
lxd_container-noble.log:207
lxd_vm-noble.log:233
lxd_vm-noble-retries.log:1

$ grep -rc FAILED | sort
azure-noble-hotplug.log:0
bootspeed.log:0
ec2-noble-hotplug.log:0
gce-noble-hotplug.log:0
lxd_container-noble.log:0
lxd_vm-noble.log:3
lxd_vm-noble-retries.log:0

$ grep -rc SKIPPED | sort
azure-noble-hotplug.log:7
bootspeed.log:0
ec2-noble-hotplug.log:3
gce-noble-hotplug.log:7
lxd_container-noble.log:75
lxd_vm-noble.log:48
lxd_vm-noble-retries.log:0

$ grep -rc '/test_hotplug.py::' *.log | grep hotplug
azure-noble-hotplug.log:8
ec2-noble-hotplug.log:8
gce-noble-hotplug.log:8

$ grep -rc '/test_hotplug.py::.* SKIPPED' *.log | grep hotplug
azure-noble-hotplug.log:7
ec2-noble-hotplug.log:3
gce-noble-hotplug.log:7

$ grep -rF -e '/test_hotplug.py::' -e PASSED -e FAILED -e SKIPPED *.log | grep -F -A1 '/test_hotplug.py::'
azure-noble-hotplug.log:22:05:56 tests/integration_tests/modules/test_hotplug.py::test_hotplug_add_remove SKIPPED [ 12%]u
azure-noble-hotplug.log:22:05:57 tests/integration_tests/modules/test_hotplug.py::test_hotplug_enable_cmd
azure-noble-hotplug.log:22:10:28 PASSED [ 25%]
azure-noble-hotplug.log:22:11:05 tests/integration_tests/modules/test_hotplug.py::test_hotplug_enable_cmd_ec2 SKIPPED [ 37%]
azure-noble-hotplug.log:22:11:05 tests/integration_tests/modules/test_hotplug.py::test_no_hotplug_in_userdata SKIPPED [ 50%]
azure-noble-hotplug.log:22:11:05 tests/integration_tests/modules/test_hotplug.py::test_multi_nic_hotplug SKIPPED [ 62%]
azure-noble-hotplug.log:22:11:05 tests/integration_tests/modules/test_hotplug.py::test_multi_nic_hotplug_vpc SKIPPED [ 75%]
azure-noble-hotplug.log:22:11:05 tests/integration_tests/modules/test_hotplug.py::test_no_hotplug_triggered_by_docker SKIPPED [ 87%]
azure-noble-hotplug.log:22:11:05 tests/integration_tests/modules/test_hotplug.py::test_ni...


Revision history for this message
Chad Smith (chad.smith) wrote :

thank you Mauricio for the detailed review!

>>It looks like most tests in the requested test_hotplug.py are SKIPPED (summary below), so I just wanted to confirm:
>> 1) Is that expected? And

Yes, this skipping is expected on all platforms except for Noble on EC2, which has a bit more detailed coverage because that platform has better support for automated testing of these features in our integration tests. For the sake of brevity in integration testing I wanted to reference the known integration-test/modules/test_hotplug.py in its entirety for all platforms, as I know the integration tests automatically skip inapplicable tests due to extensive skipIf decorators based on the test platform.

Truly, on most platforms the only test we really cared about validating in that integration test module was
test_hotplug_enabled_by_cmd.

For ec2 though, we have more thorough integration-test support, so we did want to see runs of additional tests including:
test_hotplug_enable_cmd_ec2, test_multi_nic_hotplug,test_no_hotplug_triggered_by_docker, test_nics_before_config_trigger_hotplug and test_multi_nic_hotplug_vpc

2) Is the level of confidence sufficient, based on `test_hotplug_enable_cmd`, which is the test that PASSED in all of azure/ec2/gce and lxd_container/lxd_vm?

Yes, this testing is sufficient, because we only need to see that the hotplug socket/service works at all: that (plus the full integration test suite on lxd_vm/lxd_container) confirms no systemd ordering cycles are present preventing cloud-init-hotplug.* services/sockets from starting and running.

The more detailed ec2 hotplug tests assert hotplug-specific behavior in more complex hotplug scenarios, which is really above and beyond the bug we are checking for here. But since those tests were easy to trigger, I added them to the coverage matrix for validation. Sorry for the suspect additional data/SKIPs which prompted your investigation. But good question, too.
Thank you!

Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

Thanks for the clarification, Chad.

Ok, cool.

From that list of 5 tests, in ec2-noble-hotplug.log, 4/5 PASSED and 1/5 SKIPPED (test_multi_nic_hotplug_vpc), but as you clarified that 'test_hotplug_enabled_by_cmd' alone is sufficient, that is all good and even better than just that test.

Releasing with an aging exception (4 out of 7 days in -proposed) based on your request and justification, with the comprehensive test suite and clarification.

Appreciate the extensive coverage in testing and attention to detail in SRU bug template and verification.

cheers,
Mauricio

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 24.3.1-0ubuntu0~24.04.2

---------------
cloud-init (24.3.1-0ubuntu0~24.04.2) noble; urgency=medium

  * Bug fix release (LP: #2081124):
    d/p/cpick-hotplugd-systemd-ordering-fix.patch: fix systemd ordering cycle
    issues with network cloud-init-hotplugd.socket, NetworkManager and
    dbus.socket by adding DefaultDependencies=no to cloud-init-hotplugd.socket.

 -- Chad Smith <email address hidden> Fri, 20 Sep 2024 16:07:14 -0600

Changed in cloud-init (Ubuntu Noble):
status: Fix Committed → Fix Released
Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote : Update Released

The verification of the Stable Release Update for cloud-init has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Chad Smith (chad.smith)
Changed in oem-priority:
status: New → Incomplete