Drop udev remove action in cloud-init-hotplugd

Bug #2107301 reported by Chengen Du
This bug affects 1 person
Affects               Status     Importance  Assigned to
cloud-init (Ubuntu)   Invalid    Undecided   Chengen Du
  Focal               Triaged    Undecided   Chengen Du
  Jammy               Triaged    Undecided   Chengen Du
  Noble               Triaged    Undecided   Chengen Du
  Oracular            Won't Fix  Undecided   Chengen Du
  Plucky              Invalid    Undecided   Chengen Du

Bug Description

[Impact]
When `modprobe --remove ena` is executed, the kernel triggers a udev remove event.
This causes cloud-init to refetch the datasource information and expect the NIC to be absent from it.
However, because IMDS updates asynchronously, the NIC may still appear in the metadata, so cloud-init's hotplug logic waits and retries.
If the NIC is still listed when the retries run out, the hook fails with the following call trace:

2025-03-21 19:38:43,116 - hotplug_hook.py[ERROR]: Received fatal exception handling hotplug!
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/devel/hotplug_hook.py", line 334, in handle_args
    handle_hotplug(
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/devel/hotplug_hook.py", line 224, in handle_hotplug
    try_hotplug(subsystem, event_handler, datasource)
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/devel/hotplug_hook.py", line 257, in try_hotplug
    raise last_exception
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/devel/hotplug_hook.py", line 246, in try_hotplug
    event_handler.detect_hotplugged_device()
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/devel/hotplug_hook.py", line 112, in detect_hotplugged_device
    raise RuntimeError(
RuntimeError: Failed to detect 02:24:50:39:e7:ef in updated metadata
Traceback (most recent call last):
  File "/usr/bin/cloud-init", line 33, in <module>
    sys.exit(load_entry_point('cloud-init==24.4.1', 'console_scripts', 'cloud-init')())
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 1273, in main
    return sub_main(args)
            ^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 1394, in sub_main
    retval = functor(name, args)
              ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/devel/hotplug_hook.py", line 334, in handle_args
    handle_hotplug(
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/devel/hotplug_hook.py", line 224, in handle_hotplug
    try_hotplug(subsystem, event_handler, datasource)
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/devel/hotplug_hook.py", line 257, in try_hotplug
    raise last_exception
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/devel/hotplug_hook.py", line 246, in try_hotplug
    event_handler.detect_hotplugged_device()
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/devel/hotplug_hook.py", line 112, in detect_hotplugged_device
    raise RuntimeError(
RuntimeError: Failed to detect 02:24:50:39:e7:ef in updated metadata
cloud-init-hotplugd.service: Main process exited, code=exited, status=1/FAILURE
cloud-init-hotplugd.service: Failed with result 'exit-code'.
Failed to start cloud-init-hotplugd.service - Cloud-init: Hotplug Hook.
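
The retry behaviour described in [Impact] can be modelled with a short Python sketch. This is a toy model, not cloud-init's hotplug_hook.py: `handle_remove_event` and `make_fake_imds` are hypothetical names, the fake IMDS simply returns a set of MAC addresses, and the error text reuses the wording of the trace above. It shows why a lagging IMDS turns a remove event into a fatal error once the retry budget is spent.

```python
import time
from typing import Callable, Sequence, Set


def handle_remove_event(
    mac: str,
    fetch_metadata_macs: Callable[[], Set[str]],
    retries: int = 3,
    wait_seconds: float = 0.0,
) -> int:
    """Toy model of handling a udev "remove" event for a NIC.

    Success means the MAC has disappeared from freshly fetched metadata;
    returns the attempt number on which that happened.
    """
    for attempt in range(1, retries + 1):
        if mac not in fetch_metadata_macs():
            return attempt
        # IMDS updates asynchronously and may still list the removed NIC.
        if attempt < retries:
            time.sleep(wait_seconds)
    raise RuntimeError(f"Failed to detect {mac} in updated metadata")


def make_fake_imds(snapshots: Sequence[Set[str]]) -> Callable[[], Set[str]]:
    """Hypothetical IMDS stand-in: return each snapshot in turn, then repeat the last."""
    frames = list(snapshots)
    calls = 0

    def fetch() -> Set[str]:
        nonlocal calls
        snapshot = frames[min(calls, len(frames) - 1)]
        calls += 1
        return snapshot

    return fetch


if __name__ == "__main__":
    mac = "02:24:50:39:e7:ef"
    # IMDS catches up on the third fetch, so the remove event succeeds.
    print(handle_remove_event(mac, make_fake_imds([{mac}, {mac}, set()])))  # 3
    # IMDS never drops the MAC within the retry budget.
    try:
        handle_remove_event(mac, make_fake_imds([{mac}]))
    except RuntimeError as exc:
        print(exc)  # Failed to detect 02:24:50:39:e7:ef in updated metadata
```

The outcome depends only on whether IMDS catches up before the retries are exhausted, which is outside cloud-init's control.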

[Fix]
Monitoring the udev remove action may not be necessary.
Once the device is removed, its configurations become inactive, making explicit updates potentially redundant.

An upstream commit has dropped support for the udev remove action in cloud-init-hotplugd.

commit 3c2ff0ca7086c1350c6f2b57070481da514dbc36
Author: yukariatlas <email address hidden>
Date: Wed, 9 Apr 2025 05:19:07 +0800

    fix: drop udev remove action in hotplug (#6152)

    When `modprobe --remove ena` is executed, the kernel triggers a udev
    remove event. This causes cloud-init to refetch datasource information,
    expecting the NIC to be gone. However, since IMDS updates asynchronously,
    cloud-init's hotplug logic may wait and retry if the NIC still appears
    present. Monitoring the udev remove action may not be necessary. Once the
    device is removed, its configurations become inactive, and explicitly
    updating them might be redundant.

    Fixes: GH-5706
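
The fix amounts to no longer acting on remove events. As a hedged illustration (this is not the code from PR #6152; the function name, the `HANDLED_ACTIONS` set and the example DEVPATH value are hypothetical), a hotplug entry point that filters udev events by their `ACTION` and `SUBSYSTEM` properties could look like this:

```python
from typing import Mapping, Optional, Tuple

# Hypothetical: after the fix, only "add" events reach the hotplug handler.
HANDLED_ACTIONS = frozenset({"add"})


def route_udev_event(env: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Return (action, devpath) if the event should be handled, else None.

    `env` holds udev event properties such as ACTION, SUBSYSTEM and DEVPATH.
    """
    if env.get("SUBSYSTEM") != "net":
        return None  # only network devices are of interest here
    action = env.get("ACTION", "")
    if action not in HANDLED_ACTIONS:
        # "remove" (and anything else) is ignored; the removed NIC's
        # configuration is left in place and simply becomes inactive.
        return None
    return action, env.get("DEVPATH", "")


if __name__ == "__main__":
    add = {"ACTION": "add", "SUBSYSTEM": "net", "DEVPATH": "/devices/example/net/ens5"}
    remove = dict(add, ACTION="remove")
    print(route_udev_event(add))     # ('add', '/devices/example/net/ens5')
    print(route_udev_event(remove))  # None
```

Filtering at the entry point means a remove event never triggers a metadata refetch, so the IMDS race from [Impact] cannot occur on that path.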

[Test Plan]
1. Launch an instance in AWS EC2 that uses the ena driver.
2. Run `sudo modprobe -r ena && sudo modprobe ena`.
3. Verify that the call trace above no longer appears and that cloud-init-hotplugd.service does not fail.
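
The test plan can be scripted roughly as follows. This is a sketch under stated assumptions: `run_test_plan` and `find_failures` are hypothetical helpers, the failure markers are copied from the call trace in [Impact], and `settle_seconds` is an assumed upper bound on how long the hotplug hook keeps retrying. `run_test_plan()` needs sudo rights on an EC2 instance using ena and is not invoked by the block itself.

```python
import subprocess
import time
from datetime import datetime
from typing import List

# Failure signatures taken from the call trace in the [Impact] section.
FAILURE_MARKERS = (
    "Failed to detect",
    "cloud-init-hotplugd.service: Failed with result 'exit-code'",
)


def find_failures(journal_text: str) -> List[str]:
    """Return journal lines that contain one of the known failure markers."""
    return [
        line
        for line in journal_text.splitlines()
        if any(marker in line for marker in FAILURE_MARKERS)
    ]


def run_test_plan(settle_seconds: float = 60.0) -> List[str]:
    """Reload ena, wait for the hotplug hook, and return any failure lines.

    settle_seconds is an assumed upper bound on the hook's retry window.
    """
    # journalctl --since interprets this local timestamp.
    start = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    subprocess.run(["sudo", "modprobe", "-r", "ena"], check=True)
    subprocess.run(["sudo", "modprobe", "ena"], check=True)
    time.sleep(settle_seconds)
    journal = subprocess.run(
        ["sudo", "journalctl", "--no-pager", "--since", start],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return find_failures(journal)
```

On a patched image, `run_test_plan()` would be expected to return an empty list; on an unpatched one it would be expected to return the `RuntimeError` and `Failed with result` lines.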

[Where problems could occur]
The patch assumes that a device's configuration becomes inactive once the device is removed.
Explicit cleanup is therefore considered unnecessary, as subsequent udev add events will realign the configuration.
If this assumption does not hold, the cloud-init hotplug mechanism may be affected.
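
The assumption can be made concrete with a toy model. This is hypothetical and not how cloud-init stores network configuration: `NicConfigModel` keeps a mapping from MAC address to a rendered stanza, removal only marks the device absent, and an add event re-renders everything from current metadata. The window between a remove and the next add, where a stale but inactive stanza remains, is where the risk described above lies.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class NicConfigModel:
    """Toy model: rendered config stanzas plus the set of devices present."""

    rendered: Dict[str, str] = field(default_factory=dict)  # MAC -> stanza
    present: Set[str] = field(default_factory=set)  # MACs that currently exist

    def on_remove(self, mac: str) -> None:
        # After the fix nothing is rewritten; the stanza just matches no device.
        self.present.discard(mac)

    def on_add(self, mac: str, metadata: Dict[str, str]) -> None:
        # An add event re-renders from current metadata, dropping stale stanzas.
        self.present.add(mac)
        self.rendered = dict(metadata)

    def active(self) -> Dict[str, str]:
        """Stanzas that actually apply: those whose device is present."""
        return {m: c for m, c in self.rendered.items() if m in self.present}


if __name__ == "__main__":
    old, new = "02:24:50:39:e7:ef", "02:00:00:00:00:01"  # second MAC is made up
    model = NicConfigModel()
    model.on_add(old, {old: "dhcp4"})
    model.on_remove(old)
    print(model.rendered, model.active())  # stale stanza kept, but inactive
    model.on_add(new, {new: "dhcp4"})
    print(model.rendered, model.active())  # realigned by the add event
```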

Chengen Du (chengendu) wrote:

Debdiff for Focal

Chengen Du (chengendu) wrote:

Debdiff for Jammy

Chengen Du (chengendu) wrote:

Debdiff for Noble

Chengen Du (chengendu) wrote:

Debdiff for Oracular

Chengen Du (chengendu) wrote:

Debdiff for Plucky

Chengen Du (chengendu)
Changed in cloud-init (Ubuntu Focal):
assignee: nobody → Chengen Du (chengendu)
Changed in cloud-init (Ubuntu Jammy):
assignee: nobody → Chengen Du (chengendu)
Changed in cloud-init (Ubuntu Noble):
assignee: nobody → Chengen Du (chengendu)
Changed in cloud-init (Ubuntu Oracular):
assignee: nobody → Chengen Du (chengendu)
Changed in cloud-init (Ubuntu Plucky):
assignee: nobody → Chengen Du (chengendu)
Changed in cloud-init (Ubuntu Focal):
status: New → Triaged
Changed in cloud-init (Ubuntu Jammy):
status: New → Triaged
Changed in cloud-init (Ubuntu Noble):
status: New → Triaged
Changed in cloud-init (Ubuntu Oracular):
status: New → Triaged
Changed in cloud-init (Ubuntu Plucky):
status: New → Triaged
Chad Smith (chad.smith) wrote:

Thank you for filing this bug and improving cloud-init.

I'm confused about the intent of this tracking bug. Separate debdiffs for stable releases are unnecessary because cloud-init follows the SRU policy at https://wiki.ubuntu.com/CloudinitUpdates, under which we SRU the latest cloud-init release (25.2), including all non-breaking bug-fix changes, into the stable releases from Focal onward.

This means that the commit https://github.com/canonical/cloud-init/pull/6152 will be part of the upstream 25.2 release, which will be SRU'd back to Focal after the release.

Changed in cloud-init (Ubuntu Plucky):
status: Triaged → Invalid
Chad Smith (chad.smith) wrote:

I'm marking this particular tracking bug as Invalid, as cloud-init doesn't typically mirror each individual bug fix from GitHub into Launchpad when the fix is included in the upcoming 25.2 SRU. If I have misunderstood this issue, or if this bug warrants more attention and priority than waiting for the 25.2 SRU, please feel free to reopen the ticket and provide justification for an individual bug-fix release ahead of 25.2, which is scheduled for May 5th: https://discourse.ubuntu.com/t/2025-cloud-init-release-schedule/55534. Expect the SRU of 25.2 to take place within one month after the upstream release on May 5th.

Chengen Du (chengendu) wrote:

Thanks! I wasn't aware of this. If all stable releases will include the bug fix after the upstream release, then filing this bug may not be necessary. I really appreciate all your assistance with this.

Ural Tunaboyu (uralt) wrote:

Ubuntu 24.10 (Oracular Oriole) has reached end of life, so this bug will not be fixed for that specific release.

Changed in cloud-init (Ubuntu Oracular):
status: Triaged → Won't Fix