status: races with dbus and errors out

Bug #2046483 reported by Alberto Contreras
This bug affects 1 person
Affects              Status        Importance  Assigned to  Milestone
cloud-init (Ubuntu)  Fix Released  Undecided   Unassigned
  Focal              Fix Released  Undecided   Unassigned
  Jammy              Fix Released  Undecided   Unassigned
  Lunar              Won't Fix     Undecided   Unassigned
  Mantic             Fix Released  Undecided   Unassigned
  Noble              Fix Released  Undecided   Unassigned

Bug Description

=== Begin SRU Template ===
[Impact]
`cloud-init status` checks systemctl to ensure the reported status is accurate. However, systemctl can fail if dbus isn't yet ready, and those exceptions are not handled in cloud-init.
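
One way to tolerate this race is to retry the systemctl call for a bounded number of attempts instead of letting the exception propagate. The following is only a minimal sketch of that idea, not the upstream fix: the helper name `run_systemctl_with_retry`, its parameters, and the use of the standard `subprocess` module (rather than cloud-init's own `subp` module) are assumptions made for illustration.

```python
import subprocess
import time


def run_systemctl_with_retry(args, attempts=5, delay=0.25, runner=subprocess.run):
    """Run `systemctl <args>` and return its stdout, retrying on failure.

    Hypothetical helper: early in boot the system bus may not accept
    connections yet, so a failed call is retried a fixed number of times.
    `runner` is injectable so the logic can be exercised without systemd.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            result = runner(
                ["systemctl", *args],
                capture_output=True,
                text=True,
                check=True,  # raise CalledProcessError on a non-zero exit code
            )
            return result.stdout
        except (subprocess.CalledProcessError, OSError) as exc:
            last_error = exc
            # Sleep between attempts, but not after the final one.
            if attempt < attempts - 1:
                time.sleep(delay)
    raise RuntimeError(f"systemctl {' '.join(args)} kept failing") from last_error
```

A caller that must never crash could catch the final `RuntimeError` and report the status as unknown or still running, which is a design choice this sketch leaves open.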

[Test Case]
Note that cloud-init developers have been unable to reproduce the problem. To simulate the issue, create a script that errors anytime "systemctl show" is used, but passes through to the real "systemctl" for any other commands used by cloud-init. Example script (here the real "systemctl" has been renamed to "systemctl2"):

#!/bin/bash

# Check if the first argument is 'show'
if [ "$1" == "show" ]; then
    echo "'show' not allowed"
    exit 1
fi

/usr/bin/systemctl2 "$@"

Then, with this script being used:
1. Ensure `cloud-init status --wait` exits successfully when cloud-init has finished running with no errors.
2. Edit "/run/cloud-init/status.json". Change the "finished" time of "modules-final" to be "null". Ensure "cloud-init status --wait" blocks while printing dots on the CLI.
3. Edit "/run/cloud-init/status.json". Change the "finished" time of "modules-final" to be "null". Add an arbitrary string to the "errors" list of "modules-final". Ensure "cloud-init status --wait" blocks but does not print dots. Replace the hand-made "systemctl" script with the real systemctl binary. Ensure the "cloud-init status --wait" call made earlier now starts printing dots.
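
The manual edits to "/run/cloud-init/status.json" in steps 2 and 3 can also be scripted. The sketch below assumes the stage entries live under a top-level "v1" key and that "modules-final" carries "finished" and "errors" fields; check the file on the instance under test before relying on that layout. The function names are hypothetical.

```python
import json


def simulate_unfinished_final_stage(status, error=None):
    """Return a copy of `status` with modules-final marked as not finished.

    Assumes the layout {"v1": {"modules-final": {"finished": ..., "errors": [...]}}}.
    """
    updated = json.loads(json.dumps(status))  # deep copy of JSON-compatible data
    stage = updated["v1"]["modules-final"]
    stage["finished"] = None  # written out as JSON null
    if error is not None:
        stage.setdefault("errors", []).append(error)
    return updated


def edit_status_file(path, error=None):
    """Apply simulate_unfinished_final_stage() to the JSON file at `path`."""
    with open(path) as stream:
        status = json.load(stream)
    with open(path, "w") as stream:
        json.dump(simulate_unfinished_final_stage(status, error), stream, indent=1)
```

For step 2, `edit_status_file("/run/cloud-init/status.json")` would be run as root; for step 3, pass an arbitrary string as `error`.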

[Regression Potential]
Given that this relates only to the "cloud-init status" command, the regression potential is fairly limited. It's possible a script blocking on "cloud-init status --wait" could exit early or that a "cloud-init status" command will run forever. Either case wouldn't be more than an inconvenience that can be easily worked around.
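
As an illustration of such a workaround, a caller that cannot afford to block forever could bound the wait itself. This is a sketch only; the function name `wait_for_cloud_init` and the default timeout are assumptions, not part of cloud-init.

```python
import subprocess


def wait_for_cloud_init(timeout=600, cmd=("cloud-init", "status", "--wait")):
    """Run `cmd` and return its exit code, or None if it did not finish in time.

    subprocess.run() kills the child process when the timeout expires.
    """
    try:
        completed = subprocess.run(list(cmd), timeout=timeout, check=False)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode
```

A return value of `None` tells the caller that the wait was abandoned, so it can log the condition and decide whether to proceed or retry.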

[Other Info]
Upstream bug: https://github.com/canonical/cloud-init/issues/4676
Upstream commit: https://github.com/canonical/cloud-init/commit/d29b744e742d12e41e9490fb05e74537b4b768d7

James Falcon (falcojr)
description: updated
Changed in cloud-init (Ubuntu Lunar):
status: New → Won't Fix
Changed in cloud-init (Ubuntu):
status: Fix Committed → Fix Released
Changed in cloud-init (Ubuntu Focal):
status: New → In Progress
Changed in cloud-init (Ubuntu Jammy):
status: New → In Progress
Changed in cloud-init (Ubuntu Mantic):
status: New → In Progress
Alberto Contreras (aciba) wrote :

Published to Ubuntu Noble: cloud-init 24.1~3gb729a4c4-0ubuntu1 (Accepted)

Chad Smith (chad.smith) wrote :

Reopening this bug because we've now seen a second traceback, introduced by upstream commit https://github.com/canonical/cloud-init/commit/36b7f48d71, which also invokes systemctl early in boot but doesn't have the same retry mechanism around it.

The resulting tracebacks are something like

Traceback (most recent call last):
  File "/usr/bin/cloud-init", line 33, in <module>
    sys.exit(load_entry_point('cloud-init==23.4.2', 'console_scripts', 'cloud-init')())
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 1108, in main
    retval = util.log_time(
  File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2808, in log_time
    ret = func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/status.py", line 140, in handle_status_args
    details = get_status_details(paths, args.wait)
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/status.py", line 384, in get_status_details
    boot_status_code, description = get_bootstatus(
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/status.py", line 258, in get_bootstatus
    in subp.subp(["systemctl", "show-environment"]).stdout
  File "/usr/lib/python3/dist-packages/cloudinit/subp.py", line 322, in subp
    raise ProcessExecutionError(
cloudinit.subp.ProcessExecutionError: Unexpected error while running command.
Command: ['systemctl', 'show-environment']
Exit code: 1
Reason: -
Stdout:
Stderr: Failed to connect to bus: Connection refused

tags: added: regression-proposed
Chad Smith (chad.smith) wrote :

Adding regression-proposed to this re-opened bug because the fix was presumed and validated for one call-site, but a second, unrelated call-site also interacts with systemctl early in boot and does not retry on dbus connection errors.

Chad Smith (chad.smith) wrote :

Marking this bug as open against Noble because of the second call-site that triggers the traceback in comment #2.

Changed in cloud-init (Ubuntu):
status: Fix Released → Triaged
Chad Smith (chad.smith) wrote :

Upstream PR fixing the secondary traceback https://github.com/canonical/cloud-init/pull/4842

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 24.1~5g1f6eddd5-0ubuntu1

---------------
cloud-init (24.1~5g1f6eddd5-0ubuntu1) noble; urgency=medium

  * Upstream snapshot based on upstream/main at 1f6eddd5.
    - Bugs fixed in this snapshot: (LP: #2046483)

 -- Chad Smith <email address hidden> Fri, 02 Feb 2024 14:39:53 -0700

Changed in cloud-init (Ubuntu Noble):
status: Triaged → Fix Released
Alberto Contreras (aciba) wrote :
tags: added: verification-failed-jammy verification-failed-mantic verification-needed-focal
tags: added: verification-failed-focal
removed: verification-needed-focal
Andreas Hasenack (ahasenack) wrote :

Note the backtrace in comment #2 shows that systemctl was called with "show-environment", and not just "show", so please take that into account in your test plan.
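
To cover both call-sites in a test plan, the systemctl stand-in needs to reject "show-environment" as well as "show". A rough Python equivalent of such a stand-in is sketched below; the `/usr/bin/systemctl2` path follows the renaming used in the test case, while the names `BLOCKED_SUBCOMMANDS`, `is_blocked`, and `main` are hypothetical. Unlike the bash script in the description, this version writes its message to stderr.

```python
import os
import sys

REAL_SYSTEMCTL = "/usr/bin/systemctl2"  # assumed location of the renamed real binary
BLOCKED_SUBCOMMANDS = {"show", "show-environment"}


def is_blocked(argv):
    """Return True when the first argument is a subcommand that should fail."""
    return len(argv) > 1 and argv[1] in BLOCKED_SUBCOMMANDS


def main(argv):
    if is_blocked(argv):
        print(f"'{argv[1]}' not allowed", file=sys.stderr)
        return 1
    # Replace this process with the real systemctl, passing all arguments through.
    os.execv(REAL_SYSTEMCTL, [REAL_SYSTEMCTL, *argv[1:]])
```

To install it as `/usr/bin/systemctl`, add a shebang line and finish the file with `sys.exit(main(sys.argv))`; that entry point is left out here so the functions can be imported and checked in isolation.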

Changed in cloud-init (Ubuntu Mantic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-mantic
removed: verification-failed-mantic
Andreas Hasenack (ahasenack) wrote : Please test proposed package

Hello Alberto, or anyone else affected,

Accepted cloud-init into mantic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/23.4.3-0ubuntu0~23.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-mantic to verification-done-mantic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-mantic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in cloud-init (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-needed-jammy
removed: verification-failed-jammy
Andreas Hasenack (ahasenack) wrote :

Hello Alberto, or anyone else affected,

Accepted cloud-init into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/23.4.3-0ubuntu0~22.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in cloud-init (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed-focal
removed: verification-failed-focal
Andreas Hasenack (ahasenack) wrote :

Hello Alberto, or anyone else affected,

Accepted cloud-init into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/23.4.3-0ubuntu0~20.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Alberto Contreras (aciba) wrote :

Verification failed for 23.4.3-0ubuntu0~20.04.1, 23.4.3-0ubuntu0~22.04.1 and 23.4.3-0ubuntu0~23.10.1.

`cloud-init status --wait` exits with 1 sometimes but not always; see attached logs.

tags: added: verification-failed-focal verification-failed-jammy verification-failed-mantic
removed: verification-needed verification-needed-focal verification-needed-jammy verification-needed-mantic
Alberto Contreras (aciba) wrote :

My previous test was wrong: `cloud-init status --wait` was being executed before the instance was rebooted, which is why it failed. Retesting.

tags: added: verification-needed verification-needed-focal verification-needed-jammy verification-needed-mantic
removed: verification-failed-focal verification-failed-jammy verification-failed-mantic
Alberto Contreras (aciba) wrote :

I have verified this bug is not reproducible with versions: 23.4.3-0ubuntu0~20.04.1, 23.4.3-0ubuntu0~22.04.1, 23.4.3-0ubuntu0~23.10.1, executing the following tests:

--- Test 1 - happy path
Logs attached.

The following tests were performed manually because of their racy nature.
--- Test 2
# Launch instance with cloud-init 23.4.3.

# Wait until cloud-init finishes

# cp /usr/bin/systemctl /usr/bin/systemctl2

# cat <<'EOF' >/usr/bin/systemctl
#!/bin/bash

# Check if the first argument is 'show'
if [ "$1" == "show" ]; then
    echo "'show' not allowed"
    exit 1
fi

/usr/bin/systemctl2 "$@"
EOF

# Edit "/run/cloud-init/status.json". Change the "finished" time of "modules-final" to be "null". "cloud-init status --wait" blocks while printing dots on the CLI.
# Edit "/run/cloud-init/status.json". Change the "finished" time of "modules-final" to be "null". Add an arbitrary string to the "errors" list of "modules-final". "cloud-init status --wait" blocks but does not print dots.
# Replace the hand-made "systemctl" script with the real systemctl binary. "cloud-init status --wait" call made earlier now starts printing dots.

--- Test 3
# Launch instance with cloud-init 23.4.3.

# Wait until cloud-init finishes

# cp /usr/bin/systemctl /usr/bin/systemctl2

# cat <<'EOF' >/usr/bin/systemctl
#!/bin/bash

# Check if the first argument is 'show'
if [ "$1" == "show" ]; then
    echo "'show' not allowed"
    exit 1
fi
if [ "$1" == "show-environment" ]; then
    echo "'show-environment' not allowed"
    exit 1
fi
/usr/bin/systemctl2 "$@"
EOF

# "cloud-init status --wait" blocks without printing dots. This is expected behavior, because cloud-init needs to know the boot status (whether or not it was disabled) before it can print dots or take any other action.

James Falcon (falcojr)
tags: added: verification-done verification-done-focal verification-done-jammy verification-done-mantic
removed: verification-needed verification-needed-focal verification-needed-jammy verification-needed-mantic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 23.4.3-0ubuntu0~23.10.1

---------------
cloud-init (23.4.3-0ubuntu0~23.10.1) mantic; urgency=medium

  * Upstream snapshot based on 23.4.3. (LP: #2046483).
    List of changes from upstream can be found at
    https://raw.githubusercontent.com/canonical/cloud-init/23.4.3/ChangeLog

cloud-init (23.4.2-0ubuntu0~23.10.1) mantic; urgency=medium

  * Upstream snapshot based on 23.4.2. (LP: #2045582).
    List of changes from upstream can be found at
    https://raw.githubusercontent.com/canonical/cloud-init/23.4.2/ChangeLog
    - Bugs fixed in this snapshot: (LP: #2051147)

cloud-init (23.4.1-0ubuntu1~23.10.2) mantic; urgency=medium

  * d/p/status-retain-recoverable-error-exit-code.patch:
    Retain exit code in cloud-init status for recoverable errors.
    (LP: #2048522).

cloud-init (23.4.1-0ubuntu1~23.10.1) mantic; urgency=medium

  * d/p/retain-apt-pre-deb822.patch:
    - Disable apt source list generation with DEB822 style
  * d/p/do-not-block-user-login.patch:
    - revert redacted patch content introduced in 23.4-0
  * refresh patches:
    - d/p/status-do-not-remove-duplicated-data.patch
  * d/changelog: amend 23.4-0 refresh patches and dropped cherry-picks entry
  * Upstream snapshot based on 23.4.1. (LP: #2045582).
    List of changes from upstream can be found at
    https://raw.githubusercontent.com/canonical/cloud-init/23.4.1/ChangeLog

cloud-init (23.4-0ubuntu1~23.10.1) mantic; urgency=medium

  * d/p/status-do-not-remove-duplicated-data.patch:
    - Revert behavior downstream, leave duplicate data
  * d/control: add python3-apt as Recommends to read APT config from apt_pkg
  * refresh patches:
    - d/p/do-not-block-user-login.patch
  * drop the following cherry-picks now included:
    - cpick-0d9f149a-Pytestify-apt-config-test-modules-4424
    - cpick-5023e9f9-Refactor-test_apt_source_v1.py-to-use-pytest-4427
    - cpick-e9cdd7e3-Install-gnupg-if-gpg-not-found-4431
    - cpick-015543d3-apt-install-software-properties-common-when-absent-but
    - cpick-2ab1f340-fix-cc_apt_configure-avoid-unneeded-call-to-apt-install
  * Upstream snapshot based on 23.4. (LP: #2045582).
    List of changes from upstream can be found at
    https://raw.githubusercontent.com/canonical/cloud-init/23.4/ChangeLog

 -- James Falcon <email address hidden> Fri, 02 Feb 2024 16:00:04 -0600

Changed in cloud-init (Ubuntu Mantic):
status: Fix Committed → Fix Released
Andreas Hasenack (ahasenack) wrote : Update Released

The verification of the Stable Release Update for cloud-init has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 23.4.3-0ubuntu0~22.04.1

---------------
cloud-init (23.4.3-0ubuntu0~22.04.1) jammy; urgency=medium

  * Upstream snapshot based on 23.4.3. (LP: #2046483).
    List of changes from upstream can be found at
    https://raw.githubusercontent.com/canonical/cloud-init/23.4.3/ChangeLog

cloud-init (23.4.2-0ubuntu0~22.04.1) jammy; urgency=medium

  * Upstream snapshot based on 23.4.2. (LP: #2045582).
    List of changes from upstream can be found at
    https://raw.githubusercontent.com/canonical/cloud-init/23.4.2/ChangeLog
    - Bugs fixed in this snapshot: (LP: #2051147)

cloud-init (23.4.1-0ubuntu1~22.04.2) jammy; urgency=medium

  * d/p/status-retain-recoverable-error-exit-code.patch:
    Retain exit code in cloud-init status for recoverable errors.
    (LP: #2048522).

cloud-init (23.4.1-0ubuntu1~22.04.1) jammy; urgency=medium

  * d/p/retain-apt-pre-deb822.patch:
    - Disable apt source list generation with DEB822 style
  * refresh patches:
    - d/p/status-do-not-remove-duplicated-data.patch
  * d/changelog: amend 23.4-0 refresh patches entry
  * Upstream snapshot based on 23.4.1. (LP: #2045582).
    List of changes from upstream can be found at
    https://raw.githubusercontent.com/canonical/cloud-init/23.4.1/ChangeLog

cloud-init (23.4-0ubuntu1~22.04.1) jammy; urgency=medium

  * d/control: add python3-apt as Recommends to read APT config from apt_pkg
  * d/p/status-do-not-remove-duplicated-data.patch:
    - Revert behavior downstream, leave duplicate data
  * d/p/series: bring back retain-old-groups.patch.
    This patch was inadvertently dropped in 5d4a3cf.
  * refresh patches:
    - d/p/do-not-block-user-login.patch
  * Upstream snapshot based on 23.4. (LP: #2045582).
    List of changes from upstream can be found at
    https://raw.githubusercontent.com/canonical/cloud-init/23.4/ChangeLog

 -- James Falcon <email address hidden> Fri, 02 Feb 2024 15:59:14 -0600

Changed in cloud-init (Ubuntu Jammy):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 23.4.3-0ubuntu0~20.04.1

---------------
cloud-init (23.4.3-0ubuntu0~20.04.1) focal; urgency=medium

  * Upstream snapshot based on 23.4.3. (LP: #2046483).
    List of changes from upstream can be found at
    https://raw.githubusercontent.com/canonical/cloud-init/23.4.3/ChangeLog

cloud-init (23.4.2-0ubuntu0~20.04.1) focal; urgency=medium

  * Upstream snapshot based on 23.4.2. (LP: #2045582).
    List of changes from upstream can be found at
    https://raw.githubusercontent.com/canonical/cloud-init/23.4.2/ChangeLog
    - Bugs fixed in this snapshot: (LP: #2051147)

cloud-init (23.4.1-0ubuntu1~20.04.2) focal; urgency=medium

  * d/p/status-retain-recoverable-error-exit-code.patch:
    Retain exit code in cloud-init status for recoverable errors.
    (LP: #2048522).

cloud-init (23.4.1-0ubuntu1~20.04.1) focal; urgency=medium

  * d/p/retain-apt-pre-deb822.patch:
    - Disable apt source list generation with DEB822 style
  * refresh patches:
    - d/p/status-do-not-remove-duplicated-data.patch
  * d/changelog: amend 23.4-0 refresh patches entry
  * Upstream snapshot based on 23.4.1. (LP: #2045582).
    List of changes from upstream can be found at
    https://raw.githubusercontent.com/canonical/cloud-init/23.4.1/ChangeLog

cloud-init (23.4-0ubuntu1~20.04.1) focal; urgency=medium

  * d/p/status-do-not-remove-duplicated-data.patch:
    - Revert behavior downstream, leave duplicate data
  * d/control: add python3-apt as Recommends to read APT config from apt_pkg
  * refresh patches:
    - d/p/do-not-block-user-login.patch
    - d/p/netplan99-cannot-use-default.patch
  * Upstream snapshot based on 23.4. (LP: #2045582).
    List of changes from upstream can be found at
    https://raw.githubusercontent.com/canonical/cloud-init/23.4/ChangeLog

 -- James Falcon <email address hidden> Fri, 02 Feb 2024 16:00:45 -0600

Changed in cloud-init (Ubuntu Focal):
status: Fix Committed → Fix Released