`cloud-init status` should distinguish between "permanently disabled" and "disabled for this boot"

Bug #1883122 reported by Dan Watkins
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
cloud-init | Fix Released | Medium | Chad Smith |

Bug Description

Using ds-identify and a systemd generator, cloud-init can detect that it should disable itself for a particular boot when there is nothing for it to do. However, if on the next boot a datasource becomes applicable (e.g. a NoCloud/ConfigDrive device is presented to the system) then cloud-init will _not_ be disabled, because ds-identify will detect an applicable datasource.

If users want a stronger guarantee that cloud-init will not run, they can touch /etc/cloud/cloud-init.disabled or add cloud-init=disabled to their GRUB-configured kernel command line. When they do so, cloud-init will _never_ run, regardless of the applicability of datasources.

In both of these cases, `cloud-init status` reports "disabled". This means that users who want to confirm that cloud-init will never run in the future given its current configuration have to check all the potential ways that cloud-init might be permanently disabled (/etc/..., kernel cmdline, maybe other options that I haven't documented here, maybe new options in the future) themselves.
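
The manual check described above might look roughly like the following sketch. It covers only the two mechanisms named in this report (the marker file and the kernel command line). The function name and its parameters are hypothetical, reading /proc/cmdline is an assumption about where the consumer finds the kernel command line, and none of this is how cloud-init itself decides.

```python
from pathlib import Path
from typing import Optional


def permanent_disable_reason(
    marker: Path = Path("/etc/cloud/cloud-init.disabled"),
    cmdline_file: Path = Path("/proc/cmdline"),
) -> Optional[str]:
    """Return why cloud-init looks permanently disabled, or None.

    Hypothetical helper: it only knows the two mechanisms named in this
    report, so it silently misses any mechanism added later.
    """
    if marker.exists():
        return f"marker file {marker}"
    try:
        tokens = cmdline_file.read_text().split()
    except OSError:
        # No readable kernel command line; treat it as empty.
        tokens = []
    if "cloud-init=disabled" in tokens:
        return "kernel command line"
    return None
```

Every consumer would need its own copy of this list, which is the maintenance burden the report is about.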

We should distinguish between these two modalities of "disabled" for users in our status output.

Dan Watkins (oddbloke) wrote :

This is related to bug 1883124.

Dan Watkins (oddbloke) wrote :

I've thought about this a little more, and I think our current status output might be conflating two different concepts into a single field. One is "what is cloud-init's status on this particular boot", and one is "what is cloud-init's status on this instance across boots".

This means that we cannot express "cloud-init did not run on this boot but could in future" because we can only express "disabled".

This conflation is also what leads to bug 1883124: we have no way of expressing "cloud-init ran this boot but will not run again", because "done" and "disabled" are mutually exclusive.

If we had two fields instead of one, one could be "this boot status" with all the values that we currently display, and the other would be either "enabled" or "disabled"[0]. (I haven't thought of a pithy name for this latter field yet.)

What do people think?

[0] I would favour using string values for this field instead of making it a "permanently disabled?" boolean, because it allows us to introduce more granular/different cross-boot states in future without having to modify the structure of the output again.
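
To make the two-field idea concrete, here is a minimal sketch of what such a record could look like. The class and field names are hypothetical (the thread had not settled on any), and both fields are plain strings, as the footnote suggests, so new cross-boot states could be added later without changing the structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusReport:
    # Hypothetical field names, for illustration only.
    boot_status: str  # what cloud-init did on this boot, e.g. "done"
    enablement: str   # cross-boot state, e.g. "enabled" or "disabled"


# "cloud-init ran this boot but will not run again" (bug 1883124)
ran_then_disabled = StatusReport(boot_status="done", enablement="disabled")

# "cloud-init did not run on this boot but could in future"
idle_this_boot = StatusReport(boot_status="disabled", enablement="enabled")
```

Neither of these two combinations can be expressed with the single status field.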

James Falcon (falcojr) wrote :

I like the idea of two different fields: a "this boot" status and a "will boot" field. Are "status" and "is-enabled" pithy enough? Just stealing from systemd nomenclature here :)

Chad Smith (chad.smith) wrote :

There's also `cloud-init status --long`, which should have reported to folks the reason for disabling, such as 'Cloud-init disabled by /etc/cloud/cloud-init.disabled'.

I think there is also a bug here in that behavior because currently on a disabled container which has already run cloud-init once, we report both the detected datasource details and the disabled state.

$ cloud-init status --long
status: disabled
time: Fri, 12 Jun 2020 20:00:51 +0000
detail:
DataSourceNoCloud [seed=/var/lib/cloud/seed/nocloud-net][dsmode=net]

I think we can definitely provide a better assessment of the machine state and separation of current_status versus next_boot status, or is_enabled as James mentioned.

Ryan Harper (raharper) wrote :

As Chad mentions, status --long shows this state:

% touch /etc/cloud/cloud-init.disabled
% reboot
% cloud-init status --long
status: disabled
detail:
Cloud-init disabled by /etc/cloud/cloud-init.disabled

root@g1:~# rm -rf /etc/cloud/cloud-init.disabled
root@g1:~# mv /var/lib/cloud/seed/nocloud-net /root/
root@g1:~# reboot
root@g1:~# ..:(╯°□°)╯彡┻━━┻
(crispyboi) ~ % lxc exec g1 bash
root@g1:~# cloud-init status --long
status: disabled
detail:
Cloud-init disabled by cloud-init-generator

So, this feels like enough to me... But maybe we should incorporate some additional message to convey that the former test (cloud-init.disabled) means future boots will not run cloud-init at all and the latter means cloud-init will run detection code.

Dan Watkins (oddbloke) wrote :

Aha, OK, one part I missed: we have a specific use case which wants these distinct "disabled" states to be machine readable. From my perspective, the current output doesn't meet that requirement for a couple of reasons: (a) for me, the detail text is intended for human consumption (consider how you would have to parse that reason out: `disable_reason = detail.split("disabled by")[1]`) but, probably more importantly, (b) this still requires consumers to enumerate the ways in which cloud-init can be permanently disabled, which will fall down if we add a new method by which cloud-init can be disabled.
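
Both objections can be seen in a short sketch. The detail strings are the two shown in the `cloud-init status --long` output earlier in this thread; the function names and the KNOWN_PERMANENT set are hypothetical and stand in for whatever a consumer would have to maintain.

```python
# Reasons a consumer would have to hard-code as "permanent" (point b).
# Only the marker file is listed, because it is the only permanent
# mechanism whose detail text appears in this thread.
KNOWN_PERMANENT = {"/etc/cloud/cloud-init.disabled"}


def disable_reason(detail: str) -> str:
    # Point (a): scrape human-oriented text. Raises IndexError if the
    # phrase "disabled by" is missing or reworded.
    return detail.split("disabled by")[1].strip()


def is_permanent(detail: str) -> bool:
    return disable_reason(detail) in KNOWN_PERMANENT
```

A new permanent mechanism would be reported as "not permanent" by this code until every consumer updated its own set.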

Ryan Harper (raharper) wrote :

So, I suggest we look at updating the status.json or result.json files.

cloud-init status is as it has been for some time; I'd avoid modifying its output.

It's always reflected the *current* state of the system as it just booted. It's never attempted to provide status on the next boot, or whether the system is currently disabled or not.

For a new use-case of "machine readable" and distinguishing between current boot and next boot; I think additional work needs to be done on understanding what "next boot" status means; we cannot predict:

1) changes to kernel command line
2) add/removes of additional devices (cdrom/usb)
3) changes to the underlying platform (image capture and boot on new provider)

I generally think the "next boot" status would not be reliable. Could we expand a bit more on the new requirement?

Dan Watkins (oddbloke) wrote :

> So, I suggest we look at updating the status.json or result.json files.

Yep, agreed that's a reasonable path forward.

> I generally think the "next boot" status would not be reliable. Could we expand a bit more on the new requirement?

Yep, let me come back with some further details.

Chad Smith (chad.smith) wrote :

Marking this incomplete just to nudge it in expectation of getting more details :)

Changed in cloud-init:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for cloud-init because there has been no activity for 60 days.]

Changed in cloud-init:
status: Incomplete → Expired
Ian Johnson (anonymouse67) wrote :

What we would like to be able to do in snapd is positively distinguish between the following cases:

1) a cloud-init.disabled file was placed, thus disabling the generator and ensuring that cloud-init will never run "by itself" if the device was rebooted at that point in time with the same disk and image and everything
2) no cloud-init file was placed, and cloud-init has not been "triggered" on this boot, but it is not "fully disabled", and if a CDROM or USB drive with NoCloud stuff was attached on next boot, it could still run

We currently handle this in snapd by just checking first if the cloud-init.disabled file exists, and if it doesn't but `cloud-init status` still says "disabled" we consider that to be "untriggered".
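
As a rough illustration of that two-step check (not snapd's actual code, which is not shown in this report), the logic could be sketched as below. The function name and the returned labels other than "untriggered" are hypothetical, and the sketch assumes the first line of the `cloud-init status` output has the `status: <value>` form seen earlier in this thread.

```python
def classify_cloud_init(marker_exists: bool, status_output: str) -> str:
    """Mirror the two-step heuristic: marker file first, then status text."""
    if marker_exists:
        # /etc/cloud/cloud-init.disabled is present: case 1.
        return "disabled-permanently"
    lines = status_output.strip().splitlines()
    # Take the value after "status:" on the first line, if any.
    reported = lines[0].partition(":")[2].strip() if lines else ""
    if reported == "disabled":
        # Disabled without a marker file: case 2.
        return "untriggered"
    return "other"
```

Note that a kernel command line containing cloud-init=disabled would also land in "untriggered" here, which is one reason a dedicated machine-readable field is preferable.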

Changed in cloud-init:
status: Expired → New
James Falcon (falcojr)
Changed in cloud-init:
status: New → Incomplete
Ian Johnson (anonymouse67) wrote :

Why was this bug set to incomplete? I provided a use case for this bug in my previous comment.

Chad Smith (chad.smith) wrote :

Oops, I think this status may have been incorrectly set on this bug. Thanks, Ian, for coming back here with the question.

I think we all agree that there is a deficit in the status output and that cloud-init status needs to provide a better machine-readable format or field with enough information.

We haven't gotten back around to defining this structured output.

I think this is a useful feature request that we need to address and allocate time for.

Changed in cloud-init:
status: Incomplete → Triaged
importance: Undecided → Medium
Richard Harding (rharding) wrote :

Part of the motivation here comes from a previous bug; fixing this one can help provide a better experience to users there.

Just noting for background.

https://bugs.launchpad.net/snapd/+bug/1879530

Chad Smith (chad.smith)
Changed in cloud-init:
status: Triaged → In Progress
assignee: nobody → Chad Smith (chad.smith)
Chad Smith (chad.smith) wrote :

As part of an effort to surface more detailed/machine-readable content that snappy and subiquity can rely on, we are now looking to add `cloud-init status --format=json` or yaml output which now surfaces a boot_status_code key which can be one of the following:

     - 'unknown': systemd generators and ds-identify haven't run yet to
                  determine if cloud-init should be run during this boot
     - 'disabled-by-marker-file': /etc/cloud/cloud-init.disabled exists
                  which prevents cloud-init from ever running
     - 'disabled-by-generator': systemd generator ran ds-identify and
                  determined no applicable cloud-init datasources
     - 'disabled-by-kernel-cmdline': kernel cmdline contained
                  cloud-init=disabled
     - 'enabled-by-kernel-cmdline': kernel cmdline contained
                  cloud-init=enabled
     - 'enabled-by-generator': ds-identify detected possible cloud-init
                  datasources
     - 'enabled-by-sysvinit': enabled by default in SysV init environment

See upstream PR for details: https://github.com/canonical/cloud-init/pull/1663
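
A consumer could use the boot_status_code values listed above along these lines. This is a sketch under assumptions: that the key appears at the top level of the `--format=json` document, that a missing key should be treated as 'unknown', and that the marker-file and kernel-cmdline codes are the ones to treat as persisting across boots (following the bug description). The function name, set names, and returned labels are hypothetical.

```python
import json

# Grouping is an assumption of this sketch, based on the bug description.
PERSISTENT_DISABLE = {"disabled-by-marker-file", "disabled-by-kernel-cmdline"}
THIS_BOOT_DISABLE = {"disabled-by-generator"}


def disabled_kind(status_json: str) -> str:
    """Classify the JSON text printed by `cloud-init status --format=json`."""
    code = json.loads(status_json).get("boot_status_code", "unknown")
    if code in PERSISTENT_DISABLE:
        return "permanently disabled"
    if code in THIS_BOOT_DISABLE:
        return "disabled for this boot"
    if code.startswith("enabled-"):
        return "enabled"
    return "unknown"


# Trimmed sample document; real output contains more keys than this.
print(disabled_kind('{"boot_status_code": "disabled-by-generator"}'))
```

Because the distinction is carried by a dedicated key, consumers no longer have to parse the human-readable detail text.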

James Falcon (falcojr) wrote : Fixed in cloud-init version 22.4.

This bug is believed to be fixed in cloud-init in version 22.4. If this is still a problem for you, please make a comment and set the state back to New.

Thank you.

Changed in cloud-init:
status: In Progress → Fix Released