Comment 2 for bug 1810859

Ryan Harper (raharper) wrote :

The goal of ds-identify is to ensure that cloud-init does not run unless a datasource is present. We don't yet have a notion of a datasource becoming "present" much later than when the generators run. Note that generators may run multiple times, including in the initramfs and as soon as the root device is found.

Cloud-init generally expresses its dependencies so that a proper check can run, including the presence of required paths (/var/lib/cloud, for example), but it is unaware of more complicated storage configurations which may contain datasources (both the LVM case and the "delayed/slow" block device which is not yet present by the time the rootfs is present).

In the short term, I would suggest that for these scenarios you will want a different ds-identify configuration than the default, which is (in kernel command line format):

ci.di.policy=search,found=all,maybe=none,notfound=disabled

I think the two scenarios here could use the following policy, which puts
cloud-init in a mode where _if_ it finds a definitive datasource, it will
use that specific datasource; if none is found, cloud-init will remain
enabled and will search through all known datasources when the services run.

ci.di.policy=search,found=all,maybe=all,notfound=enabled
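
To make the shape of these policy strings concrete, here is a minimal Python sketch that splits one into its parts. It is not ds-identify's own parser (ds-identify is a shell script); the function names parse_policy and policy_from_cmdline are hypothetical, and the sketch assumes the format is exactly what the two examples above show: a bare mode word followed by found=, maybe= and notfound= settings.

```python
def parse_policy(policy):
    """Split a ds-identify style policy string into its parts.

    Assumes the form seen in this comment: one bare mode word
    (e.g. "search") plus found=, maybe= and notfound= settings.
    """
    result = {"mode": None, "found": None, "maybe": None, "notfound": None}
    for token in policy.split(","):
        token = token.strip()
        if not token:
            continue
        if "=" in token:
            key, value = token.split("=", 1)
            if key not in ("found", "maybe", "notfound"):
                raise ValueError(f"unknown policy key: {key!r}")
            result[key] = value
        else:
            # A token without "=" is treated as the mode.
            result["mode"] = token
    return result


def policy_from_cmdline(cmdline):
    """Return the parsed ci.di.policy value from a kernel command line, or None."""
    for word in cmdline.split():
        if word.startswith("ci.di.policy="):
            return parse_policy(word.split("=", 1)[1])
    return None
```

Comparing the two parsed policies shows that they differ only in the maybe and notfound settings, which is what keeps cloud-init enabled when nothing definitive is found at generator time.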

Alternatively, if you have an image which you know uses a specific datasource and you
don't plan to export the image to a different platform (with a different datasource),
you could specify the datasource on the kernel command line, which would ensure that
cloud-init enables itself.

ci.datasource=nocloud-net

Note that it is still possible for the race to occur in any of these scenarios...

How long should cloud-init wait during boot for its datasource to arrive? It depends
on the use-case.
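
To illustrate why the wait time is a per-use-case choice, here is a minimal Python sketch of a bounded wait for a device path. This is not something cloud-init does today; wait_for_path and its timeout, interval and exists parameters are hypothetical names chosen for illustration only.

```python
import os
import time


def wait_for_path(path, timeout=10.0, interval=0.5, exists=os.path.exists):
    """Poll until `path` exists or `timeout` seconds have passed.

    Returns True if the path showed up in time, False otherwise.
    `exists` is injectable so the check can be swapped out or tested.
    """
    deadline = time.monotonic() + timeout
    while True:
        if exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For a NoCloud disk labelled cidata one might call wait_for_path("/dev/disk/by-label/cidata", timeout=30). Whatever timeout is picked is either too long for fast platforms or too short for slow storage, so the race is narrowed rather than removed.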

Looking at a longer-term approach, I suspect that cloud-init, as a daemon, could remain
idle/inert and watch for specific system events on systemd-bus or dbus, or netlink, etc.,
then re-run its identification code and trigger the cloud-init services.

It remains to be discussed what a "late" cloud-init, or an "at runtime" cloud-init, would mean,
where a machine may be up and running without cloud-init having completed any tasks, only to
then trigger those tasks at some later time.