Comment 22 for bug 1611074

Scott Moser (smoser) wrote:

Paul,
A long-winded comment; please stick with me. Please try to answer this
first:
Question 1.) Is there a definitive/declarative way to determine that
   an instance has been resized? I'd hope for something kind of like an
   instance id: a "size-id". Basically, we need a way to determine that
   this event has occurred so that we can act on it. (A sketch of the
   kind of check I have in mind follows.)
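
To make the ask concrete, here is a minimal sketch of the check I have
in mind, assuming a hypothetical "size-id" value; the value, its
source, and the cache path are all made up for illustration:

    # Hypothetical: detect a resize the same way cloud-init detects a new
    # instance, by comparing a cached value against one from the platform.
    # Obtaining 'current' is the open question: no such metadata value is
    # known to exist today.
    import os

    CACHE = "/var/lib/cloud/data/size-id"  # made-up cache location

    def instance_was_resized(current):
        cached = None
        if os.path.exists(CACHE):
            with open(CACHE) as f:
                cached = f.read().strip()
        with open(CACHE, "w") as f:
            f.write(current)
        return cached is not None and cached != current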

You are right that it is cloud-config.service that is running this.
Steve Langasek helped me come to that realization also. I had originally
hoped that that too would be solved by this new cloud-init being present
in the "first" boot of an instance, but unfortunately that doesnt seem
right.

The following things, changed in commit 3705bb59 [1], were involved in
the fix.
a.) We added x-systemd.requires=cloud-init.service to the mount options
    in /etc/fstab (an example entry is shown after this list).
b.) We moved disk_setup and mounts from
      cloud_config_modules (run in cloud-config.service)
    to
      cloud_init_modules (run in cloud-init.service).
c.) An Azure-specific bit of behavior adjusts disk_setup and mounts
    to run every boot (per-always) rather than the default behavior
    of per-instance. This is done specifically to catch this resize
    (a sketch of the mechanism follows this list).

    It is done dynamically, and prior to cloud-init doing better caching
    to save work, it ended up getting run on every instance.

    The result is that after an upgrade and then a resize, the disk_setup
    and mounts config modules still get run in cloud-config.service
    and thus lose the race with systemd's mounting of the device.

    The code here really relies on a bunch of semantics to try to
    determine whether a resize has occurred. Hence the question above.
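
For 'a', the resulting /etc/fstab entry looks roughly like the
following; the device name, mount point, and ordering fields will vary
by image and are illustrative:

    /dev/disk/cloud/azure_resource-part1 /mnt auto defaults,nofail,x-systemd.requires=cloud-init.service,comment=cloudconfig 0 2

For 'c', the mechanism amounts to clearing the per-instance markers so
the modules run again. A minimal sketch, assuming cloud-init's usual
semaphore layout under /var/lib/cloud/instance/sem; this is
illustrative, not the exact Azure datasource code, and deciding *when*
to do it is the hard, racy part:

    # Force disk_setup and mounts to re-run on the next boot by removing
    # their per-instance semaphores.
    import os

    SEM_DIR = "/var/lib/cloud/instance/sem"

    def rerun_modules(modules=("disk_setup", "mounts")):
        for mod in modules:
            marker = os.path.join(SEM_DIR, "config_" + mod)
            if os.path.exists(marker):
                os.unlink(marker)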

Unfortunately, 'c' happens based on the presence of the ephemeral
disk at the time the datasource first runs. That is racy with the
disks coming online. We need to find a better way to determine when
disks can be erased (and thus when the disk_setup and mounts modules
can re-run). Note that it was always racy with the presence of the
disks, but because we are running earlier now, we hit the race more
often.

I can't think of a solution that doesn't basically require waiting for
the disk to appear.
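
For completeness, the obvious (if unsatisfying) shape of that wait is a
simple poll on the device node. The device path and timeout below are
assumptions for illustration:

    # Wait for the ephemeral device node to appear before deciding
    # whether it can be erased and the modules re-run.
    import os
    import time

    def wait_for_disk(dev="/dev/disk/cloud/azure_resource", timeout=60.0):
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if os.path.exists(dev):
                return True
            time.sleep(1)
        return False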