Paul,
This is a long-winded comment, so please stick with me. Please try to
answer these first:
Question 1.) Is there a definitive/declarative way to determine
that an instance has been resized? I'd hope for something kind of
like an instance id, say a "size-id". Basically, we need a way to
determine if this event has occurred so that we can act on it.
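To make the ask concrete, here is a rough sketch of how such a "size-id" could be used. Note that no such field exists today as far as I know; the marker path and the idea of the platform exposing a size-id are both hypothetical.

```python
# Sketch: detect a resize by comparing a hypothetical platform "size-id"
# against the value cached from the previous boot. The marker path and
# the size-id field itself are assumptions, not existing cloud-init behavior.
import os

MARKER = "/var/lib/cloud/instance/size-id"  # hypothetical cache location


def resize_occurred(current_size_id, marker=MARKER):
    """Return True if the cached size-id differs from the current one."""
    previous = None
    if os.path.exists(marker):
        with open(marker) as f:
            previous = f.read().strip()
    # Always refresh the cache for the next boot.
    with open(marker, "w") as f:
        f.write(current_size_id)
    return previous is not None and previous != current_size_id
```

Modules like disk_setup/mounts could then re-run per-always only when this returns True, instead of guessing from disk presence.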
You are right that it is cloud-config.service that is running this.
Steve Langasek helped me come to that realization also. I had originally
hoped that that too would be solved by this new cloud-init being present
in the "first" boot of an instance, but unfortunately that doesn't seem
right.
The following changes, made in commit 3705bb59 [1], were involved
in the fix:
a.) we added x-systemd.requires=cloud-init.service to the mount options
in /etc/fstab.
b.) we moved disk_setup and mounts from
cloud_config_modules and running in cloud-config.service
to
cloud_init_modules and running in cloud-init.service
c.) An Azure-specific bit of behavior adjusts disk_setup and mounts
to run every boot (per-always) rather than the default behavior
of per-instance. This is done specifically to catch this resize.
It is done dynamically, and prior to cloud-init doing better caching
to save work, it ended up getting run every instance.
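For reference, the fstab entry from (a) looks roughly like this (the device path, mount point, and surrounding options are illustrative, not copied from the commit):

```
# /etc/fstab -- ephemeral disk entry with the ordering dependency from (a)
/dev/disk/cloud/azure_resource-part1  /mnt  auto  defaults,nofail,x-systemd.requires=cloud-init.service  0  2
```

And the per-always override from (c) corresponds to giving the modules an explicit frequency in the module list, e.g. (again, a sketch of the mechanism, not the literal config the datasource injects):

```
cloud_init_modules:
 - [disk_setup, always]
 - [mounts, always]
```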
The result is that after upgrade and then resize, the disk_setup
and mounts config modules still get run at cloud-config.service
and thus lose the race with the systemd mounting of the device.
The code here really relies on a bunch of heuristics to try to determine
if a resize has occurred. Thus the question above.
Unfortunately, 'c' happens based on the presence of the ephemeral
disk at the time when the datasource first runs. That is racy with
the disks coming online. We need to find a better way to determine
when disks can be erased (and thus the disk_setup and mounts modules
can re-run). Note, it always was racy with the presence of the disks,
but because we're running earlier now we hit the race more.
I can't think of a solution that doesn't basically require waiting for
the disk to appear.
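If waiting really is the only option, the shape of it is simple enough. A minimal sketch, assuming we poll for a device node (the path below is an example; treat the exact udev symlink as an assumption):

```python
# Sketch: block until a device path appears, with a timeout, before
# allowing disk_setup/mounts to make decisions based on its presence.
import os
import time


def wait_for_device(path, timeout=60.0, interval=0.5):
    """Poll until 'path' exists or 'timeout' seconds elapse; return existence."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(interval)
    return os.path.exists(path)
```

The obvious downside is exactly the one above: on instances that legitimately have no ephemeral disk, every boot eats the full timeout.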