Comment 2 for bug 1847583

Richard Maynard (richard-maynard) wrote :

This is the same problem we're having, which I referenced in a different bug.

We had the same problem in both GCP and AWS. One of my early attempts to work around the issue was to use apt to hold the cloud-init package at an older version. When that failed, I thought our problem was not cloud-init related, but it turns out that holding back the version doesn't matter: cloud-init was still using the latest release when handling the networking.

I've attached our cloud-init.log.

We ended up implementing the workaround from a different bug report: disabling cloud-init's networking configuration and writing our own netplan configuration. This allowed us to continue with new image builds.
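
For anyone reading along, here is a minimal sketch of that kind of workaround. It is not the exact configuration from the other bug report. It assumes cloud-init's network rendering is turned off with the `network: {config: disabled}` setting in a drop-in under /etc/cloud/cloud.cfg.d/, and that a plain DHCP netplan file is enough for the instance. The file names, the helper names, and the interface name passed in are hypothetical and will differ per cloud and instance type.

```python
from pathlib import Path
from typing import List

# cloud-init setting that tells it not to render any network configuration.
DISABLE_CLOUD_INIT_NETWORK = "network: {config: disabled}\n"


def netplan_dhcp_config(interface: str) -> str:
    """Return a minimal netplan (version 2) document enabling DHCPv4.

    `interface` is an assumption supplied by the caller; check the real
    interface name on the image before using it.
    """
    return (
        "network:\n"
        "  version: 2\n"
        "  ethernets:\n"
        f"    {interface}:\n"
        "      dhcp4: true\n"
    )


def write_network_override(root: Path, interface: str) -> List[Path]:
    """Write both files beneath `root` and return their paths.

    Pass Path("/") on a real image, or a scratch directory to inspect
    the output first. The file names below are hypothetical.
    """
    cloud_cfg = root / "etc/cloud/cloud.cfg.d/99-disable-network-config.cfg"
    netplan_cfg = root / "etc/netplan/01-custom-network.yaml"

    for path, content in (
        (cloud_cfg, DISABLE_CLOUD_INIT_NETWORK),
        (netplan_cfg, netplan_dhcp_config(interface)),
    ):
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)

    return [cloud_cfg, netplan_cfg]
```

In an image build this could be called from a provisioning step, for example `write_network_override(Path("/"), "ens5")`, where `ens5` is only a placeholder interface name. Any netplan file that cloud-init generated earlier may also need to be removed so the two do not conflict.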

Our image build pipeline is driven by CI, and generally does this:
- packer creates image based on a specific ubuntu-bionic image
- packer uses a combination of exec + salt provisioners
- packer copies the image to all regions and accounts where we deploy
- terraform is run, updating launch templates to use the new AMI and apply user data
- other tooling cycles nodes through the autoscale group, causing them to upgrade

I don't know whether it matters that we use packer to build an image and then build again from that image, but it seems like it might explain why the packer images built from these AMIs with the default cloud-init work, while the images based on those do not.