cloud-init does not respect declared MIME types in multipart archives

Bug #1888822 reported by Robert Van Voorhees
This bug affects 1 person
Affects: cloud-init
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

In https://github.com/canonical/cloud-init/pull/290, we landed a change to user-data processing that expanded the set of MIME types treated as signifying "unknown content" to include many (if not all) of the MIME types we would normally expect in user-data multipart archives[0].

This means that every part whose declared MIME type is in that set now has its type determined by the first line of its content; the declared MIME type is ignored.

In the specific reported case, a "text/cloud-boothook" part started with "#!", which is appropriate and correct, but because of this bug it was detected as "text/x-shellscript".
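To make the regression concrete, here is a minimal Python sketch of the behaviour described above. It is an illustration under stated assumptions, not cloud-init's code: the prefix table is a small hypothetical subset modeled on the dict referenced in [0], and the names PREFIX_TO_TYPE, type_from_content and effective_type_regressed are made up for this example.

```python
# Hypothetical subset of the first-line prefix table (modeled on the dict in [0]).
PREFIX_TO_TYPE = {
    "#!": "text/x-shellscript",
    "#cloud-config": "text/cloud-config",
    "#cloud-boothook": "text/cloud-boothook",
    "#include": "text/x-include-url",
}


def type_from_content(payload: str, default: str) -> str:
    """Return the type mapped to the longest prefix the payload starts with."""
    matches = [p for p in PREFIX_TO_TYPE if payload.startswith(p)]
    if not matches:
        return default
    return PREFIX_TO_TYPE[max(matches, key=len)]


# Regressed behaviour (sketch): the declared type is discarded for any type in
# this set, which now contains every type in the prefix table.
TYPES_NEEDING_INSPECTION = {"text/plain", *PREFIX_TO_TYPE.values()}


def effective_type_regressed(declared: str, payload: str) -> str:
    if declared in TYPES_NEEDING_INSPECTION:
        return type_from_content(payload, declared)
    return declared


if __name__ == "__main__":
    boothook = "#!/bin/bash\necho fetching secrets\n"
    # The declared text/cloud-boothook is overridden because the part starts with "#!".
    print(effective_type_regressed("text/cloud-boothook", boothook))  # text/x-shellscript
```

Only types outside the inspected set (for example a custom "text/x-custom") keep their declared value in this model.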

[0] Specifically, it was expanded to include all the values in the dict at https://github.com/canonical/cloud-init/blob/master/cloudinit/handlers/__init__.py#L43-L54

[Original Report]

In the upstream Kubernetes project Cluster API, the Cluster API AWS Provider's cloud-init script securely downloads a file from AWS Secrets Manager, saves it to a well-known location, and then restarts the cloud-init service through systemd. When cloud-init runs again, it is expected to resolve the secrets file (which was not present on the first run) and execute its commands.

This worked fine on cloud-init versions up until 19.4-33-gbb4131a2-0ubuntu1~18.04.1. After upgrading to 20.2-45-g5f7825e2-0ubuntu1~18.04.1, the secrets file is never resolved.

Some other information:

- cloud-init is definitely running twice successfully, according to systemd and cloud-init-output.
- /var/lib/cloud/instance/user-data.txt does show the reference to the well-known file at /etc/secret-userdata.txt.
- The "resolved" version of user-data at /var/lib/cloud/instance/user-data.txt.i does not include the resolved file. Deleting this file and then restarting cloud-init does not solve the problem: the file is regenerated, again without the secrets file.

Is there another command that is now required if you plan on restarting cloud-init for another execution where files are now present that were previously not?

1. Cloud Provider: AWS
2. Upstream issue: https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/1839 (instructions to reproduce, including two public AMIs, can be found in that issue)

Robert Van Voorhees (rcvanvo) wrote :
Robert Van Voorhees (rcvanvo) wrote :

user-data.txt

Robert Van Voorhees (rcvanvo) wrote :

resolved user-data missing the secrets file.

Robert Van Voorhees (rcvanvo) wrote :

After running "cloud-init clean", cloud-init hangs when run again.

detiber (detiber) wrote :

I believe I've tracked the issue down to the following PR: https://github.com/canonical/cloud-init/pull/290

It looks like, because we declare the boothook using only the content type, that content type is being overridden with x-shellscript by the following code: https://github.com/canonical/cloud-init/blob/e1e54d2e2f9b4529276a89fa0a35e76f9964ca2a/cloudinit/user_data.py#L129-L133

I don't believe this behavior is correct since it is overriding correctly set content types with different content types (in this case overriding text/cloud-boothook with text/x-shellscript).

Ryan Harper (raharper) wrote :

Do you have a collect-logs from a successful run on 19.4? The included logs cover two days: 2020-07-23 (using 19.4) and 2020-07-24 (using 20.1).

Changed in cloud-init:
status: New → Incomplete
Dan Watkins (oddbloke) wrote :

Hi Robert, detiber,

Thanks for using cloud-init, for filing a bug, and for the triage! Ryan and I have been chatting on IRC (feel free to join us in #cloud-init on Freenode) and we agree this is a regression. Apologies!

Some older platforms always pass user-data in MIME multipart archives that use "text/x-shellscript" for the part (even if the user is passing "#cloud-config" user-data). The commit you've identified mistakenly means that, for every part with a MIME type we know about, we use the first line of that part's content to determine its type and ignore the declared MIME type. The first line of your boothook is "#!", which maps to x-shellscript. This in turn means that it runs later in boot, and everything else falls apart as a result.

The initial fix we've identified is to only use the content to determine the true MIME type if the given MIME type is x-shellscript. This relies on the fact that if an x-shellscript part does not start with a #!, then cloud-init will fail to execute it; it follows that every currently-functional x-shellscript MIME part starts with #!. This means that we will always detect true x-shellscript parts as x-shellscript from their content. And it follows, in turn, that we can safely _always_ use the content of x-shellscript parts to determine their type.

(We cannot do the same for other MIME types because they do not have the same "detection roundtrip" guarantee.)
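The proposed fix can be sketched the same way. This is a minimal, self-contained Python illustration under stated assumptions, not the patch that eventually landed: the prefix table is a hypothetical subset, "text/plain" stands in for the generic "unknown content" types that were inspected before PR 290, and all names are made up for this example. It shows the roundtrip argument: a working x-shellscript part starts with "#!" and is detected back as x-shellscript, while other declared types are trusted.

```python
# Hypothetical subset of the first-line prefix table.
PREFIX_TO_TYPE = {
    "#!": "text/x-shellscript",
    "#cloud-config": "text/cloud-config",
    "#cloud-boothook": "text/cloud-boothook",
    "#include": "text/x-include-url",
}


def type_from_content(payload: str, default: str) -> str:
    """Return the type mapped to the longest prefix the payload starts with."""
    matches = [p for p in PREFIX_TO_TYPE if payload.startswith(p)]
    if not matches:
        return default
    return PREFIX_TO_TYPE[max(matches, key=len)]


# Fixed behaviour (sketch): only x-shellscript and the generic types (assumed
# here to be just text/plain) are re-detected from content.
TYPES_NEEDING_INSPECTION = {"text/plain", "text/x-shellscript"}


def effective_type_fixed(declared: str, payload: str) -> str:
    if declared in TYPES_NEEDING_INSPECTION:
        return type_from_content(payload, declared)
    return declared


if __name__ == "__main__":
    # A boothook that starts with "#!" keeps its declared type.
    print(effective_type_fixed("text/cloud-boothook", "#!/bin/bash\n"))
    # Older platforms wrap #cloud-config user-data in an x-shellscript part.
    print(effective_type_fixed("text/x-shellscript", "#cloud-config\nruncmd: []\n"))
```

The two calls print "text/cloud-boothook" and "text/cloud-config" respectively, so both the reported case and the older-platform case are handled.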

In the meantime, if you modify your generated boothook to start with "#cloud-boothook", it will be correctly detected and handled.
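For illustration, here is a sketch of building a multipart user-data archive with Python's standard email package, where the boothook part is declared as text/cloud-boothook and its content starts with "#cloud-boothook" followed by the shebang. The script body and the build_user_data helper are hypothetical placeholders; Cluster API's real generated boothook is not reproduced here.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical boothook body. The first line is the workaround marker, so
# content-based detection agrees with the declared text/cloud-boothook type.
BOOTHOOK = (
    "#cloud-boothook\n"
    "#!/bin/bash\n"
    "echo 'boothook ran' > /tmp/boothook-ran\n"
)


def build_user_data(boothook: str) -> str:
    """Wrap the boothook in a MIME multipart archive and return it as text."""
    archive = MIMEMultipart()
    # The "cloud-boothook" subtype declares Content-Type: text/cloud-boothook.
    archive.attach(MIMEText(boothook, "cloud-boothook"))
    return archive.as_string()


if __name__ == "__main__":
    print(build_user_data(BOOTHOOK))
```

Because the declared type and the first line now agree, the part should be handled as a boothook whether or not the content-based detection overrides the declared type.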

Thanks, and apologies, again!

Dan

Changed in cloud-init:
status: Incomplete → Triaged
importance: Undecided → Critical
Dan Watkins (oddbloke)
summary: - cloud-init caches files and never checks again
+ cloud-init does not respect declared MIME types in multipart archives
Dan Watkins (oddbloke)
description: updated
Ryan Harper (raharper) wrote :
Robert Van Voorhees (rcvanvo) wrote :

Are there any next steps, or anything that needs to happen, to get this PR landed?

Ryan Harper (raharper) wrote :

Robert,

Thanks for following up. The PR is waiting on a maintainer review to approve for landing.

Changed in cloud-init:
status: Triaged → In Progress
James Falcon (falcojr) wrote : Fixed in cloud-init version 20.3.

This bug is believed to be fixed in cloud-init in version 20.3. If this is still a problem for you, please make a comment and set the state back to New.

Thank you.

Changed in cloud-init:
status: In Progress → Fix Released
Naadir Jeewa (randomvariable) wrote :

Hi there,

I think this might be broken again in 20.3; at least, we added the recommended #cloud-boothook workaround, and machines with 20.3 no longer execute it.

Naadir Jeewa (randomvariable) wrote :

Actually, I found out we need to set ERROR_ON_USER_DATA_FAILURE=False.

James Falcon (falcojr) wrote :