Jammy's fix to CD-ROM device in OVF breaks VMware datasource

Bug #1992509 reported by Andrew Kutz
This bug affects 2 people
Affects: cloud-init
Status: Expired
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

At some point Canonical updated the OVF used to produce Jammy's OVA at https://cloud-images.ubuntu.com/releases/jammy/release-20220808/ubuntu-22.04-server-cloudimg-amd64.ova from the OVF used to produce Impish's OVA at https://cloud-images.ubuntu.com/releases/impish/release-20220708/ubuntu-21.10-server-cloudimg-amd64.ova. However, this update routinely causes failures in the Cloud-Init Data Source for VMware at https://cloudinit.readthedocs.io/en/latest/topics/datasources/vmware.html.

Technically speaking, the VMware data source should always have failed with Canonical's cloud OVA, but prior to Jammy there was a bug in the OVF: the CD-ROM device was parented to the wrong controller (a SCSI controller instead of an IDE controller). The diff between the two OVFs is attached, but the important bit is here:

> - <rasd:Parent>3</rasd:Parent>
> + <rasd:Parent>5</rasd:Parent>

The CD-ROM device in Impish is parented to device ID 3, which is the following device:

> <Item>
> <rasd:Address>0</rasd:Address>
> <rasd:Description>SCSI Controller</rasd:Description>
> <rasd:ElementName>SCSI Controller 0</rasd:ElementName>
> <rasd:InstanceID>3</rasd:InstanceID>
> <rasd:ResourceSubType>VirtualSCSI</rasd:ResourceSubType>
> <rasd:ResourceType>6</rasd:ResourceType>
> </Item>

However, parenting a CD-ROM device to a SCSI controller like this causes vSphere 7.x+ (and perhaps earlier versions -- I could not easily check) to quietly drop the CD-ROM device. On Jammy, by contrast, the parent is device ID 5, which is the following device:

> <Item>
> <rasd:Address>1</rasd:Address>
> <rasd:Description>IDE Controller</rasd:Description>
> <rasd:ElementName>VirtualIDEController 1</rasd:ElementName>
> <rasd:InstanceID>5</rasd:InstanceID>
> <rasd:ResourceType>5</rasd:ResourceType>
> </Item>
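The parent relationship shown in these two excerpts can be checked mechanically. The following is a minimal sketch, not part of cloud-init or any VMware tooling: it looks up the resource type of whatever device each CD-ROM item is parented to. Resource types 5 (IDE controller) and 6 (SCSI controller) come from the excerpts above; the CD drive type 15, the CD-ROM's InstanceID of 9, the namespace URI, and the helper names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed RASD namespace URI; real OVF files declare it on the Envelope element.
RASD = ("http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/"
        "CIM_ResourceAllocationSettingData")
NS = {"rasd": RASD}

IDE_CONTROLLER = "5"   # from the Jammy excerpt
SCSI_CONTROLLER = "6"  # from the Impish excerpt
CD_DRIVE = "15"        # assumption: CIM resource type for a CD drive


def sample_hardware(cdrom_parent):
    """Build a trimmed-down hardware section (hypothetical, for illustration)."""
    return f"""<VirtualHardwareSection xmlns:rasd="{RASD}">
  <Item>
    <rasd:InstanceID>3</rasd:InstanceID>
    <rasd:ResourceType>6</rasd:ResourceType>
  </Item>
  <Item>
    <rasd:InstanceID>5</rasd:InstanceID>
    <rasd:ResourceType>5</rasd:ResourceType>
  </Item>
  <Item>
    <rasd:InstanceID>9</rasd:InstanceID>
    <rasd:Parent>{cdrom_parent}</rasd:Parent>
    <rasd:ResourceType>15</rasd:ResourceType>
  </Item>
</VirtualHardwareSection>"""


def cdrom_parent_types(hardware_xml):
    """Return the resource type of the parent of every CD-ROM item."""
    root = ET.fromstring(hardware_xml)
    # Treat any element that has a rasd:InstanceID child as a hardware item,
    # regardless of which namespace the Item element itself lives in.
    items = [el for el in root.iter()
             if el.find("rasd:InstanceID", NS) is not None]
    type_by_id = {
        el.findtext("rasd:InstanceID", namespaces=NS):
            el.findtext("rasd:ResourceType", namespaces=NS)
        for el in items
    }
    return [
        type_by_id.get(el.findtext("rasd:Parent", namespaces=NS))
        for el in items
        if el.findtext("rasd:ResourceType", namespaces=NS) == CD_DRIVE
    ]


impish_like = cdrom_parent_types(sample_hardware("3"))
jammy_like = cdrom_parent_types(sample_hardware("5"))
```

With the Impish-style parent the lookup yields the SCSI controller type, and with the Jammy-style parent it yields the IDE controller type.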

Parenting a CD-ROM to an IDE controller is fine, and deploying the Jammy OVA on vSphere 7.x+ results in a VM with a valid CD-ROM device. And that is the root of the problem. Because now if the VM also has one or more OVF properties defined, as the Ubuntu OVFs do, vSphere will send that information into the guest using one of two transports: ISO or GuestInfo (ESX --> VM RPC via VM Tools). Both Ubuntu OVFs indicate to use the ISO transport:

> <VirtualHardwareSection ovf:transport="iso">
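As a rough illustration, the transport declared on that element can be read with the Python standard library. This is only a sketch: the OVF envelope namespace URI and the assumption that the attribute may hold a whitespace-separated list of transports are not stated in this report, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed OVF envelope namespace URI.
OVF = "http://schemas.dmtf.org/ovf/envelope/1"


def ovf_transports(section_xml):
    """Return the transports named in the ovf:transport attribute."""
    root = ET.fromstring(section_xml)
    # ElementTree exposes namespaced attributes as "{uri}name".
    value = root.get(f"{{{OVF}}}transport", "")
    # Assumption: several transports may be listed, separated by whitespace.
    return value.split()


section = f'<VirtualHardwareSection xmlns:ovf="{OVF}" ovf:transport="iso"/>'
transports = ovf_transports(section)
```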

The ISO transport logic in vSphere is as follows:

1. if there are one or more OVF properties
2. construct an ISO file with the OVF environment and its properties
3. upload the ISO to a datastore accessible by the ESXi host where the VM is scheduled
4. attach the ISO file to the VM's CD-ROM device
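The steps above can be modeled roughly as follows. This is a toy model and not vSphere code: the class, field, and function names are hypothetical, the datastore upload is reduced to a comment, and the "no CD-ROM device" branch reflects the pre-Jammy situation described in this report, where the CD-ROM had been dropped on import and the ISO could not be attached.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VM:
    """Hypothetical stand-in for a deployed VM."""
    ovf_properties: dict = field(default_factory=dict)
    has_cdrom: bool = False
    attached_iso: Optional[dict] = None


def apply_iso_transport(vm):
    """Attach an OVF environment ISO if the VM qualifies; report whether it did."""
    # Step 1: nothing to send when no OVF properties are defined.
    if not vm.ovf_properties:
        return False
    # Without a CD-ROM device there is nothing to attach the ISO to.
    if not vm.has_cdrom:
        return False
    # Steps 2 and 3: build the ISO from the OVF environment (the upload to a
    # datastore is not modeled here).
    iso = {"ovf_environment": dict(vm.ovf_properties)}
    # Step 4: attach the ISO to the CD-ROM device.
    vm.attached_iso = iso
    return True


jammy_like = VM(ovf_properties={"hostname": "ubuntuguest"}, has_cdrom=True)
impish_like = VM(ovf_properties={"hostname": "ubuntuguest"}, has_cdrom=False)
jammy_attached = apply_iso_transport(jammy_like)
impish_attached = apply_iso_transport(impish_like)
```

In this model only the Jammy-like VM ends up with an ISO the guest can read, which is the condition under which the OVF data source is detected.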

When the VM is powered on, VM Tools reads the contents of the ISO mounted to the CD-ROM device. However, the presence of the CD-ROM with valid OVF environment data *also* results in the Cloud-Init OVF data source (https://cloudinit.readthedocs.io/en/latest/topics/datasources/ovf.html) detecting that it should be loaded (https://github.com/canonical/cloud-init/blob/cd2cca35a1bf36b584422f431c3ddf55b820434c/tools/ds-identify#L984-L1030). Once that occurs the VMware data source is not loaded, even if there is valid data to do so (https://github.com/canonical/cloud-init/blob/cd2cca35a1bf36b584422f431c3ddf55b820434c/tools/ds-identify#L1459-L1499).

The fix to the CD-ROM device is itself not an issue -- it *should* have been fixed. In fact, if the CD-ROM had not been fixed but the "ovf:transport" value had been changed to "guestInfo", the same problem would have occurred, because the OVF data source would have been triggered that way and taken precedence over the VMware data source (https://github.com/canonical/cloud-init/blob/cd2cca35a1bf36b584422f431c3ddf55b820434c/tools/ds-identify#L965-L982). However, while not strictly a problem in a vacuum, the change *does* regress the prior behavior of the stock Ubuntu cloud images. All of a sudden the Jammy OVA, and presumably future images, will trigger the OVF data source on vSphere instead of the VMware data source when both are detected via ds-identify.

This is unfortunate, and I am not sure what can be done about it. The fix to the CD-ROM was necessary, even if it did break the VMware data source when OVF properties are defined in the OVF. What do y'all think should occur?

Revision history for this message
Andrew Kutz (akutz) wrote :

I failed to note that pre-Jammy, because the CD-ROM device was dropped on importing the OVA, the ISO transport would not mount the ISO to the VM (because it could not) and thus the OVF data source was not activated, allowing the VMware data source to function.

Revision history for this message
James Falcon (falcojr) wrote :

Would this be addressed at all by https://github.com/canonical/cloud-init/pull/1573 ?

Revision history for this message
Andrew Kutz (akutz) wrote :

Hi James,

No, because for now that PR retains the existing load order. I raised this in a comment on the PR, asking whether we should invert the order, but Pengpeng made a good point about keeping the order the same in the merge to reduce deltas.

Revision history for this message
Andrew Kutz (akutz) wrote :

Hmm, I'm looking at the PR again, and it looks like Pengpeng moved the former OVF transport *behind* the GuestInfo transport. This is new to me. I pinged him for more info. If this is intentional then it would solve the issue.

Revision history for this message
James Falcon (falcojr) wrote :

I'm going to set this to incomplete for now until we know the next steps. If we decide this isn't addressed by #1573, we can set this back to new and triage it.

Changed in cloud-init:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for cloud-init because there has been no activity for 60 days.]

Changed in cloud-init:
status: Incomplete → Expired
Revision history for this message
Pengpeng Sun (pengpengs) wrote :

Hi James and Andrew,

I'm reopening this one since I can see DS OVF is always loaded when deploying ubuntu-23.04-beta-cloudimg-amd64.ova onto vSphere. The cloud-init version is 23.1.1. As Andrew mentioned in the description, ovfEnv.xml is constructed from the OVF properties and put into an ISO file, and the ISO file is attached to the VM's CD-ROM device.

PR 1573 (https://github.com/canonical/cloud-init/pull/1573) cannot resolve this issue since in datasource_list, DS OVF is ahead of DS VMware.
Before PR 1573 change, the sequence of transports was:
1. local seed_dir (OVF)
2. customization cfg (OVF)
3. guestinfo.ovfEnv (OVF)
4. ISO cdrom (OVF)
5. EnvVars (VMware)
6. GuestInfo (VMware)
After PR 1573 change, the sequence of transports is:
1. local seed_dir (OVF)
2. guestinfo.ovfEnv (OVF)
3. ISO cdrom (OVF)
4. EnvVars (VMware)
5. GuestInfo (VMware)
6. customization cfg (OVF)

I'm wondering whether we can move DS VMware in front of DS OVF in the cloud-init datasource search list. I'm not sure if this has been discussed before. Any concerns?

Best regards,
Pengpeng

Changed in cloud-init:
status: Expired → New
Revision history for this message
Pengpeng Sun (pengpengs) wrote :

Correction:

After PR 1573 change, the sequence of transports is:
1. local seed_dir (OVF)
2. guestinfo.ovfEnv (OVF)
3. ISO cdrom (OVF)
4. EnvVars (VMware)
5. GuestInfo (VMware)
6. customization cfg (VMware)
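To make the effect of these orderings concrete, here is a simplified first-match model in Python. It is not cloud-init's actual detection code: the list contents are taken from the sequences above (using the corrected label for customization cfg), while the function name, the "first available transport wins" rule, and the proposed VMware-first ordering are assumptions made for illustration.

```python
# (transport, datasource) pairs in search order, as listed in this thread.
BEFORE_PR_1573 = [
    ("local seed_dir", "OVF"),
    ("customization cfg", "OVF"),
    ("guestinfo.ovfEnv", "OVF"),
    ("ISO cdrom", "OVF"),
    ("EnvVars", "VMware"),
    ("GuestInfo", "VMware"),
]
AFTER_PR_1573 = [
    ("local seed_dir", "OVF"),
    ("guestinfo.ovfEnv", "OVF"),
    ("ISO cdrom", "OVF"),
    ("EnvVars", "VMware"),
    ("GuestInfo", "VMware"),
    ("customization cfg", "VMware"),
]
# Hypothetical ordering for the proposal: every VMware transport first.
# sorted() is stable, so the relative order within each datasource is kept.
VMWARE_FIRST = sorted(AFTER_PR_1573, key=lambda entry: entry[1] != "VMware")


def first_match(sequence, available):
    """Return the first (transport, datasource) whose transport has data."""
    for transport, datasource in sequence:
        if transport in available:
            return transport, datasource
    return None


# A Jammy OVA on vSphere with GuestInfo data also configured.
jammy_case = {"ISO cdrom", "GuestInfo"}
# A VM with both the OVF ISO and guest customization data.
customization_case = {"ISO cdrom", "customization cfg"}
```

Under this model the Jammy case resolves to the OVF datasource both before and after PR 1573, and to VMware only with the VMware-first ordering. The customization case resolves to customization cfg before the PR and with the VMware-first ordering, but to the ISO cdrom transport after the PR.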

James Falcon (falcojr)
Changed in cloud-init:
importance: Undecided → High
status: New → Triaged
Revision history for this message
James Falcon (falcojr) wrote :

I agree that it makes sense to move the VMWare DS to be in front of OVF.

In the case that rpctool doesn't exist or doesn't have valid metadata and userdata, the OVF datasource will work as expected.

I am worried about the case where there happens to be both an OVF XML and the rpctool. In that case, the OVF datasource would previously have been detected and used, but after this change, the VMWare datasource will be detected and used. If that happens on an already installed system, it seems likely to cause a breaking change if the OVF XML contains different information than rpctool.

Since we always backport new cloud-init releases to existing supported Ubuntu releases, is this an acceptable risk for VMWare? Would you only want this behavior for new releases, but keep existing behavior for existing Ubuntu releases?

Revision history for this message
Pengpeng Sun (pengpengs) wrote :

Before the PR 1573 change, if both an OVF XML and metadata/userdata from customization (IMC) were available, the metadata/userdata from customization was loaded first, so moving the VMware DS in front of OVF actually reverts to the previous behavior on the VMware platform.

The other data transports in the VMware DS (EnvVars, GuestInfo) must be configured intentionally on a VMware VM, unlike the pre-existing OVF properties in the Ubuntu cloudimg OVA. I think we can document this behavior change to let customers know, but let me sync with Andrew internally on this.

btw: As you might know, VMware also sends requests to other Linux vendors (e.g. Red Hat, SUSE) to backport cloud-init releases.

Revision history for this message
Pengpeng Sun (pengpengs) wrote :

Tracking this in VMware PR number 3164644

Revision history for this message
James Falcon (falcojr) wrote :
Changed in cloud-init:
status: Triaged → Expired