Ubuntu vmdk fails to boot with hardware virtualization version 11

Bug #1754030 reported by John A Meinel
Affects: cloud-images
Status: Triaged
Importance: Undecided
Assigned to: Unassigned

Bug Description

We're running VMware 6.0, but have always launched instances in 5.5 compatibility mode (hardware version 10).
When we try upgrading the compatibility to 6.0 (hardware version 11), the machines fail to boot. This happened with both the Xenial image and the Artful image.

Revision history for this message
John A Meinel (jameinel) wrote :

I tried going via the vSphere web UI (Flash) and directly booting xenial-server-cloudimg-amd64-disk1.vmdk, but it's possible I had set up the configuration incorrectly.
I've since tried directly importing the OVA from the 'govc' command line:
govc import.ova ./xenial-server-cloudimg-amd64.ova

One thing I noted was that if you "tar xf xenial-server-cloudimg-amd64.ova" you get
... 284868608 ubuntu-xenial-16.04-cloudimg.vmdk
... 277238272 xenial-server-cloudimg-amd64-disk1.vmdk

Not only is the filename very different, but so is the file size.

Note that the OVF that is embedded in the OVA is hard-coded to hardware version 10:
<vssd:VirtualSystemType>vmx-10</vssd:VirtualSystemType>
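(As an untested aside, one could presumably unpack the OVA, bump that value to vmx-11, refresh the manifest checksum, and repack. A rough sketch, assuming the manifest uses SHA1 sums, which it may not:)

  # Untested sketch: bump the hardware version in the OVF and repack.
  # Assumes the OVA carries a SHA1 manifest (it may be SHA256 instead).
  tar xf xenial-server-cloudimg-amd64.ova
  ovf=$(ls *.ovf)
  sed -i 's/vmx-10/vmx-11/' "$ovf"
  # Refresh the manifest entry for the edited OVF so import tools accept it.
  sed -i "s/^SHA1(${ovf})=.*/SHA1(${ovf})= $(sha1sum ${ovf} | cut -d' ' -f1)/" *.mf
  # The OVF descriptor must be the first member of the OVA tar.
  tar cf xenial-vmx11.ova "$ovf" *.mf *.vmdk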

When I first power on a newly imported VM and watch the console, it immediately tells me:
  no such device root

  press any key to continue

Though it eventually moves forward on its own, without any intervention.

It does eventually boot to an "ubuntuguest login:" prompt.

Note that I don't currently know what/who to log in as. (Reading through the OVF file, it seems to set a "Default User Password" to the empty string, but 'ubuntu' and 'ubuntuguest' don't seem to work.)

However, it does succeed in booting and getting to a login prompt.
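If one wanted to set that OVF password property at import time, something like govc's import.spec/-options flow might work (a sketch only; the exact property key is whatever the OVF actually declares):

  # Sketch: dump the import spec, fill in the password property, import.
  govc import.spec ./xenial-server-cloudimg-amd64.ova > spec.json
  # ...edit spec.json: in PropertyMapping, set the "Default User Password"
  # (or similarly named) key to a real value, then:
  govc import.ova -options=spec.json ./xenial-server-cloudimg-amd64.ova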

If I then ask vSphere to upgrade the compatibility to ESXi 6.0, aka hardware version 11 (from 5.5, aka 10), the boot process appears to stop at:
  [ 6. ] usbhid: USB HID core driver
  [ 26. ] random: nonblocking pool is initialized

It does appear to hang in a similar way to what we were experiencing with Juju.

Revision history for this message
John A Meinel (jameinel) wrote :

Just to confirm, I tried it with both Xenial and Bionic images (ubuntu-bionic-18.04-cloudimg-20180307).

tags: added: id-5ab130f3557aead7be00375f
Revision history for this message
Robert C Jennings (rcj) wrote :

I've pulled the latest Xenial OVA, imported it via the vSphere 6 client, and upgraded to v11 hardware; this recreated the failure. I then added a serial console to the VM by editing the VM settings (console to file) and booted to get more debug info; now the VM boots.

Workaround:
 - convert from v10 to v11
 - manually add a serial console
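Something like the following govc sketch should script that workaround (assuming govc's device.serial.add subcommand; the VM name is a placeholder):

  # Sketch; "xenial-v11" is a hypothetical VM name.
  govc device.serial.add -vm xenial-v11   # presence of a serial port is enough
  govc vm.power -on xenial-v11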

I am exporting the VMs so that I can diff the XML in each of the OVFs.

Strangely, when I removed the serial console, the VM stopped booting again. My theory had been that adding the serial console rewrote the OVF XML and added bits unrelated to the serial console that happened to make the VM bootable. I had been thinking that the v10->v11 conversion had a bug, and that editing the VM settings (rewriting the XML when adding the serial console) resolved the conversion error; however, it might be more complex than that.

Changed in cloud-images:
status: New → Triaged
Revision history for this message
Robert C Jennings (rcj) wrote :

John,

In terms of priority for this bug, is there something you need that is only available with v11?

Revision history for this message
Tim Penhey (thumper) wrote :

While not a critical priority, we really want vSphere to be a well-supported cloud, and that means supporting the newer hardware versions. There was a customer request to support v13, but since our vSphere was only version 6, we only had hardware v11.

This is more about having our vSphere support considered good than any particular thing we need urgently (as far as I am aware).

Revision history for this message
Robert C Jennings (rcj) wrote :

In /etc/default/grub.d/50-cloudimg-settings.cfg we set the console on the kernel command-line to use both tty1 and ttyS0. For v11 VMs with no changes to the hardware configuration (no addition of a serial console) we need to remove ttyS0 from the kernel command-line to prevent boot from stalling.
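Concretely, a sketch of that change inside the guest (the exact line in 50-cloudimg-settings.cfg may differ slightly):

  # /etc/default/grub.d/50-cloudimg-settings.cfg ships a line like:
  #   GRUB_CMDLINE_LINUX_DEFAULT="console=tty1 console=ttyS0"
  # Drop ttyS0 so the kernel stops waiting on a serial console the
  # v11 VM doesn't have, then regenerate grub.cfg:
  sudo sed -i 's/ console=ttyS0//' /etc/default/grub.d/50-cloudimg-settings.cfg
  sudo update-grub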

Additionally, we could consider setting "datasource_list: [ OVF, None ]" in the cloud-init config, if appropriate for all consumers of the OVA. Under VMware we see a warning from cloud-init when the default, full datasource list is used.
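For example, as a cloud.cfg.d drop-in (the file name here is illustrative):

  # /etc/cloud/cloud.cfg.d/99-ova-datasource.cfg (illustrative path)
  # Restrict cloud-init to the OVF datasource, as suggested above:
  datasource_list: [ OVF, None ]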

Revision history for this message
Robert C Jennings (rcj) wrote :

I said that I'd attach OVFs for the v11 VMs with and without the serial console. Having looked at them, there are no differences between them except the serial stanza; previously I thought we might have other interesting deltas, but I've abandoned that path of investigation. The kernel command-line setting appears to be the culprit.

Revision history for this message
Chris Privitere (cprivite) wrote :

If you're looking for reasons why customers require a later VM hardware version: you need version 11 for the faster Spectre/Meltdown mitigations (v11 exposes PCID/INVPCID), and you need version 13 or 15 (the documentation varies on which it is) to support the VMware Cloud Provider and do things like using vmdks as dynamic storage in your Kubernetes cluster.
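A quick way to check whether a guest actually sees those flags (nothing vSphere-specific):

  # Inside the guest: list pcid/invpcid if the virtual CPU exposes them.
  grep -owE 'pcid|invpcid' /proc/cpuinfo | sort -u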

From a wider stance, with 20.04 taking away the easier-to-customize installer, it will help customers a lot to have more easily deployable cloud images available.

Revision history for this message
Chris Privitere (cprivite) wrote :

And yes, this is still an issue on the 18.04 and 20.04 .ova images.

no longer affects: ubuntu
tags: added: sts
tags: removed: sts