vsphere: cache VMDKs in datastore to avoid repeated downloads and firewalled hosts

Bug #1711019 reported by Andrew Wilkins
This bug affects 2 people
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Andrew Wilkins

Bug Description

The vSphere provider currently downloads images from cloud-images.ubuntu.com for every bootstrap and deploy. This is wasteful, and slows deployments. We should cache the images in the vSphere datastore.

Tags: conjure
Revision history for this message
Adam Stokes (adam-stokes) wrote :

Another issue is that the host is firewalled off so our images should be uploaded to the datastore regardless.

From IRC:

```
sure, but it tries to contact the vsphere *host*

which is firewalled off, as it should be
`juju.cmd.juju.commands bootstrap.go:492 failed to bootstrap model: cannot start bootstrap instance: failed to create instance in any availability zone: uploading ubuntu-xenial-16.04-cloudimg.vmdk to https://10.32.252.51/nfc/52774700-37f1-4a46-cc1f-de20c50f94e5/disk-0.vmdk: Post https://10.32.252.51/nfc/52774700-37f1-4a46-cc1f-de20c50f94e5/disk-0.vmdk: Service Unavailable`

that IP is the host, the API is accessible

our vsphere guy says it should upload it to the datastore, then create a VM from that vmdk in the datastore
```
it shouldn't be uploading anything to 10.32.252.51 as far as I can tell

summary: - vsphere: cache VMDKs in datastore to avoid repeated downloads
+ vsphere: cache VMDKs in datastore to avoid repeated downloads and
+ firewalled hosts
tags: added: conjure
Revision history for this message
Tom (orf) wrote :

Yes, this is seemingly a big problem with the vsphere additions. Clients running the bootstrap should not be allowed to access the vsphere host IP directly, everything needs to go through the API.

You need to 'register' the .vmdk file in the datastore with vSphere.

Here is some information:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006160

https://serverfault.com/questions/579866/deploying-vm-from-vmdk-vmx-file

Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.2.3
Revision history for this message
Andrew Wilkins (axwalk) wrote :

Thanks for the input and links, Tom. We'll take a look at this soon -- hopefully before 2.2.3.

Revision history for this message
Andrew Wilkins (axwalk) wrote :

AFAICT, using an OVF requires that you upload the disk to the host directly (see https://www.vmware.com/support/developer/converter-sdk/conv55_apireference/vim.ResourcePool.html#importVApp): the URL returned for uploading the image points at the host, not at the API endpoint.
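
To make the symptom concrete, here is a minimal Go sketch that checks whether an upload URL points at the same host as the API endpoint. The function name `uploadsViaAPIEndpoint` and the endpoint `https://vcenter.example.com/sdk` are made up for illustration; only the upload URL comes from the error quoted earlier in this bug. It is a diagnostic aid under those assumptions, not part of the provider.

```go
package main

import (
	"fmt"
	"net/url"
)

// uploadsViaAPIEndpoint reports whether uploadURL targets the same host as
// apiEndpoint. Both names are hypothetical; this only compares hostnames.
func uploadsViaAPIEndpoint(apiEndpoint, uploadURL string) (bool, error) {
	api, err := url.Parse(apiEndpoint)
	if err != nil {
		return false, fmt.Errorf("parsing API endpoint: %w", err)
	}
	up, err := url.Parse(uploadURL)
	if err != nil {
		return false, fmt.Errorf("parsing upload URL: %w", err)
	}
	return api.Hostname() == up.Hostname(), nil
}

func main() {
	// The API endpoint is an invented example; the upload URL is the one
	// from the bootstrap error in this bug.
	ok, err := uploadsViaAPIEndpoint(
		"https://vcenter.example.com/sdk",
		"https://10.32.252.51/nfc/52774700-37f1-4a46-cc1f-de20c50f94e5/disk-0.vmdk",
	)
	if err != nil {
		panic(err)
	}
	fmt.Println("upload goes through API endpoint:", ok)
}
```

If this reports false, the client needs direct network access to the host, which is exactly what a firewalled setup forbids.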

I think we may have to redo the VM creation in the provider, so it doesn't use OVF at all. We end up overriding a bunch of things anyway, so there's probably not a lot of value in using it in the first place.

The OpenStack Nova VMware vCenter driver should serve as a good reference: https://github.com/openstack/nova/blob/master/nova/virt/vmwareapi/driver.py. The nova driver caches VMDKs in the datastore, and uses the CreateVM_Task API to create servers.

Changed in juju:
milestone: 2.2.3 → 2.3.0
Revision history for this message
Andrew Wilkins (axwalk) wrote :

So we need to use OVF because of cloud-init, but it turns out it's not too hard to customise the VM config returned from creating the import spec so that it uses a VMDK already in the datastore, rather than requiring that one be uploaded.

Changed in juju:
status: Triaged → In Progress
assignee: nobody → Andrew Wilkins (axwalk)
Revision history for this message
Andrew Wilkins (axwalk) wrote :

I have this working:
 - upload VMDK to datastore (if it doesn't already exist)
 - convert VMDK to disk format in datastore
 - create VM with a delta disk, with parent backing using the above
 - remove VMs with process: power off, delete VM-specific bits (not shared VMDK), unregister VM

That all works, but now we can't specify the root disk size; that's apparently not supported with delta disks. So I may have to look at copying the VMDK for each VM instead. That's what we were doing already, so it's not worse.
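
The "upload only if it doesn't already exist" step can be sketched roughly as follows. The `Datastore` interface, `memDatastore` and `ensureCached` are hypothetical names used for illustration, not the provider's or any vSphere client library's API; a real implementation would talk to the datastore through the vSphere API instead of an in-memory map.

```go
package main

import "fmt"

// Datastore is a hypothetical, minimal view of a vSphere datastore.
type Datastore interface {
	Exists(path string) (bool, error)
	Upload(path string, data []byte) error
}

// memDatastore is an in-memory stand-in that counts uploads.
type memDatastore struct {
	files   map[string][]byte
	uploads int
}

func newMemDatastore() *memDatastore {
	return &memDatastore{files: map[string][]byte{}}
}

func (d *memDatastore) Exists(path string) (bool, error) {
	_, ok := d.files[path]
	return ok, nil
}

func (d *memDatastore) Upload(path string, data []byte) error {
	d.files[path] = data
	d.uploads++
	return nil
}

// ensureCached uploads the image to path only when it is missing.
// fetch is called lazily, so a cache hit costs no download at all.
func ensureCached(ds Datastore, path string, fetch func() ([]byte, error)) (bool, error) {
	exists, err := ds.Exists(path)
	if err != nil {
		return false, fmt.Errorf("checking %q: %w", path, err)
	}
	if exists {
		return false, nil
	}
	data, err := fetch()
	if err != nil {
		return false, fmt.Errorf("fetching image: %w", err)
	}
	if err := ds.Upload(path, data); err != nil {
		return false, fmt.Errorf("uploading %q: %w", path, err)
	}
	return true, nil
}

func main() {
	ds := newMemDatastore()
	fetch := func() ([]byte, error) { return []byte("fake vmdk bytes"), nil }
	for i := 0; i < 2; i++ {
		uploaded, err := ensureCached(ds, "cache/xenial.vmdk", fetch)
		if err != nil {
			panic(err)
		}
		fmt.Println("uploaded:", uploaded)
	}
}
```

Note that the check and the upload are not atomic, so this sketch is only safe when a single writer manages the cache.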

Revision history for this message
Andrew Wilkins (axwalk) wrote :

It doesn't look like there's a safe way for controllers to coordinate, via the vSphere APIs alone, on managing a cache of images shared by multiple controllers. Therefore I plan to have each controller manage its own cache, which will be created and removed with the controller. So we'll still have to upload once at bootstrap time (and subsequently once for every other OS series), but then we'll reuse the VMDK in the datastore for additional deploys of the same OS series.
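
One way to picture per-controller caches is to namespace datastore paths by controller UUID and OS series, so that no two controllers ever touch the same file and a controller's cache can be removed as one directory. The layout below (`juju-vmdks/<controller-uuid>/<series>.vmdk`) and all function names are assumptions made for this sketch, not necessarily what the provider uses.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// cacheRoot is a hypothetical top-level directory name in the datastore.
const cacheRoot = "juju-vmdks"

// controllerCacheDir is the directory created and removed with a controller.
func controllerCacheDir(controllerUUID string) string {
	return path.Join(cacheRoot, controllerUUID)
}

// cachedVMDKPath is where the image for one OS series would live.
func cachedVMDKPath(controllerUUID, series string) string {
	return path.Join(controllerCacheDir(controllerUUID), series+".vmdk")
}

// ownedBy reports whether p lies inside the given controller's cache.
func ownedBy(controllerUUID, p string) bool {
	return strings.HasPrefix(p, controllerCacheDir(controllerUUID)+"/")
}

func main() {
	p := cachedVMDKPath("ctrl-a", "xenial")
	fmt.Println(p)
	fmt.Println(ownedBy("ctrl-a", p), ownedBy("ctrl-b", p))
}
```

Because each controller writes only under its own directory, no cross-controller locking is needed, at the cost of one upload per controller per series.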

We should separately look to provide a vsphere-specific simplestreams source, like we do for OpenStack. This would enable users to mirror (a subset of) cloud-images.ubuntu.com to their datastore, and have Juju use that in preference to going out to cloud-images.ubuntu.com. We can potentially make some optimisations when that's used, like copying VMDKs within the datastore, without requiring the client to download/upload.

Revision history for this message
Andrew Wilkins (axwalk) wrote :

The approach I intended to take is bafflingly slow:
 1. fetch OVA, walk tar in memory to find VMDK, upload to datastore
 2. create a bare-bones VM with the VMDK as disk, and clone to import the VMDK in disk format
 3. detach and move the disk to the cache dir for future use
 4. if the user specifies a root-disk constraint > default capacity (10GiB):
    4a. copy the disk to a temporary location
    4b. expand the copy
    4c. move to the final destination

Up to and including step 3, everything is fine. Copying the disk is *super slow*, as is moving it. It's all within the same datastore. The intention behind this approach was to use the imported VMDK as a linked disk, to keep datastore resource usage down.
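
For reference, step 1 (walking the OVA tar to find the VMDK) can be sketched with Go's standard `archive/tar` package. `findVMDK`, `errNoVMDK` and `sampleOVA` are illustrative names, and matching on the `.vmdk` suffix is an assumption about how the disk is identified; the sample archive is a tiny fake, not a real cloud image.

```go
package main

import (
	"archive/tar"
	"bytes"
	"errors"
	"fmt"
	"io"
	"strings"
)

var errNoVMDK = errors.New("no VMDK found in OVA")

// findVMDK walks the tar stream and stops at the first entry whose name
// ends in ".vmdk". The returned reader yields only that entry's contents,
// so it can be streamed to the datastore without unpacking to disk.
func findVMDK(r io.Reader) (*tar.Header, io.Reader, error) {
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return nil, nil, errNoVMDK
		}
		if err != nil {
			return nil, nil, err
		}
		if strings.HasSuffix(strings.ToLower(hdr.Name), ".vmdk") {
			return hdr, tr, nil
		}
	}
}

// sampleOVA builds a tiny in-memory tar with the given entry names.
func sampleOVA(names ...string) io.Reader {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, name := range names {
		body := []byte("contents of " + name)
		hdr := &tar.Header{Name: name, Typeflag: tar.TypeReg, Mode: 0o644, Size: int64(len(body))}
		if err := tw.WriteHeader(hdr); err != nil {
			panic(err)
		}
		if _, err := tw.Write(body); err != nil {
			panic(err)
		}
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	return bytes.NewReader(buf.Bytes())
}

func main() {
	hdr, body, err := findVMDK(sampleOVA("image.ovf", "disk.vmdk"))
	if err != nil {
		panic(err)
	}
	data, err := io.ReadAll(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(hdr.Name, hdr.Size, len(data))
}
```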

I'll try another approach, cloning for each new VM and expanding the disk in place as necessary. This will prevent using linked disks, but it seems like that might be unavoidable.
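
The "expand in place as necessary" decision boils down to comparing the root-disk constraint against the image's default capacity. A minimal sketch, assuming the 10GiB default mentioned above and sizes expressed in MiB; `rootDiskPlan` and `defaultRootDiskMiB` are hypothetical names:

```go
package main

import "fmt"

// defaultRootDiskMiB is the image's default capacity (10GiB) in MiB.
const defaultRootDiskMiB uint64 = 10 * 1024

// rootDiskPlan returns the size the cloned disk should end up with and
// whether it needs expanding. A request at or below the default (including
// zero, meaning "no constraint") leaves the disk untouched, since the disk
// is only ever grown, never shrunk.
func rootDiskPlan(requestedMiB uint64) (sizeMiB uint64, expand bool) {
	if requestedMiB > defaultRootDiskMiB {
		return requestedMiB, true
	}
	return defaultRootDiskMiB, false
}

func main() {
	for _, req := range []uint64{0, 8 * 1024, 32 * 1024} {
		size, expand := rootDiskPlan(req)
		fmt.Printf("requested=%dMiB size=%dMiB expand=%v\n", req, size, expand)
	}
}
```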

Revision history for this message
Andrew Wilkins (axwalk) wrote :

This has landed. As mentioned in comment #7, we're only caching within a controller's lifetime, and additional work will be required to cache between bootstraps. Marking this one committed. I've opened https://bugs.launchpad.net/juju/+bug/1717399 for remaining work.

Changed in juju:
status: In Progress → Fix Committed
milestone: 2.3.0 → 2.3-alpha1
Changed in juju:
status: Fix Committed → Fix Released