Unnecessary data copy during cold snapshot

Bug #1262914 reported by Dmitry Borodaenko
This bug affects 7 people
Affects                   Status   Importance  Assigned to  Milestone
OpenStack Compute (nova)  Expired  Medium      Unassigned

Bug Description

When creating a cold snapshot, LibvirtDriver.snapshot() creates a local copy of the VM image before uploading from that copy into a new image in Glance.

In case of snapshotting a local file backed VM to Swift, that's one copy too many: if the target format matches the source format, the local file can be uploaded directly, halving the time it takes to create a snapshot. In case of snapshotting an RBD backed VM to RBD backed Glance, that's two copies too many: a copy-on-write clone of the VM drive could obviate the need to copy any data at all.

I think that instead of passing the target location as a temporary file path under snapshots_directory, LibvirtDriver.snapshot() should pass image metadata to Image.snapshot_extract() and let the image backend figure out and return the target location.
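A minimal sketch of the proposed interface for the local-file case is given here. The class and attribute names (`FileBackedImage`, `SnapshotTarget`, the `disk_format` key) are hypothetical illustrations and not nova's actual API. The sketch also ignores backing files, which the existing copy step flattens, so it only illustrates where the "copy or not" decision would move.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class SnapshotTarget:
    """Where the snapshot data lives and whether it is a temporary copy."""
    location: str
    is_temporary: bool


class FileBackedImage:
    """Hypothetical stand-in for a local-file image backend (not nova's class)."""

    def __init__(self, path, source_format, snapshots_directory):
        self.path = path
        self.source_format = source_format
        self.snapshots_directory = snapshots_directory

    def snapshot_extract(self, image_meta):
        """Return the location to upload from, copying only when needed."""
        if image_meta["disk_format"] == self.source_format:
            # Same format: the existing file could be uploaded as-is.
            return SnapshotTarget(self.path, is_temporary=False)
        # Different format: a converted copy is still required. Only the
        # target path is reserved here; the conversion itself is omitted.
        fd, tmp_path = tempfile.mkstemp(dir=self.snapshots_directory)
        os.close(fd)
        return SnapshotTarget(tmp_path, is_temporary=True)
```

With this shape, the driver would upload from whatever location the backend returns and clean up only when `is_temporary` is set. An RBD backend could return a clone's location from the same method without copying data.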

Tags: libvirt
Revision history for this message
John Garbutt (johngarbutt) wrote :

Personally, I think that since the VM could be turned on at any time, making the local copy seems like the safest thing to do.

I will let the libvirt experts take a look at this.

tags: added: libvirt
Changed in nova:
status: New → Opinion
importance: Undecided → Wishlist
Dmitry Borodaenko (angdraug) wrote :

If the VM is turned on while the snapshot is being taken, the local copy can be affected just the same as the upload to Swift. All you'd gain is possibly a smaller time window, and that only if your Swift is significantly slower than the local storage on your compute nodes.

In the case of RBD, cloning the image is an atomic operation, so it altogether eliminates the time window in which creating a snapshot can race with starting the VM.

Changed in nova:
status: Opinion → Confirmed
Michael H Wilson (geekinutah) wrote :

I think this needs a priority beyond wishlist. An operation being twice as slow as it should be is a bug, and in the RBD case we aren't even doing it right. I've added a separate bug for just the RBD case: https://bugs.launchpad.net/nova/+bug/1346525

Tracy Jones (tjones-i)
Changed in nova:
importance: Wishlist → Medium
Kashyap Chamarthy (kashyapc) wrote :

Dmitry,

Is this what you're referring to? When creating an offline Nova
snapshot, the disk of the Nova instance being snapshotted is copied
into a temporary location before it is uploaded to Glance.

Tested with a week-old Nova git checkout and a qcow2 CirrOS image:

    $ git describe
    2015.1.0rc1-300-g39bbc0d

Test
----

When you run an `image-create` on an offline Nova instance:

    $ nova boot --flavor 1 --key_name oskey1 \
        --image cirros-0.3.3-x86_64-disk cirrvm
    $ nova stop cirrvm
    $ nova image-create cirrvm snap --poll

A copy of the disk of the Nova instance being snapshotted is placed in a
temporary directory (before it is uploaded to Glance):

    "qemu-img convert -f qcow2 -O qcow2 \
      /home/kashyapc/src/cloud/data/nova/instances/aa20be6e-de39-4a15-9f95-9844ec9af5a9/disk \
      /home/kashyapc/src/cloud/data/nova/instances/snapshots/tmp2h6al2/1e00639002e2420ba3747145f06511d8"

NOTE: In this case, the above 'convert' command simply copies the file
called 'disk' to the "snapshots/tmp2h6al2" directory, because both the
source _and_ destination formats are qcow2, so no format conversion
takes place.

Here, the 'snapshot()' function from nova/virt/libvirt/driver.py calls
'snapshot_extract()' from libvirt/imagebackend.py:

    . . .
    snapshot_backend = self.image_backend.snapshot(instance,
            disk_path,
            image_type=source_format)
    . . .
    if live_snapshot:

        . . . . . .

    else:
        snapshot_backend.snapshot_extract(out_path, image_format)
    . . .

'snapshot_extract()' in turn calls 'extract_snapshot()' from
libvirt/utils.py:

    . . .
    def snapshot_extract(self, target, out_format):
        libvirt_utils.extract_snapshot(self.path, 'qcow2',
                                       target,
                                       out_format)
    . . .

Finally, 'extract_snapshot()' from libvirt/utils.py executes the
`qemu-img convert` command:

    . . .
    qemu_img_cmd = ('qemu-img', 'convert', '-f', source_fmt, '-O', dest_fmt)
    . . .

After this, the converted (i.e. copied) image is uploaded to Glance.
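The call chain above can be condensed into a small standalone sketch that rebuilds the same `qemu-img convert` argument list. The helper names `build_convert_cmd` and `needs_format_conversion` are hypothetical and are not functions in nova. The argument order is taken from the logged command and the `qemu_img_cmd` tuple quoted above, and the paths in the usage note are placeholders.

```python
def build_convert_cmd(source_path, source_fmt, dest_path, dest_fmt):
    """Assemble the argv for `qemu-img convert` in the order seen in the log."""
    return ("qemu-img", "convert", "-f", source_fmt, "-O", dest_fmt,
            source_path, dest_path)


def needs_format_conversion(source_fmt, dest_fmt):
    """True only when the on-disk format actually changes."""
    return source_fmt != dest_fmt
```

For example, `build_convert_cmd("disk", "qcow2", "snapshots/tmp/out", "qcow2")` yields the same shape of command as the one logged above, while `needs_format_conversion("qcow2", "qcow2")` is false, which is the case this bug report is about.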

Changed in nova:
status: Confirmed → Incomplete
Dan Smith (danms) wrote :

Agree with this being Opinion/Wishlist. Making the copy is flattening the image as fast as possible so that the instance can be restarted. We could further complicate this code by trying to optimize something, but that's definitely wishlist-level stuff, IMHO.

Zoltan Arnold Nagy (zoltan) wrote :

Please note that the case Dmitry is talking about is the one where you are using the same backend storage for both images and ephemeral disks (for example, Ceph's RBD).

In such a case it is a HUGE waste of resources (bandwidth, temporary space), and IMHO a genuine bug.

Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired