OpenStack Compute (nova)

Unnecessary data copy during cold snapshot

Bug #1262914 reported by Dmitry Borodaenko on 2013-12-19

This bug affects 7 people

Affects: OpenStack Compute (nova)
Status: Expired
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

When creating a cold snapshot, LibvirtDriver.snapshot() creates a local copy of the VM image before uploading from that copy into a new image in Glance.

In the case of snapshotting a VM backed by a local file to Swift, that's one copy too many: if the target format matches the source format, the local file can be uploaded directly, halving the time it takes to create a snapshot. In the case of snapshotting an RBD-backed VM to RBD-backed Glance, that's two copies too many: a copy-on-write clone of the VM drive could eliminate the need to copy any data at all.
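The cases described above, together with the current copy-then-upload behavior, can be summarized as a small decision function. This is a hypothetical sketch, not nova code: `SnapshotPlan`, `plan_snapshot`, and the backend labels `"file"`, `"rbd"` and `"swift"` are assumptions made for illustration. It only counts how many full passes over the disk data each strategy implies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapshotPlan:
    """Hypothetical description of how a cold snapshot could be produced."""
    strategy: str
    data_passes: int  # number of full passes over the disk data


def plan_snapshot(source_backend: str, source_format: str,
                  target_store: str, target_format: str) -> SnapshotPlan:
    """Pick the cheapest strategy for the cases discussed in this report.

    All backend/store labels are illustrative assumptions.
    """
    if source_backend == "rbd" and target_store == "rbd":
        # Copy-on-write clone inside Ceph: no disk data is copied.
        return SnapshotPlan("rbd_clone", 0)
    if source_backend == "file" and source_format == target_format:
        # Formats match: upload the local file directly.
        # Assumes the file is standalone (e.g. no qcow2 backing file).
        return SnapshotPlan("direct_upload", 1)
    # Current behavior: extract a local copy, then upload that copy.
    return SnapshotPlan("copy_then_upload", 2)
```

For example, `plan_snapshot("file", "qcow2", "swift", "qcow2")` selects a direct upload (one pass instead of two), while an RBD-to-RBD snapshot needs no data pass at all.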

I think that instead of passing the target location as a temporary file path under snapshots_directory, LibvirtDriver.snapshot() should pass image metadata to Image.snapshot_extract() and let the image backend figure out and return the target location.
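A rough sketch of the interface change proposed above, under stated assumptions: instead of receiving a temporary path, a backend's `snapshot_extract()` receives the image metadata and returns where the snapshot data ended up. The classes `SnapshotBackend`, `FileBackend` and `RbdBackend`, the `image_meta` keys, and the returned location strings are all hypothetical; the existing `snapshot_extract(self, target, out_format)` in nova takes a target path instead.

```python
import abc
import os
import tempfile


class SnapshotBackend(abc.ABC):
    """Hypothetical backend API: the backend picks the snapshot location."""

    @abc.abstractmethod
    def snapshot_extract(self, image_meta: dict) -> str:
        """Produce snapshot data for image_meta and return its location."""


class FileBackend(SnapshotBackend):
    """Hypothetical local-file backend."""

    def __init__(self, disk_path: str, disk_format: str, snapshots_dir: str):
        self.disk_path = disk_path
        self.disk_format = disk_format
        self.snapshots_dir = snapshots_dir

    def snapshot_extract(self, image_meta: dict) -> str:
        if image_meta["disk_format"] == self.disk_format:
            # Formats match: the uploader can read the disk file directly
            # (assumes the file is standalone, e.g. no backing file).
            return self.disk_path
        # Formats differ: fall back to a converted copy in a temp directory
        # under snapshots_dir (the conversion itself is omitted here).
        tmp_dir = tempfile.mkdtemp(dir=self.snapshots_dir)
        return os.path.join(tmp_dir, image_meta["id"])


class RbdBackend(SnapshotBackend):
    """Hypothetical RBD backend whose snapshots stay inside Ceph."""

    def __init__(self, target_pool: str):
        self.target_pool = target_pool

    def snapshot_extract(self, image_meta: dict) -> str:
        # A copy-on-write clone would be created here; this only returns an
        # illustrative "pool/image" name, not a real Glance location URL.
        return "{}/{}".format(self.target_pool, image_meta["id"])
```

The point of the design is that only the backend knows whether a copy is needed at all, so the driver no longer forces one by handing over a path under snapshots_directory.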

John Garbutt (johngarbutt) wrote on 2014-02-07:

Personally, I think this is the safest thing to do, since the VM could be turned on at any time.

I will let the libvirt experts take a look at this.

tags:	added: libvirt
Changed in nova:
status:	New → Opinion
importance:	Undecided → Wishlist

Dmitry Borodaenko (angdraug) wrote on 2014-04-28:

If the VM is turned on while the snapshot is being taken, the local copy can be affected just as much as an upload to Swift would be. All you'd gain is possibly a smaller time window, and only if your Swift is significantly slower than the local storage on your compute nodes.

In the case of RBD, cloning the image is an atomic operation, so it eliminates altogether the time window in which creating a snapshot can race with starting the VM.
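To illustrate why a copy-on-write clone avoids copying data, here is a toy in-memory model. It is not how Ceph implements RBD clones; `CowClone` and the block-dictionary layout are assumptions for illustration only. Creating the clone just records a reference to the frozen parent, and a block is stored in the clone only when it is written.

```python
class CowClone:
    """Toy copy-on-write clone over a parent snapshot (a dict of blocks).

    The parent is assumed to be immutable, like a protected RBD snapshot.
    """

    def __init__(self, parent: dict):
        self.parent = parent  # referenced, never copied or modified
        self.own = {}         # blocks written after the clone was created

    def read(self, block: int) -> bytes:
        # Prefer the clone's own data, otherwise fall back to the parent;
        # unallocated blocks read as a zero byte in this toy model.
        if block in self.own:
            return self.own[block]
        return self.parent.get(block, b"\0")

    def write(self, block: int, data: bytes) -> None:
        # Only written blocks are stored in the clone.
        self.own[block] = data
```

Creating the clone costs nothing regardless of disk size, which is why an RBD-to-RBD snapshot would not need to move any data.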

Michael H Wilson (geekinutah) on 2014-07-21

Changed in nova:
status:	Opinion → Confirmed

Michael H Wilson (geekinutah) wrote on 2014-07-21:

I think this needs a priority beyond wishlist. An operation that is twice as slow as it should be is a bug, and in the RBD case we aren't even doing it right. I've added a separate bug for just the RBD case: https://bugs.launchpad.net/nova/+bug/1346525

Tracy Jones (tjones-i) on 2014-07-21

Changed in nova:
importance:	Wishlist → Medium

Kashyap Chamarthy (kashyapc) wrote on 2015-05-08:

Dmitry,

Is this what you're referring to? When creating an offline Nova
snapshot, the disk of the Nova instance being snapshotted is copied
into a temporary location before being uploaded to Glance.

Tested with a week-old Nova git checkout and a qcow2 CirrOS image:

    $ git describe
    2015.1.0rc1-300-g39bbc0d

Test
----

When you run an `image-create` on an offline Nova instance:

    $ nova boot --flavor 1 --key_name oskey1 \
        --image cirros-0.3.3-x86_64-disk cirrvm
    $ nova shutdown cirrvm
    $ nova image-create cirrvm snap --poll

A copy of the Nova instance being snapshotted is placed in a temporary
directory (before it is uploaded to Glance):

    "qemu-img convert -f qcow2 -O qcow2 \
      /home/kashyapc/src/cloud/data/nova/instances/aa20be6e-de39-4a15-9f95-9844ec9af5a9/disk \
      /home/kashyapc/src/cloud/data/nova/instances/snapshots/tmp2h6al2/1e00639002e2420ba3747145f06511d8"

NOTE: In this case, the 'convert' command above effectively just
copies the file called 'disk' into the "snapshots/tmp2h6al2"
directory, because both the source _and_ destination formats are
qcow2, so no format conversion takes place.

Here, the 'snapshot()' function in nova/virt/libvirt/driver.py calls
'snapshot_extract()' from libvirt/imagebackend.py:

    # ... (LibvirtDriver.snapshot() in nova/virt/libvirt/driver.py)
    snapshot_backend = self.image_backend.snapshot(
        instance, disk_path, image_type=source_format)
    # ...
    if live_snapshot:
        ...  # live snapshot path elided
    else:
        snapshot_backend.snapshot_extract(out_path, image_format)
    # ...

'snapshot_extract()', in turn, calls 'extract_snapshot()' from
libvirt/utils.py:

    # ... (libvirt/imagebackend.py)
    def snapshot_extract(self, target, out_format):
        # The source format is hardcoded as 'qcow2' here.
        libvirt_utils.extract_snapshot(self.path, 'qcow2', target, out_format)
    # ...

Finally, 'extract_snapshot()' in libvirt/utils.py executes the
`qemu-img convert` command:

    # ... (extract_snapshot() in libvirt/utils.py)
    qemu_img_cmd = ('qemu-img', 'convert', '-f', source_fmt, '-O', dest_fmt)
    # ...

After this, the converted (i.e. copied) image is uploaded to Glance.
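The cold path traced above boils down to two steps: run `qemu-img convert` into a fresh temporary directory under the snapshots directory, then upload the result. A minimal sketch, assuming a caller-supplied `run` callable to execute the command and an `upload` callable standing in for the Glance client; `build_convert_cmd` and `cold_snapshot` are hypothetical names, and the argument order simply mirrors the logged command.

```python
import os
import tempfile
from typing import Callable, Tuple


def build_convert_cmd(disk_path: str, source_fmt: str,
                      out_path: str, dest_fmt: str) -> Tuple[str, ...]:
    # Same argument order as the logged command:
    # qemu-img convert -f <source fmt> -O <dest fmt> <source> <target>
    return ('qemu-img', 'convert', '-f', source_fmt, '-O', dest_fmt,
            disk_path, out_path)


def cold_snapshot(disk_path: str, source_fmt: str, dest_fmt: str,
                  snapshots_dir: str, snapshot_name: str,
                  run: Callable[[Tuple[str, ...]], None],
                  upload: Callable[[str], None]) -> str:
    # Step 1: write a full local copy into a per-snapshot temp directory.
    tmp_dir = tempfile.mkdtemp(dir=snapshots_dir)
    out_path = os.path.join(tmp_dir, snapshot_name)
    run(build_convert_cmd(disk_path, source_fmt, out_path, dest_fmt))
    # Step 2: upload from that copy, a second full pass over the data.
    upload(out_path)
    return out_path
```

Passing `run=subprocess.check_call` would actually execute `qemu-img`; keeping it injectable makes the extra copy in step 1 easy to see and to test without touching real disks.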

Kashyap Chamarthy (kashyapc) on 2015-05-08

Changed in nova:
status:	Confirmed → Incomplete

Dan Smith (danms) wrote on 2015-05-08:

I agree with this being Opinion/Wishlist. Making the copy flattens the image as fast as possible so that the instance can be restarted. We could further complicate this code by trying to optimize something, but that's definitely wishlist-level stuff, IMHO.

Zoltan Arnold Nagy (zoltan) wrote on 2015-05-08:

Please note that the case Dmitry is talking about is when you are using the same backend storage for both images and ephemeral disks (for example, Ceph's RBD).

In such a case it is a HUGE waste of resources (bandwidth, temporary space), and IMHO a genuine bug.

Launchpad Janitor (janitor) wrote on 2015-07-08:

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status:	Incomplete → Expired
