Comment 23 for bug 1334398

Revision history for this message
Kashyap Chamarthy (kashyapc) wrote :

Here's some investigation of what happens when _live_snapshot is
invoked, at Libvirt level. I performed the live_snapshot test w/ current
git (after I modified MIN_LIBVIRT_LIVESNAPSHOT_VERSION=1.0.0 for testing
purpose) with below log filters in /etc/libvirt/libvirtd.conf (and
restared libvirt):

    log_level = 1
    log_outputs="1:file:/var/tmp/libvirtd.log"

    Find what QMP commands libvirt is sending to QEMU
    log_filters="1:qemu_monitor"

Libvirt call sequence (More
---------------------

(1) virDomainGetXMLDesc
(2) virDomainBlockJobAbort
(3) virDomainUndefine
(4) virDomainBlockRebase
     - NOTE (from libvirt documentation): By default, the copy job runs
       in the background, and consists of two phases: (a) the block
       operation copies all data from the source, and during this phase,
       the job can only be canceled to revert back to the source disk,
       with no guarantees about the destination. (b) After phase (a0
       completes, both the source and the destination remain mirrored
       until a call to the block opertation with the --abort.

(5) virDomainBlockJobAbort.

Test
----

Boot a new Nova instance:

    $ nova boot --flavor 1 --key_name oskey1 --image \
        cirros-0.3.0-x86_64-disk cvm1

Issue a snapshot (this should trigger the _live_snapshot code path):

    $ nova image-create --poll cvm1 snap1-cvm1

Ensure that "live snapshot" _did_ take place by searching the
'screen-n-cpu.log':

    $ grep -i "Beginning live snapshot process" ../data/new/screen-logs/screen-n-cpu.log
    2014-06-30 03:34:32.237 INFO nova.virt.libvirt.driver [req-b02ac5c5-7694-44ce-9185-129b91eaf6b5 admin admin] [instance: 4b62dab0-0efe-4125-84ec-3e24d3371082] Beginning live snapshot process
    $

Libvirt logs
------------

(1) Save copy of libvirt XML(virDomainGetXMLDesc):
----
2014-06-30 09:08:13.586+0000: 8470: debug : virDomainGetXMLDesc:4344 : dom=0x7f2994000fd0, (VM: name=instance-00000001, uuid=4b62dab0-0efe-4125-84ec-3e24d3371082), flags=0
----

(2) Issue a BlockJobAbort (virDomainBlockJobAbort), so that any prior active block operation on that disk will be cancelled:
----
2014-06-30 09:08:13.632+0000: 8470: debug : virDomainBlockJobAbort:19492 : dom=0x7f2994000af0, (VM: name=instance-00000001, uuid=4b62dab0-0efe-4125-84ec-3e24d3371082), disk=/home/kashyapc/src/openstack/data/nova/instances/4b62dab0-0efe-4125-84ec-3e24d3371082/disk, flags=0
----

(3) Undefining the running libvirt domain (virDomainUndefine), to make it transient[*]:
----
2014-06-30 09:08:14.069+0000: 8471: debug : virDomainUndefine:8683 : dom=0x7f29a0000910, (VM: name=instance-00000001, uuid=4b62dab0-0efe-4125-84ec-3e24d3371082)
2014-06-30 09:08:14.069+0000: 8471: info : qemuDomainUndefineFlags:6334 : Undefining domain 'instance-00000001'
----

We'll define the guest again, further below from the saved copy from step(1).

[*] Reasoning for making the domain transient: BlockRebase ('blockcopy')
jobs last forever until canceled, which implies that they should last
across domain restarts if the domain were persistent. But, QEMU doesn't
yet provide a way to restart a copy job on domain restart (while
mirroring is still intact). So the trick is to temporarily make the
domain transient.

(4) Invoke libvirt BlockRebase (and mirroring):
----
2014-06-30 09:08:14.070+0000: 8470: debug : virDomainBlockRebase:19785 : dom=0x7f2994000e40, (VM: name=instance-00000001, uuid=4b62dab0-0efe-4125-84ec-3e24d3371082), disk=/home/kashyapc/src/openstack/data/nova/instances/4b62dab0-0efe-4125-84ec-3e24d3371082/disk, base=/home/kashyapc/src/openstack/data/nova/instances/snapshots/tmp9_j3S0/5644b17a7de6427993d12cbab6ca3205.delta, bandwidth=0, flags=b
----

(4.1) QEMU 'drive-mirror' in progress:
[. . .]
2014-06-30 09:08:14.071+0000: 8470: debug : qemuMonitorJSONCommandWithFd:286 : Send command '{"execute":"drive-mirror","arguments":{"device":"drive-virtio-disk0","target":"/home/kashyapc/src/openstack/data/nova/
instances/snapshots/tmp9_j3S0/5644b17a7de6427993d12cbab6ca3205.delta","speed":0,"sync":"top","mode":"existing","format":"qcow2"},"id":"libvirt-10"}' for write with FD -1
2014-06-30 09:08:14.071+0000: 8470: debug : virEventPollUpdateHandle:152 : EVENT_POLL_UPDATE_HANDLE: watch=10 events=15
2014-06-30 09:08:14.071+0000: 8470: debug : virEventPollInterruptLocked:729 : Interrupting
2014-06-30 09:08:14.071+0000: 8470: debug : qemuMonitorSend:969 : QEMU_MONITOR_SEND_MSG: mon=0x7f299c000bc0 msg={"execute":"drive-mirror","arguments":{"device":"drive-virtio-disk0","target":"/home/kashyapc/src/o
penstack/data/nova/instances/snapshots/tmp9_j3S0/5644b17a7de6427993d12cbab6ca3205.delta","speed":0,"sync":"top","mode":"existing","format":"qcow2"},"id":"libvirt-10"}
 fd=-1
[. . .]

(5) Issue Abort, so that block mirroring can be completed gracefully:
----
2014-06-30 09:08:14.615+0000: 8472: debug : virDomainBlockJobAbort:19492 : dom=0x7f299c001670, (VM: name=instance-00000001, uuid=4b62dab0-0efe-4125-84ec-3e24d3371082), disk=/home/kashyapc/src/openstack/data/nova/instances/4b62dab0-0efe-4125-84ec-3e24d3371082/disk, flags=0
[. . .]
2014-06-30 09:08:14.618+0000: 8466: debug : qemuMonitorIOWrite:507 : QEMU_MONITOR_IO_WRITE: mon=0x7f299c000bc0 buf={"execute":"block-job-cancel","arguments":{"device":"drive-virtio-disk0"},"id":"libvirt-13"}
 len=94 ret=94 errno=11
----

(6)
----
2014-06-30 09:08:14.719+0000: 8470: debug : virDomainDefineXML:8639 : conn=0x7f2994000c50, xml=<domain type='kvm' id='2'>
  <name>instance-00000001</name>
[. . .]
2014-06-30 09:08:14.764+0000: 8470: info : qemuDomainDefineXML:6248 : Creating domain 'instance-00000001'
----