Here's some investigation of what happens when _live_snapshot is
invoked, at Libvirt level. I performed the live_snapshot test w/ current
git (after I modified MIN_LIBVIRT_LIVESNAPSHOT_VERSION=1.0.0 for testing
purpose) with below log filters in /etc/libvirt/libvirtd.conf (and
restared libvirt):
Find what QMP commands libvirt is sending to QEMU
log_filters="1:qemu_monitor"
Libvirt call sequence (More
---------------------
(1) virDomainGetXMLDesc
(2) virDomainBlockJobAbort
(3) virDomainUndefine
(4) virDomainBlockRebase
- NOTE (from libvirt documentation): By default, the copy job runs
in the background, and consists of two phases: (a) the block
operation copies all data from the source, and during this phase,
the job can only be canceled to revert back to the source disk,
with no guarantees about the destination. (b) After phase (a0
completes, both the source and the destination remain mirrored
until a call to the block opertation with the --abort.
Issue a snapshot (this should trigger the _live_snapshot code path):
$ nova image-create --poll cvm1 snap1-cvm1
Ensure that "live snapshot" _did_ take place by searching the
'screen-n-cpu.log':
$ grep -i "Beginning live snapshot process" ../data/new/screen-logs/screen-n-cpu.log
2014-06-30 03:34:32.237 INFO nova.virt.libvirt.driver [req-b02ac5c5-7694-44ce-9185-129b91eaf6b5 admin admin] [instance: 4b62dab0-0efe-4125-84ec-3e24d3371082] Beginning live snapshot process
$
Libvirt logs
------------
(1) Save copy of libvirt XML(virDomainGetXMLDesc):
----
2014-06-30 09:08:13.586+0000: 8470: debug : virDomainGetXMLDesc:4344 : dom=0x7f2994000fd0, (VM: name=instance-00000001, uuid=4b62dab0-0efe-4125-84ec-3e24d3371082), flags=0
----
(2) Issue a BlockJobAbort (virDomainBlockJobAbort), so that any prior active block operation on that disk will be cancelled:
----
2014-06-30 09:08:13.632+0000: 8470: debug : virDomainBlockJobAbort:19492 : dom=0x7f2994000af0, (VM: name=instance-00000001, uuid=4b62dab0-0efe-4125-84ec-3e24d3371082), disk=/home/kashyapc/src/openstack/data/nova/instances/4b62dab0-0efe-4125-84ec-3e24d3371082/disk, flags=0
----
(3) Undefining the running libvirt domain (virDomainUndefine), to make it transient[*]:
----
2014-06-30 09:08:14.069+0000: 8471: debug : virDomainUndefine:8683 : dom=0x7f29a0000910, (VM: name=instance-00000001, uuid=4b62dab0-0efe-4125-84ec-3e24d3371082)
2014-06-30 09:08:14.069+0000: 8471: info : qemuDomainUndefineFlags:6334 : Undefining domain 'instance-00000001'
----
We'll define the guest again, further below from the saved copy from step(1).
[*] Reasoning for making the domain transient: BlockRebase ('blockcopy')
jobs last forever until canceled, which implies that they should last
across domain restarts if the domain were persistent. But, QEMU doesn't
yet provide a way to restart a copy job on domain restart (while
mirroring is still intact). So the trick is to temporarily make the
domain transient.
Here's some investigation of what happens when _live_snapshot is LIVESNAPSHOT_ VERSION= 1.0.0 for testing libvirtd. conf (and
invoked, at Libvirt level. I performed the live_snapshot test w/ current
git (after I modified MIN_LIBVIRT_
purpose) with below log filters in /etc/libvirt/
restared libvirt):
log_level = 1 outputs= "1:file: /var/tmp/ libvirtd. log"
log_
Find what QMP commands libvirt is sending to QEMU filters= "1:qemu_ monitor"
log_
Libvirt call sequence (More ------- -------
-------
(1) virDomainGetXMLDesc obAbort ebase
(2) virDomainBlockJ
(3) virDomainUndefine
(4) virDomainBlockR
- NOTE (from libvirt documentation): By default, the copy job runs
in the background, and consists of two phases: (a) the block
operation copies all data from the source, and during this phase,
the job can only be canceled to revert back to the source disk,
with no guarantees about the destination. (b) After phase (a0
completes, both the source and the destination remain mirrored
until a call to the block opertation with the --abort.
(5) virDomainBlockJ obAbort.
Test
----
Boot a new Nova instance:
$ nova boot --flavor 1 --key_name oskey1 --image \
cirros- 0.3.0-x86_ 64-disk cvm1
Issue a snapshot (this should trigger the _live_snapshot code path):
$ nova image-create --poll cvm1 snap1-cvm1
Ensure that "live snapshot" _did_ take place by searching the
'screen-n-cpu.log':
$ grep -i "Beginning live snapshot process" ../data/ new/screen- logs/screen- n-cpu.log libvirt. driver [req-b02ac5c5- 7694-44ce- 9185-129b91eaf6 b5 admin admin] [instance: 4b62dab0- 0efe-4125- 84ec-3e24d33710 82] Beginning live snapshot process
2014-06-30 03:34:32.237 INFO nova.virt.
$
Libvirt logs
------------
(1) Save copy of libvirt XML(virDomainGe tXMLDesc) : Desc:4344 : dom=0x7f2994000fd0, (VM: name=instance- 00000001, uuid=4b62dab0- 0efe-4125- 84ec-3e24d33710 82), flags=0
----
2014-06-30 09:08:13.586+0000: 8470: debug : virDomainGetXML
----
(2) Issue a BlockJobAbort (virDomainBlock JobAbort) , so that any prior active block operation on that disk will be cancelled: obAbort: 19492 : dom=0x7f2994000af0, (VM: name=instance- 00000001, uuid=4b62dab0- 0efe-4125- 84ec-3e24d33710 82), disk=/home/ kashyapc/ src/openstack/ data/nova/ instances/ 4b62dab0- 0efe-4125- 84ec-3e24d33710 82/disk, flags=0
----
2014-06-30 09:08:13.632+0000: 8470: debug : virDomainBlockJ
----
(3) Undefining the running libvirt domain (virDomainUndef ine), to make it transient[*]: ne:8683 : dom=0x7f29a0000910, (VM: name=instance- 00000001, uuid=4b62dab0- 0efe-4125- 84ec-3e24d33710 82) ineFlags: 6334 : Undefining domain 'instance-00000001'
----
2014-06-30 09:08:14.069+0000: 8471: debug : virDomainUndefi
2014-06-30 09:08:14.069+0000: 8471: info : qemuDomainUndef
----
We'll define the guest again, further below from the saved copy from step(1).
[*] Reasoning for making the domain transient: BlockRebase ('blockcopy')
jobs last forever until canceled, which implies that they should last
across domain restarts if the domain were persistent. But, QEMU doesn't
yet provide a way to restart a copy job on domain restart (while
mirroring is still intact). So the trick is to temporarily make the
domain transient.
(4) Invoke libvirt BlockRebase (and mirroring): ebase:19785 : dom=0x7f2994000e40, (VM: name=instance- 00000001, uuid=4b62dab0- 0efe-4125- 84ec-3e24d33710 82), disk=/home/ kashyapc/ src/openstack/ data/nova/ instances/ 4b62dab0- 0efe-4125- 84ec-3e24d33710 82/disk, base=/home/ kashyapc/ src/openstack/ data/nova/ instances/ snapshots/ tmp9_j3S0/ 5644b17a7de6427 993d12cbab6ca32 05.delta, bandwidth=0, flags=b
----
2014-06-30 09:08:14.070+0000: 8470: debug : virDomainBlockR
----
(4.1) QEMU 'drive-mirror' in progress: CommandWithFd: 286 : Send command '{"execute" :"drive- mirror" ,"arguments" :{"device" :"drive- virtio- disk0", "target" :"/home/ kashyapc/ src/openstack/ data/nova/ snapshots/ tmp9_j3S0/ 5644b17a7de6427 993d12cbab6ca32 05.delta" ,"speed" :0,"sync" :"top", "mode": "existing" ,"format" :"qcow2" },"id": "libvirt- 10"}' for write with FD -1 ateHandle: 152 : EVENT_POLL_ UPDATE_ HANDLE: watch=10 events=15 erruptLocked: 729 : Interrupting SEND_MSG: mon=0x7f299c000bc0 msg={"execute" :"drive- mirror" ,"arguments" :{"device" :"drive- virtio- disk0", "target" :"/home/ kashyapc/ src/o data/nova/ instances/ snapshots/ tmp9_j3S0/ 5644b17a7de6427 993d12cbab6ca32 05.delta" ,"speed" :0,"sync" :"top", "mode": "existing" ,"format" :"qcow2" },"id": "libvirt- 10"}
[. . .]
2014-06-30 09:08:14.071+0000: 8470: debug : qemuMonitorJSON
instances/
2014-06-30 09:08:14.071+0000: 8470: debug : virEventPollUpd
2014-06-30 09:08:14.071+0000: 8470: debug : virEventPollInt
2014-06-30 09:08:14.071+0000: 8470: debug : qemuMonitorSend:969 : QEMU_MONITOR_
penstack/
fd=-1
[. . .]
(5) Issue Abort, so that block mirroring can be completed gracefully: obAbort: 19492 : dom=0x7f299c001670, (VM: name=instance- 00000001, uuid=4b62dab0- 0efe-4125- 84ec-3e24d33710 82), disk=/home/ kashyapc/ src/openstack/ data/nova/ instances/ 4b62dab0- 0efe-4125- 84ec-3e24d33710 82/disk, flags=0 ite:507 : QEMU_MONITOR_ IO_WRITE: mon=0x7f299c000bc0 buf={"execute" :"block- job-cancel" ,"arguments" :{"device" :"drive- virtio- disk0"} ,"id":" libvirt- 13"}
----
2014-06-30 09:08:14.615+0000: 8472: debug : virDomainBlockJ
[. . .]
2014-06-30 09:08:14.618+0000: 8466: debug : qemuMonitorIOWr
len=94 ret=94 errno=11
----
(6) XML:8639 : conn=0x7f299400 0c50, xml=<domain type='kvm' id='2'> instance- 00000001< /name> eXML:6248 : Creating domain 'instance-00000001'
----
2014-06-30 09:08:14.719+0000: 8470: debug : virDomainDefine
<name>
[. . .]
2014-06-30 09:08:14.764+0000: 8470: info : qemuDomainDefin
----