OpenStack Compute (nova)

Bug #1530275
Comment #16

Comment 16 for bug 1530275

Revision history for this message

Kashyap Chamarthy (kashyapc) wrote on 2016-09-01:

#16

The log attached in comment #15 is the failed case. And, as I guessed
in comment #12, if you look at the log file from comment #15 the
'block-job-cancel' command is issued too soon, before QEMU reports the
block device job status as "ready" true.

So, with request id "libvirt-14", when libvirt queries QEMU for the
block device job status:

  2016-09-01 11:11:43.053+0000: 29193: info : qemuMonitorIOWrite:528 :
  QEMU_MONITOR_IO_WRITE: mon=0x7f98c4000e00
  buf={"execute":"query-block-jobs","id":"libvirt-14"} len=50 ret=50
  errno=11

It reports that it is not ready yet (see -- "ready": false):

  2016-09-01 11:11:43.057+0000: 29193: info : qemuMonitorIOProcess:423 :
  QEMU_MONITOR_IO_PROCESS: mon=0x7f98c4000e00 buf={"return":
  [{"io-status": "ok", "device": "drive-virtio-disk0", "busy": true,
  "len": 0, "offset": 0, "paused": false, "speed": 0, "ready": false,
  "type": "mirror"}], "id": "libvirt-14"} len=187

Instead of waiting for the block device job status to turn to "ready":
true, Nova goes ahead and aborts the job (with 'block-job-cancel'), with
id "libvirt-15":

  2016-09-01 11:11:43.058+0000: 29193: info : qemuMonitorIOWrite:528 :
  QEMU_MONITOR_IO_WRITE: mon=0x7f98c4000e00
  buf={"execute":"block-job-cancel","arguments":{"device":"drive-virtio-disk0"},"id":"libvirt-15"}

So, this (aborting the job before it's ready) is causing the snapshot to
be corrupted. If Nova waits a little longer, then the snapshot is
successful (as it can be seen from my log analysis in comment #13, which
is the good case).