I have managed to capture a failure with verbose libvirtd.log enabled.
http://logs.openstack.org/66/103066/11/check/check-tempest-dsvm-postgres-full/d8d3b5b/logs/libvirtd.txt.gz
http://logs.openstack.org/66/103066/11/check/check-tempest-dsvm-postgres-full/d8d3b5b/logs/screen-n-cpu.txt.gz
2014-07-09 11:05:50.701+0000: 21774: debug : virDomainBlockJobAbort:19364 : dom=0x7fc73c010700, (VM: name=instance-0000001d, uuid=6cfca6f5-8741-4844-a045-a14a644dd82d), disk=/opt/stack/data/nova/instances/6cfca6f5-8741-4844-a045-a14a644dd82d/disk, flags=0
2014-07-09 11:05:50.701+0000: 21774: debug : qemuDomainObjBeginJobInternal:1050 : Starting job: modify (async=none vm=0x7fc740009ee0 name=instance-0000001d)
2014-07-09 11:05:50.701+0000: 21774: debug : qemuDomainObjBeginJobInternal:1092 : Started job: modify (async=none vm=0x7fc740009ee0 name=instance-0000001d)
2014-07-09 11:05:50.701+0000: 21774: debug : qemuDomainObjEnterMonitorInternal:1278 : Entering monitor (mon=0x7fc73c010ad0 vm=0x7fc740009ee0 name=instance-0000001d)
2014-07-09 11:05:50.701+0000: 21774: debug : qemuMonitorBlockJob:3314 : mon=0x7fc73c010ad0, device=drive-virtio-disk0, base=<null>, bandwidth=0M, info=(nil), mode=0, modern=1
2014-07-09 11:05:50.701+0000: 21774: debug : qemuMonitorJSONCommandWithFd:264 : Send command '{"execute":"block-job-cancel","arguments":{"device":"drive-virtio-disk0"},"id":"libvirt-14"}' for write with FD -1
2014-07-09 11:05:50.701+0000: 21774: debug : qemuMonitorSend:959 : QEMU_MONITOR_SEND_MSG: mon=0x7fc73c010ad0 msg={"execute":"block-job-cancel","arguments":{"device":"drive-virtio-disk0"},"id":"libvirt-14"}
2014-07-09 11:05:50.705+0000: 21774: debug : qemuMonitorJSONCommandWithFd:269 : Receive command reply ret=0 rxObject=0x7fc76096cac0
2014-07-09 11:05:50.705+0000: 21774: debug : qemuDomainObjExitMonitorInternal:1301 : Exited monitor (mon=0x7fc73c010ad0 vm=0x7fc740009ee0 name=instance-0000001d)
2014-07-09 11:05:50.705+0000: 21774: debug : qemuDomainObjEnterMonitorInternal:1278 : Entering monitor (mon=0x7fc73c010ad0 vm=0x7fc740009ee0 name=instance-0000001d)
2014-07-09 11:05:50.705+0000: 21774: debug : qemuMonitorBlockJob:3314 : mon=0x7fc73c010ad0, device=drive-virtio-disk0, base=<null>, bandwidth=0M, info=0x7fc7572be9a0, mode=1, modern=1
2014-07-09 11:05:50.705+0000: 21774: debug : qemuMonitorJSONCommandWithFd:264 : Send command '{"execute":"query-block-jobs","id":"libvirt-15"}' for write with FD -1
2014-07-09 11:05:50.705+0000: 21774: debug : qemuMonitorSend:959 : QEMU_MONITOR_SEND_MSG: mon=0x7fc73c010ad0 msg={"execute":"query-block-jobs","id":"libvirt-15"}
2014-07-09 11:05:50.709+0000: 21774: debug : qemuMonitorJSONCommandWithFd:269 : Receive command reply ret=0 rxObject=0x7fc76099cce0
2014-07-09 11:05:50.709+0000: 21774: debug : qemuDomainObjExitMonitorInternal:1301 : Exited monitor (mon=0x7fc73c010ad0 vm=0x7fc740009ee0 name=instance-0000001d)
2014-07-09 11:05:50.759+0000: 21774: debug : qemuDomainObjEnterMonitorInternal:1278 : Entering monitor (mon=0x7fc73c010ad0 vm=0x7fc740009ee0 name=instance-0000001d)
2014-07-09 11:05:50.759+0000: 21774: debug : qemuMonitorBlockJob:3314 : mon=0x7fc73c010ad0, device=drive-virtio-disk0, base=<null>, bandwidth=0M, info=0x7fc7572be9a0, mode=1, modern=1
2014-07-09 11:05:50.759+0000: 21774: debug : qemuMonitorJSONCommandWithFd:264 : Send command '{"execute":"query-block-jobs","id":"libvirt-16"}' for write with FD -1
2014-07-09 11:05:50.759+0000: 21774: debug : qemuMonitorSend:959 : QEMU_MONITOR_SEND_MSG: mon=0x7fc73c010ad0 msg={"execute":"query-block-jobs","id":"libvirt-16"}
2014-07-09 11:05:50.762+0000: 21774: debug : qemuMonitorJSONCommandWithFd:269 : Receive command reply ret=0 rxObject=0x7fc760999970
2014-07-09 11:05:50.762+0000: 21774: debug : qemuDomainObjExitMonitorInternal:1301 : Exited monitor (mon=0x7fc73c010ad0 vm=0x7fc740009ee0 name=instance-0000001d)
2014-07-09 11:05:50.812+0000: 21774: debug : qemuDomainObjEnterMonitorInternal:1278 : Entering monitor (mon=0x7fc73c010ad0 vm=0x7fc740009ee0 name=instance-0000001d)
2014-07-09 11:05:50.812+0000: 21774: debug : qemuMonitorBlockJob:3314 : mon=0x7fc73c010ad0, device=drive-virtio-disk0, base=<null>, bandwidth=0M, info=0x7fc7572be9a0, mode=1, modern=1
2014-07-09 11:05:50.812+0000: 21774: debug : qemuMonitorJSONCommandWithFd:264 : Send command '{"execute":"query-block-jobs","id":"libvirt-17"}' for write with FD -1
2014-07-09 11:05:50.812+0000: 21774: debug : qemuMonitorSend:959 : QEMU_MONITOR_SEND_MSG: mon=0x7fc73c010ad0 msg={"execute":"query-block-jobs","id":"libvirt-17"}
2014-07-09 11:09:07.169+0000: 21772: debug : virDomainDestroy:2173 : dom=0x7fc744005930, (VM: name=instance-0000001d, uuid=6cfca6f5-8741-4844-a045-a14a644dd82d)
2014-07-09 11:09:07.169+0000: 21772: debug : qemuProcessKill:4189 : vm=0x7fc740009ee0 name=instance-0000001d pid=5820 flags=1
2014-07-09 11:09:07.169+0000: 21772: debug : virProcessKillPainfully:274 : vpid=5820 force=1
2014-07-09 11:09:17.184+0000: 21774: debug : qemuMonitorSend:971 : Send command resulted in error Unable to read from monitor: Connection reset by peer
2014-07-09 11:09:17.184+0000: 21774: debug : qemuMonitorJSONCommandWithFd:269 : Receive command reply ret=-1 rxObject=(nil)
2014-07-09 11:09:17.393+0000: 21773: warning : qemuDomainObjBeginJobInternal:1119 : Cannot start job (query, none) for domain instance-0000001d; current job is (modify, none) owned by (21774, 0)
2014-07-09 11:09:17.395+0000: 21774: debug : qemuDomainObjExitMonitorInternal:1301 : Exited monitor (mon=0x7fc73c010ad0 vm=0x7fc740009ee0 name=instance-0000001d)
2014-07-09 11:09:17.395+0000: 21774: debug : qemuDomainObjEndJob:1206 : Stopping job: modify (async=none vm=0x7fc740009ee0 name=instance-0000001d)
2014-07-09 11:09:17.395+0000: 21774: debug : virDomainFree:2270 : dom=0x7fc73c010700, (VM: name=instance-0000001d, uuid=6cfca6f5-8741-4844-a045-a14a644dd82d)
This clearly shows that at 11:05:50, libvirtd sends a 'query-block-jobs' request to QEMU. Approximately three minutes later libvirtd gets a 'virDomainDestroy' request from Nova, at which point we get the virDomainBlockJobAbort error due to the death of QEMU.
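The "Cannot start job (query, none) ... current job is (modify, none) owned by (21774, 0)" warning in the log is a consequence of libvirt's per-domain job lock: thread 21774 still holds the modify job while it waits indefinitely for QEMU's reply, so any other API call against the same domain is refused once its acquire timeout elapses. A rough model of that locking rule (an illustrative sketch only, not libvirt's actual code; the class and method names are invented for the example):

```python
import threading
import time

class DomainJob:
    """Toy model of libvirt's per-domain job lock: at most one job runs
    per domain, and a second job attempt gives up after a timeout with
    an error naming the current job and its owner."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active = None  # (job_name, owner) of the running job, or None

    def begin(self, job_name, owner, timeout=1.0):
        # Wait until no job is active, or fail once the timeout elapses --
        # this mirrors the "Cannot start job" warning in the log.
        deadline = time.monotonic() + timeout
        with self._cond:
            while self._active is not None:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    cur, cur_owner = self._active
                    raise RuntimeError(
                        "Cannot start job (%s); current job is (%s) owned by (%s)"
                        % (job_name, cur, cur_owner))
                self._cond.wait(remaining)
            self._active = (job_name, owner)

    def end(self):
        with self._cond:
            self._active = None
            self._cond.notify_all()
```

In this failure the modify job never ends because the monitor reply never arrives, so the later query attempt from thread 21773 fails in exactly the way the warning shows.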
We can see this virDomainDestroy call is triggered by Tempest's timeout
2014-07-09 11:09:04.479 23996 ERROR tempest.api.compute.images.test_list_image_filters [-] setUpClass failed
2014-07-09 11:09:04.479 23996 TRACE tempest.api.compute.images.test_list_image_filters Traceback (most recent call last):
2014-07-09 11:09:04.479 23996 TRACE tempest.api.compute.images.test_list_image_filters File "tempest/api/compute/images/test_list_image_filters.py", line 52, in setUpClass
2014-07-09 11:09:04.479 23996 TRACE tempest.api.compute.images.test_list_image_filters cls.server2['id'], wait_until='ACTIVE')
2014-07-09 11:09:04.479 23996 TRACE tempest.api.compute.images.test_list_image_filters File "tempest/api/compute/base.py", line 328, in create_image_from_server
2014-07-09 11:09:04.479 23996 TRACE tempest.api.compute.images.test_list_image_filters kwargs['wait_until'])
2014-07-09 11:09:04.479 23996 TRACE tempest.api.compute.images.test_list_image_filters File "tempest/services/compute/json/images_client.py", line 92, in wait_for_image_status
2014-07-09 11:09:04.479 23996 TRACE tempest.api.compute.images.test_list_image_filters waiters.wait_for_image_status(self, image_id, status)
2014-07-09 11:09:04.479 23996 TRACE tempest.api.compute.images.test_list_image_filters File "tempest/common/waiters.py", line 143, in wait_for_image_status
2014-07-09 11:09:04.479 23996 TRACE tempest.api.compute.images.test_list_image_filters raise exceptions.TimeoutException(message)
2014-07-09 11:09:04.479 23996 TRACE tempest.api.compute.images.test_list_image_filters TimeoutException: Request timed out
2014-07-09 11:09:04.479 23996 TRACE tempest.api.compute.images.test_list_image_filters Details: (ListImageFiltersTestJSON:setUpClass) Image 8a7d72cc-0715-42d7-894a-83417051f4b9 failed to reach ACTIVE status within the required time (196 s). Current status: SAVING.
2014-07-09 11:09:04.479 23996 TRACE tempest.api.compute.images.test_list_image_filters
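The 196 s figure presumably comes from tempest's configured build timeout, and the loop behind waiters.wait_for_image_status is essentially a poll-until-deadline pattern. A simplified sketch (not tempest's exact code; the client object, and the sleep/clock parameters used to make it testable, are stand-ins for this example):

```python
import time

class TimeoutException(Exception):
    pass

def wait_for_image_status(client, image_id, status, build_timeout=196,
                          build_interval=1, sleep=time.sleep,
                          clock=time.monotonic):
    """Poll the image until it reaches the desired status, raising
    TimeoutException once build_timeout seconds have elapsed."""
    start = clock()
    while True:
        image = client.show_image(image_id)
        if image['status'] == status:
            return image
        if clock() - start >= build_timeout:
            # The failure above is this branch: the image was still SAVING
            # when the deadline passed.
            raise TimeoutException(
                "Image %s failed to reach %s status within the required "
                "time (%s s). Current status: %s."
                % (image_id, status, build_timeout, image['status']))
        sleep(build_interval)
```

The key point for this bug is that the timeout fires in the test client while the snapshot job on the Nova side is still blocked, and the subsequent teardown is what issues the virDomainDestroy.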
While libvirtd is waiting for the QEMU response, many other libvirt API calls are taking place, so it doesn't look like libvirtd itself is misbehaving. It also successfully sees the EOF on the QEMU monitor socket, so we know the event loop is still listening for the reply from QEMU.
So I'm still left thinking this is a QEMU bug of some sort.
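One way to narrow this down further on a live reproduction would be to talk to the stuck QEMU's monitor directly and see whether it answers anything at all. Below is a minimal sketch of the synchronous send/wait-for-reply exchange involved, assuming a QMP-style newline-delimited JSON socket; it is an illustration, not a complete QMP client, since a real client must also handle the greeting, negotiate qmp_capabilities, and filter out interleaved async events:

```python
import json
import socket

def qmp_command(sock, command, timeout=5.0):
    """Send one JSON command on the monitor socket and wait for one
    newline-terminated reply, giving up after `timeout` seconds."""
    sock.settimeout(timeout)
    sock.sendall((json.dumps(command) + "\r\n").encode())
    buf = b""
    while b"\n" not in buf:
        chunk = sock.recv(4096)
        if not chunk:
            # EOF: the peer closed the socket, analogous to the
            # "Connection reset by peer" libvirtd eventually reported.
            raise ConnectionError("monitor closed by peer")
        buf += chunk
    return json.loads(buf.split(b"\n", 1)[0])
```

If a probe like this times out on query-block-jobs while QEMU is otherwise alive, that would support the theory that QEMU has stopped servicing the monitor, i.e. a QEMU bug rather than a libvirtd one.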