Hi! Kashyap asked me to look into this from the Nova perspective, so I did. I started with the original run linked in the report:

https://logserver.rdoproject.org/60/31460/51/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-victoria/6a262b2/

There I see the following:

# A server was created with the domain xml:

2022-01-25 13:13:22.963 8 DEBUG nova.virt.libvirt.driver [req-3f14a5ed-531f-4ff7-99ff-5b6197428175 c0aeae9dc16743558ee7e13bf4f432d9 3c00e69e1dc4414faa97b980e348e301 - default default] [instance: f54fd0a5-322a-4cb1-8bb8-604aecc2c4d1] End _get_guest_xml xml=

[the domain XML is not reproduced here: its tags were stripped when the comment was pasted. What remains identifies the guest: uuid f54fd0a5-322a-4cb1-8bb8-604aecc2c4d1, libvirt name instance-0000003e, display name tempest-ListImageFiltersTestJSON-server-1201316046, project tempest-ListImageFiltersTestJSON-6230924-project, created 2022-01-25 13:13:22, RDO OpenStack Compute 22.3.1-0.20220111022305.28d0059.el8, an hvm guest with a /dev/urandom rng backend]

# A snapshot is requested while the server is running

nova-compute.log.1:2022-01-25 13:13:33.305 8 INFO nova.compute.manager [req-a487eeb2-b7f9-4712-8a58-6cfcdf9836b4 c0aeae9dc16743558ee7e13bf4f432d9 3c00e69e1dc4414faa97b980e348e301 - default default] [instance: f54fd0a5-322a-4cb1-8bb8-604aecc2c4d1] instance snapshotting
nova-compute.log.1:2022-01-25 13:13:33.445 8 INFO nova.virt.libvirt.driver [req-a487eeb2-b7f9-4712-8a58-6cfcdf9836b4 c0aeae9dc16743558ee7e13bf4f432d9 3c00e69e1dc4414faa97b980e348e301 - default default] [instance: f54fd0a5-322a-4cb1-8bb8-604aecc2c4d1] Beginning live snapshot process

# The snapshot failed in libvirt, therefore the image with the content of the snapshot is never created in glance

nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server [req-a487eeb2-b7f9-4712-8a58-6cfcdf9836b4 c0aeae9dc16743558ee7e13bf4f432d9 3c00e69e1dc4414faa97b980e348e301 - default default] Exception during message handling: libvirt.libvirtError: Unable to read from monitor: Connection reset by peer
nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 2768, in _live_snapshot
nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server     while not dev.is_job_complete():
nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 864, in is_job_complete
nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server     status = self.get_job_info()
nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 771, in get_job_info
nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server     status = self._guest._domain.blockJobInfo(self._disk, flags=0)
nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit
nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server     result = proxy_call(self._autowrap, f, *args, **kwargs)
nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call
nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server     rv = execute(f, *args, **kwargs)
nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute
nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server     six.reraise(c, e, tb)
nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server     raise value
nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker
nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server     rv = meth(*args, **kwargs)
nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server   File "/usr/lib64/python3.6/site-packages/libvirt.py", line 999, in blockJobInfo
nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server     raise libvirtError('virDomainGetBlockJobInfo() failed')
nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server libvirt.libvirtError: Unable to read from monitor: Connection reset by peer
nova-compute.log.1:2022-01-25 13:13:35.557 8 INFO nova.compute.manager [req-a487eeb2-b7f9-4712-8a58-6cfcdf9836b4 c0aeae9dc16743558ee7e13bf4f432d9 3c00e69e1dc4414faa97b980e348e301 - default default] [instance: f54fd0a5-322a-4cb1-8bb8-604aecc2c4d1] Successfully reverted task state from image_pending_upload on failure for instance.
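For context on why the failure surfaces exactly there: _live_snapshot() starts a block copy and then polls dev.is_job_complete(), which ends up calling virDomainGetBlockJobInfo() through the libvirt python binding. Here is a condensed sketch of that loop (my paraphrase for this comment, not the verbatim nova code; it assumes a libvirt-python virDomain handle):

    import time
    import libvirt

    def wait_for_block_copy(dom: libvirt.virDomain, disk: str = 'vda') -> None:
        # Rough stand-in for _live_snapshot() polling is_job_complete().
        # blockJobInfo() maps to virDomainGetBlockJobInfo(): it returns an
        # empty dict once no job is active on the disk, and it raises
        # libvirt.libvirtError as soon as the domain's QEMU monitor goes
        # away (e.g. because QEMU crashed mid-copy, which is what "Unable
        # to read from monitor: Connection reset by peer" means above).
        while True:
            info = dom.blockJobInfo(disk, 0)
            if not info or info.get('cur') == info.get('end'):
                return
            time.sleep(0.5)

So nova was simply the messenger here: the block job poll blew up because the QEMU process behind the monitor died.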
"/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server six.reraise(c, e, tb) nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server raise value nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server rv = meth(*args, **kwargs) nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server File "/usr/lib64/python3.6/site-packages/libvirt.py", line 999, in blockJobInfo nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server raise libvirtError('virDomainGetBlockJobInfo() failed') nova-compute.log.1:2022-01-25 13:13:35.560 8 ERROR oslo_messaging.rpc.server libvirt.libvirtError: Unable to read from monitor: Connection reset by peer nova-compute.log.1:2022-01-25 13:13:35.557 8 INFO nova.compute.manager [req-a487eeb2-b7f9-4712-8a58-6cfcdf9836b4 c0aeae9dc16743558ee7e13bf4f432d9 3c00e69e1dc4414faa97b980e348e301 - default default] [instance: f54fd0a5-322a-4cb1-8bb8-604aecc2c4d1] Successfully reverted task state from image_pending_upload on failure for instance. # The libvirtd log is pretty small. This is around the time window of the nova failure: 2022-01-25 13:13:33.727+0000: 31297: error : qemuDomainBlockJobAbort:14523 : invalid argument: disk vda does not have an active block job 2022-01-25 13:13:34.856+0000: 52554: error : qemuMonitorIORead:495 : Unable to read from monitor: Connection reset by peer 2022-01-25 13:13:35.083+0000: 47753: warning : virSecuritySELinuxRestoreFileLabel:1454 : cannot resolve symlink /var/lib/nova/instances/snapshots/tmpbe0_42ae/72c39473b96a45bb8eeb51f96167fb79.delta: No such file or directory 2022-01-25 13:13:35.083+0000: 47753: warning : qemuProcessStop:8211 : Unable to restore security label on vda # The QEMU log of the server only states 2022-01-25 13:13:35.058+0000: shutting down, reason=crashed # The tempest failed as it did not find the image File "/usr/lib/python3.6/site-packages/tempest/api/compute/images/test_list_image_filters.py", line 124, in resource_setup cls.server2['id'], wait_until='ACTIVE') File "/usr/lib/python3.6/site-packages/tempest/api/compute/base.py", line 387, in create_image_from_server image_id=image_id) tempest.exceptions.SnapshotNotFoundException: Server snapshot image 926320d2-d53a-4aff-a679-de0645fd33e1 not found. ----- Then I looked at the newer logs from the comments above but none of them showing the same nova failure. So I don't think they are direct reproductions of the original problem. I've checked * https://logserver.rdoproject.org/83/38983/1/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-victoria/b21235c/logs from comment-11 * https://logserver.rdoproject.org/83/38983/2/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-victoria/67d64d4/logs from comment-14 Also I tried to find a similar nova error raise libvirtError('virDomainGetBlockJobInfo() failed') in upstream job runs but in the last 10 days I don't found any hits.