Comment 4 for bug 1662867

Revision history for this message
Matt Riedemann (mriedem) wrote :

We still see this in CI runs periodically:

http://logs.openstack.org/70/552170/2/check/legacy-tempest-dsvm-multinode-live-migration/474e6b5/logs/screen-n-cpu.txt.gz?level=INFO#_Mar_13_22_24_03_425580

We can see that in this case, the instance in question has it's files deleted right before the libvirt driver, via the update_available_resource periodic task, gets to processing that instance:

Mar 13 22:24:03.367217 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: INFO nova.virt.libvirt.driver [None req-d0181533-0010-4268-bea5-fda2a392e8f1 tempest-LiveMigrationRemoteConsolesV26Test-2100428245 tempest-LiveMigrationRemoteConsolesV26Test-2100428245] [instance: 0460cf87-16f1-4aa3-a964-7cab159327dc] Deleting instance files /opt/stack/data/nova/instances/0460cf87-16f1-4aa3-a964-7cab159327dc_del
Mar 13 22:24:03.367725 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: INFO nova.virt.libvirt.driver [None req-d0181533-0010-4268-bea5-fda2a392e8f1 tempest-LiveMigrationRemoteConsolesV26Test-2100428245 tempest-LiveMigrationRemoteConsolesV26Test-2100428245] [instance: 0460cf87-16f1-4aa3-a964-7cab159327dc] Deletion of /opt/stack/data/nova/instances/0460cf87-16f1-4aa3-a964-7cab159327dc_del complete
Mar 13 22:24:03.425580 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager [None req-d0181533-0010-4268-bea5-fda2a392e8f1 tempest-LiveMigrationRemoteConsolesV26Test-2100428245 tempest-LiveMigrationRemoteConsolesV26Test-2100428245] Error updating resources for node ubuntu-xenial-rax-iad-0002937850.: InvalidDiskInfo: Disk info file is invalid: qemu-img failed to execute on /opt/stack/data/nova/instances/0460cf87-16f1-4aa3-a964-7cab159327dc/disk : Unexpected error while running command.
Mar 13 22:24:03.425969 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /opt/stack/data/nova/instances/0460cf87-16f1-4aa3-a964-7cab159327dc/disk --force-share
Mar 13 22:24:03.426340 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: Exit code: 1
Mar 13 22:24:03.426705 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: Stdout: u''
Mar 13 22:24:03.427044 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: Stderr: u"qemu-img: Could not open '/opt/stack/data/nova/instances/0460cf87-16f1-4aa3-a964-7cab159327dc/disk': Could not open '/opt/stack/data/nova/instances/0460cf87-16f1-4aa3-a964-7cab159327dc/disk': No such file or directory\n"
Mar 13 22:24:03.427380 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager Traceback (most recent call last):
Mar 13 22:24:03.427724 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/compute/manager.py", line 7320, in update_available_resource_for_node
Mar 13 22:24:03.428084 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager rt.update_available_resource(context, nodename)
Mar 13 22:24:03.428410 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 664, in update_available_resource
Mar 13 22:24:03.428653 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager resources = self.driver.get_available_resource(nodename)
Mar 13 22:24:03.428898 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 6432, in get_available_resource
Mar 13 22:24:03.429219 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager disk_over_committed = self._get_disk_over_committed_size_total()
Mar 13 22:24:03.429576 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 7977, in _get_disk_over_committed_size_total
Mar 13 22:24:03.429850 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager config, block_device_info)
Mar 13 22:24:03.430176 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 7896, in _get_instance_disk_info_from_config
Mar 13 22:24:03.430498 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager virt_size = disk_api.get_disk_size(path)
Mar 13 22:24:03.430768 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/virt/disk/api.py", line 140, in get_disk_size
Mar 13 22:24:03.431081 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager return images.qemu_img_info(path).virtual_size
Mar 13 22:24:03.431316 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/virt/images.py", line 82, in qemu_img_info
Mar 13 22:24:03.431546 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager raise exception.InvalidDiskInfo(reason=msg)
Mar 13 22:24:03.431785 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager InvalidDiskInfo: Disk info file is invalid: qemu-img failed to execute on /opt/stack/data/nova/instances/0460cf87-16f1-4aa3-a964-7cab159327dc/disk : Unexpected error while running command.
Mar 13 22:24:03.432017 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /opt/stack/data/nova/instances/0460cf87-16f1-4aa3-a964-7cab159327dc/disk --force-share
Mar 13 22:24:03.432266 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager Exit code: 1
Mar 13 22:24:03.432498 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager Stdout: u''
Mar 13 22:24:03.432742 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager Stderr: u"qemu-img: Could not open '/opt/stack/data/nova/instances/0460cf87-16f1-4aa3-a964-7cab159327dc/disk': Could not open '/opt/stack/data/nova/instances/0460cf87-16f1-4aa3-a964-7cab159327dc/disk': No such file or directory\n"
Mar 13 22:24:03.432981 ubuntu-xenial-rax-iad-0002937850 nova-compute[30634]: ERROR nova.compute.manager

We should likely just catch the ProcessExecutionError, make a best effort attempt at checking the return code (1) and error message for "No such file or directory" and if that's the case, skip that instance and don't fail the entire process of building the current set of resources for the host.

I thought about adding the deleted=False filter to this query:

https://github.com/openstack/nova/blob/96fdce7cab7c50736fb96bcaa622dac825031a2f/nova/virt/libvirt/driver.py#L7932

But at this point, libvirt has already told us the domain still exists on the host so filtering by deleted=False doesn't really help here since the instance isn't deleted yet, it's still on the host according to the hypervisor, but it's in the process of being deleted and the files are already cleaned up by the time we hit the error.