Comment 8 for bug 1416132

Nikola Đipanov (ndipanov) wrote :

Ok so after a lengthy investigation, I think the root cause of this is that we call libvirt.driver._get_instance_disk_info (from _get_disk_over_committed_size_total, see https://github.com/openstack/nova/blob/b0854ba0c697243aa3d91170d1a22896aed60e02/nova/virt/libvirt/driver.py#L6511). This method gets called without the block device information, but it clearly cannot do its job correctly without the block device mapping information, as it relies on it to filter out the volumes (see https://github.com/openstack/nova/blob/b0854ba0c697243aa3d91170d1a22896aed60e02/nova/virt/libvirt/driver.py#L6421).
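
To illustrate the failure mode, here is a simplified sketch, not the actual driver code. The function name local_disk_paths, the dict layout and the example paths are all made up for illustration. The point is only that when no volume device names are supplied, attached volumes pass through the filter and get counted as local disks.

```python
def local_disk_paths(domain_disks, volume_devices=None):
    """Return source paths of disks that are not attached volumes.

    domain_disks: list of {"target": ..., "path": ...} dicts (hypothetical
    layout standing in for what would be parsed from the domain XML).
    volume_devices: target device names taken from the block device mapping.
    """
    # With no block device mapping there is nothing to filter against,
    # so every disk, including attached volumes, is treated as local.
    volume_devices = set(volume_devices or [])
    return [d["path"] for d in domain_disks
            if d["target"] not in volume_devices]


disks = [
    {"target": "vda", "path": "/var/lib/nova/instances/uuid/disk"},
    {"target": "vdb", "path": "/dev/disk/by-path/example-volume"},
]

with_bdm = local_disk_paths(disks, volume_devices=["vdb"])
without_bdm = local_disk_paths(disks)  # the volume on vdb leaks through
```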

There are 2 ways to fix this bug IMHO, neither of which is easy. The first is to pass the block device info to the libvirt driver on every get_available_resources pass. This will require some invasive refactoring of the resource tracker. We will also want to avoid looping over every instance and firing off N queries for block devices, so some refactoring of the instance-related DB layer methods will be needed so that we can join-load block devices in a single query.
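
A rough sketch of the query pattern I mean, using plain sqlite3 and a deliberately tiny, made-up schema (the real nova tables and DB API look different, and both function names are hypothetical). The first function issues one query per instance, the second gets the same result with a single LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instances (uuid TEXT PRIMARY KEY, host TEXT);
    CREATE TABLE block_device_mapping (
        id INTEGER PRIMARY KEY, instance_uuid TEXT, device_name TEXT);
    INSERT INTO instances VALUES
        ('a', 'node1'), ('b', 'node1'), ('c', 'node2'), ('d', 'node1');
    INSERT INTO block_device_mapping (instance_uuid, device_name) VALUES
        ('a', '/dev/vdb'), ('a', '/dev/vdc'), ('b', '/dev/vdb');
""")


def bdms_n_plus_one(host):
    """One query for the instances, then one more query per instance."""
    result = {}
    rows = conn.execute(
        "SELECT uuid FROM instances WHERE host = ? ORDER BY uuid", (host,)
    ).fetchall()
    for (uuid,) in rows:
        devs = conn.execute(
            "SELECT device_name FROM block_device_mapping "
            "WHERE instance_uuid = ? ORDER BY device_name", (uuid,)
        ).fetchall()
        result[uuid] = [name for (name,) in devs]
    return result


def bdms_joined(host):
    """Same result from a single query that joins the two tables."""
    result = {}
    rows = conn.execute(
        "SELECT i.uuid, b.device_name FROM instances i "
        "LEFT JOIN block_device_mapping b ON b.instance_uuid = i.uuid "
        "WHERE i.host = ? ORDER BY i.uuid, b.device_name", (host,)
    )
    for uuid, device_name in rows:
        devices = result.setdefault(uuid, [])
        # LEFT JOIN yields a NULL device for instances without volumes.
        if device_name is not None:
            devices.append(device_name)
    return result
```

The LEFT JOIN matters here: instances with no volumes attached still need an (empty) entry, otherwise they would silently drop out of the resource accounting.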

Another way to approach the bug is to decide if disk_available_least is something we actually care about. It does seem to be used in the disk filter, and while it seems to be wildly incorrect for at least some deployment scenarios (RBD for example), it is also very useful when using qcow backing files.
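
For context, a back-of-the-envelope sketch of why the number is useful in the qcow case, assuming (this is my reading, simplified) that disk_available_least is roughly the free space minus the amount by which sparse images could still grow. The helper names and the figures are invented for the example.

```python
GiB = 1024 ** 3


def over_committed_size(virtual_size, allocated_size):
    """Bytes a sparse image could still grow by on the local disk."""
    return virtual_size - allocated_size


def disk_available_least(free_bytes, disks):
    """Free space left if every sparse image grew to its virtual size.

    disks: iterable of (virtual_size, allocated_size) pairs in bytes.
    """
    return free_bytes - sum(over_committed_size(v, a) for v, a in disks)


# Two qcow2 disks: 20 GiB virtual with 2 GiB used, 40 GiB virtual with 5 GiB used.
qcow_disks = [(20 * GiB, 2 * GiB), (40 * GiB, 5 * GiB)]
least = disk_available_least(100 * GiB, qcow_disks)  # 100 - 18 - 35 = 47 GiB
```

If attached volumes are wrongly included in the disks list, their sizes get subtracted too, and the result drifts away from what the local disk can really hold.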