Comment 3 for bug 1998148

Jorge San Emeterio (jsanemet) wrote :

I have been digging into this bug for some time now, and it is a tricky one. I have not been able to determine exactly why it happens, but here are a few things I have figured out:

*.- Everything below was tested on Ubuntu 22.04.2 LTS.

1.- It must be something related to devstack's configuration. Deploying devstack with a default local.conf does not reproduce the bug. A devstack with a local.conf equal or very similar to the one deployed by zuul is required for this to happen.

2.- It seems like the image used in the test matters. Running the test with cirros 0.5.2 always results in the timeout. However, I have also tried it with alpine instead, and with that image the timeout happened roughly once every three runs.

3.- In any case, the timeout happens because the order to detach a volume from a server is issued to qemu, and qemu takes forever to perform that action, much longer than Nova is willing to wait. Eventually, if you leave it long enough, the volume does get detached.

4.- I have not found any errors in libvirt or qemu. The request is issued and the operation proceeds as it should, although it takes a long time.

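5.- Even though nova eventually gives up waiting and goes back to considering the volume 'in-use', the operation is still ongoing in the background. This can result in a mismatch between libvirt and nova: the volume does eventually get detached on the libvirt side while nova still records it as 'in-use'.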
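
To make points 3 to 5 concrete, here is a toy model of the race as I understand it. This is only a sketch: it is not Nova, libvirt, or qemu code, every name in it (SlowHypervisor, detach_with_timeout, the tick counts) is made up for illustration, and simulated "ticks" stand in for real time.

```python
# Toy model only: not Nova, libvirt, or QEMU code. All names are hypothetical.


class SlowHypervisor:
    """Pretends to be a hypervisor that needs `ticks_needed` steps to detach."""

    def __init__(self, ticks_needed):
        self.ticks_needed = ticks_needed
        self.attached = True
        self._ticks = None  # None until a detach has been requested

    def request_detach(self):
        # The request is accepted without error; the work completes later.
        self._ticks = 0

    def tick(self):
        # Advance simulated time by one step.
        if self._ticks is None or not self.attached:
            return
        self._ticks += 1
        if self._ticks >= self.ticks_needed:
            self.attached = False


def detach_with_timeout(hypervisor, max_polls):
    """Request a detach and poll up to `max_polls` times.

    Returns the volume status the caller would record.
    """
    hypervisor.request_detach()
    for _ in range(max_polls):
        hypervisor.tick()
        if not hypervisor.attached:
            return "available"
    # The caller gives up and keeps its old record; the detach keeps running.
    return "in-use"


hv = SlowHypervisor(ticks_needed=10)
recorded_status = detach_with_timeout(hv, max_polls=3)

# Time keeps passing after the caller has given up.
for _ in range(10):
    hv.tick()

print(recorded_status, hv.attached)  # prints: in-use False
```

The final line shows the mismatch: the caller's record says 'in-use' while the simulated hypervisor has already finished detaching the device. If the detach needs fewer ticks than the caller is willing to poll, the same function returns 'available' and both sides agree.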

Maybe someone with more insight into the inner workings of libvirt could lend a hand in determining why this slowdown occurs.