Comment 0 for bug 1788226

Christian Ehrhardt  (paelzer) wrote :

I had an upstream discussion about this at [1].

The TL;DR is that when more than a few host devices (hostdevs) are passed through to a guest, there can be an odd time window while the guest shuts down.

In that window the qemu process is already gone from /proc/<pid> but can still be reached with signal 0.
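For illustration, here is a minimal Python sketch of the two checks involved, assuming a Linux host: whether a PID is still listed under /proc, and whether signal 0 (kill(pid, 0), which only checks for existence and delivers nothing) still reaches it. The function names are hypothetical; this is not how libvirt (which is written in C) implements its check.

```python
import os


def alive_via_signal0(pid: int) -> bool:
    """Return True if kill(pid, 0) still reaches the process."""
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is delivered
    except ProcessLookupError:  # ESRCH: no such process
        return False
    except PermissionError:  # EPERM: the process exists but we may not signal it
        return True
    return True


def visible_in_proc(pid: int) -> bool:
    """Return True if /proc/<pid> exists (Linux only)."""
    return os.path.exists(f"/proc/{pid}")


def in_odd_window(pid: int) -> bool:
    """Hypothetical helper: gone from /proc, yet signal 0 still succeeds."""
    return alive_via_signal0(pid) and not visible_in_proc(pid)
```

Per the observation above, `in_odd_window()` would report True for the qemu PID during that window, while a tool that only uses signal 0 would keep treating the process as alive.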

That makes libvirt believe the process will not die at all (think of zombie-like processes that never go away, e.g. ones hung on files of a failed NFS mount).

But what really happens is that, depending on the hostdev, the kernel may need up to 1-2 seconds of extra time to release all the PCI resources.

We came up with patches that scale the allowed time with the number of hostdevs, plus a general increase for the bad case (SIGKILL). The latter follows the recommendation of kernel engineers: once we have to send SIGKILL we are on a bad path anyway, so we might as well give the kernel a bit more time to clean up.

IMHO we should backport these changes at least to Bionic.

[1]: https://www.redhat.com/archives/libvir-list/2018-August/msg01295.html