Reviewed: https://review.opendev.org/667389
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=118cf0c5920b5ff41333fd304a934dd7e5f4e1a8
Submitter: Zuul
Branch: stable/stein
commit 118cf0c5920b5ff41333fd304a934dd7e5f4e1a8
Author: Kashyap Chamarthy <email address hidden>
Date: Mon Feb 25 13:26:24 2019 +0100
libvirt: Rework 'EBUSY' (SIGKILL) error handling code path
Change ID I128bf6b939 (libvirt: handle code=38 + sigkill (ebusy) in
_destroy()) handled the case where a QEMU process "refuses to die" within
a given timeout period set by libvirt.
Originally, libvirt sent SIGTERM (allowing the process to clean up
resources) and waited 10 seconds. If the guest still hadn't gone away,
it sent the more lethal SIGKILL and waited another 5 seconds for it to
take effect.
From libvirt v4.7.0 onwards, libvirt increased[1][2] the time it waits
for a guest hard shutdown to complete. It now waits for 30 seconds for
SIGKILL to work (instead of 5). Also, additional wait time is added if
there are assigned PCI devices, as some of those tend to slow things
down.
In this change:
- Increase the retry count for the _destroy() call from 3 to 6, thus
raising the total wait from 15 to 30 seconds before SIGKILL takes
effect. This matches the (more graceful) behaviour of libvirt
v4.7.0. It also gives breathing room for Nova instances running in
environments with large compute nodes and high instance creation or
deletion churn, where the current timeout may not be sufficient.
- Retry the _destroy() API call _only_ if MIN_LIBVIRT_VERSION is lower
than 4.7.0.
[1] https://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=9a4e4b9
("process: wait longer 5->30s on hard shutdown")
[2] https://libvirt.org/git/?p=libvirt.git;a=commit;h=be2ca04
("process: wait longer on kill per assigned Hostdev")
Related-bug: #1353939
Change-Id: If2035cac931c42c440d61ba97ebc7e9e92141a28
Signed-off-by: Kashyap Chamarthy <email address hidden>
(cherry picked from commit 10d50ca4e210039aeae84cb9bd5d18895948af54)
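
The retry policy described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the actual Nova code: DestroyError, destroy_with_retry() and the constant names are hypothetical stand-ins, the destroy callable stands in for the libvirt destroy call, and the version is passed as a plain tuple. The value 38 is the "code=38" from the original change title, and the six attempts correspond to the 30-second total mentioned above (presumably 6 x 5 seconds).

```python
import errno

# Hypothetical names; the values come from the commit message.
ERROR_CODE_SYSTEM = 38               # the "code=38" in the original change title
NO_RETRY_FROM_VERSION = (4, 7, 0)    # libvirt waits 30s itself from v4.7.0
MAX_DESTROY_ATTEMPTS = 6             # raised from 3 by this change


class DestroyError(Exception):
    """Stand-in for the error raised when the destroy call fails."""

    def __init__(self, code, os_errno):
        super().__init__(f"destroy failed: code={code} errno={os_errno}")
        self.code = code
        self.os_errno = os_errno


def is_sigkill_ebusy(exc):
    """True if the failure is the 'SIGKILL did not work in time' case."""
    return exc.code == ERROR_CODE_SYSTEM and exc.os_errno == errno.EBUSY


def destroy_with_retry(destroy, min_libvirt_version,
                       max_attempts=MAX_DESTROY_ATTEMPTS):
    """Call destroy(), retrying on EBUSY only for libvirt older than 4.7.0.

    Returns the number of attempts made; re-raises the last error when
    retrying is not allowed or the attempts are exhausted.
    """
    retry_allowed = min_libvirt_version < NO_RETRY_FROM_VERSION
    attempt = 0
    while True:
        attempt += 1
        try:
            destroy()
            return attempt
        except DestroyError as exc:
            can_retry = retry_allowed and is_sigkill_ebusy(exc)
            if not can_retry or attempt >= max_attempts:
                raise
```

With a version tuple of (4, 7, 0) or later, the first EBUSY failure is re-raised immediately, since libvirt has already waited the longer period itself. With an older version, the call is attempted up to six times before giving up.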