CPU power management can fail with OSError: [Errno 16] Device or resource busy

Bug #2065927 reported by sean mooney
This bug affects 1 person
Affects                    Status       Importance  Assigned to  Milestone
OpenStack Compute (nova)   In Progress  Low         sean mooney
  2024.1                   Triaged      Low         Unassigned
  Antelope                 Triaged      Low         Unassigned
  Bobcat                   Triaged      Low         Unassigned

Bug Description

As reported downstream in https://issues.redhat.com/browse/OSPRH-7103, if you create a VM, reboot the host, start the VM, and finally delete it, the delete may fail:

May 16 15:54:26 edpm-compute-0 nova_compute[3396]: Traceback (most recent call last):
May 16 15:54:26 edpm-compute-0 nova_compute[3396]: File "/usr/lib/python3.9/site-packages/nova/filesystem.py", line 57, in write_sys
May 16 15:54:26 edpm-compute-0 nova_compute[3396]: fd.write(data)
May 16 15:54:26 edpm-compute-0 nova_compute[3396]: OSError: [Errno 16] Device or resource busy
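For context: on Linux, a CPU is taken offline or brought back online by writing 0 or 1 to /sys/devices/system/cpu/cpuN/online, and errno 16 is EBUSY, so the traceback means the kernel rejected such a sysfs write. The sketch below illustrates that mechanism and how the error can be recognized. cpu_online_path, write_sys and is_device_busy here are hypothetical helpers written for illustration, not nova's actual code.

```python
import errno
import os


def cpu_online_path(core: int, sysfs_root: str = "/sys") -> str:
    """Path of the Linux sysfs file that controls whether a CPU is online."""
    return os.path.join(sysfs_root, "devices", "system", "cpu", f"cpu{core}", "online")


def write_sys(path: str, data: str) -> None:
    """Hypothetical sysfs writer: "0" offlines a core, "1" onlines it.

    The kernel can reject the write with EBUSY, which surfaces in Python as
    OSError: [Errno 16] Device or resource busy.
    """
    with open(path, mode="w") as fd:
        fd.write(data)


def is_device_busy(exc: BaseException) -> bool:
    """True if exc is the transient 'Device or resource busy' error."""
    return isinstance(exc, OSError) and exc.errno == errno.EBUSY
```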

This prevents the VM from being deleted on the initial request, but it can be deleted if you try again.

This race condition with the kernel is unlikely to happen and appears to be timing-related, i.e. there is a short window during which a CPU cannot be onlined or offlined.

To mitigate this, nova should retry the operation with a backoff and, if the core still cannot be offlined, eventually squash the error so that the VM delete completes without failing.
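A minimal sketch of that mitigation, assuming the sysfs writer can be injected: retry only on EBUSY with exponential backoff, and after the last attempt log a warning and return False instead of raising. Other OSErrors and ValueErrors (the two error types named in the related commit) are not retried but are also squashed. The function name offline_core_best_effort and the attempts/base_delay defaults are hypothetical; this is not the code from the proposed fix.

```python
import errno
import logging
import time
from typing import Callable

LOG = logging.getLogger(__name__)


def offline_core_best_effort(
    write: Callable[[str, str], None],
    path: str,
    attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Write "0" to a CPU's sysfs 'online' file, retrying while it is busy.

    Returns True if the core was offlined and False otherwise. It never
    raises, so a failure here cannot fail the surrounding VM delete.
    """
    for attempt in range(attempts):
        try:
            write(path, "0")
            return True
        except OSError as exc:
            if exc.errno != errno.EBUSY:
                # Not the transient race: retrying is unlikely to help.
                LOG.warning("Failed to offline %s: %s", path, exc)
                return False
            if attempt + 1 < attempts:
                # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                sleep(base_delay * (2 ** attempt))
        except ValueError as exc:
            LOG.warning("Failed to offline %s: %s", path, exc)
            return False
    # Squash the error: leave the core online rather than fail the delete.
    LOG.warning("%s still busy after %d attempts; leaving core online",
                path, attempts)
    return False
```

Injecting sleep keeps the function testable without real delays; for example, passing the append method of an empty list as sleep records the backoff delays (0.1, 0.2, ...) instead of waiting.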

Power management of the core should never block or cause the VM delete to fail.

Tags: libvirt
Changed in nova:
assignee: nobody → sean mooney (sean-k-mooney)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/920119

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/920203

Changed in nova:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/920119
Committed: https://opendev.org/openstack/nova/commit/ee581a5c9d1c0b7c0d8830a08f55fe8bc2fbcd0f
Submitter: "Zuul (22348)"
Branch: master

commit ee581a5c9d1c0b7c0d8830a08f55fe8bc2fbcd0f
Author: Sean Mooney <email address hidden>
Date: Tue May 21 17:53:07 2024 +0100

    add functional reproducer for bug 2065927

    Today if the write sys call to offline a cpu when
    deleting an instance fails due to an OSError or ValueError
    the instance delete fails and the instance goes to error.

    As reported in bug: #2065927 this can happen as a result of
    OSError: [Errno 16] Device or resource busy if the vm is
    deleted shortly after it is started.

    Related-Bug: #2065927
    Change-Id: I1352a3a1e28cfe14ec8f32042ed35cb25e70338e
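The commit above adds a functional reproducer. A much smaller unit-level sketch of the same idea, using unittest.mock to make the sysfs write fail with EBUSY, could look like this; delete_instance_naive and its injected write parameter are hypothetical stand-ins, not nova's functional test. The second test mirrors the report that a retried delete succeeds once the core is no longer busy.

```python
import errno
import unittest
from typing import Callable, Iterable
from unittest import mock


def write_sys(path: str, data: str) -> None:
    """Hypothetical sysfs writer; the tests below inject a mock instead."""
    with open(path, mode="w") as fd:
        fd.write(data)


def delete_instance_naive(
    cores: Iterable[int],
    write: Callable[[str, str], None] = write_sys,
) -> str:
    """Hypothetical delete path that offlines each pinned core with no error
    handling, i.e. the behaviour the commit message describes as the bug."""
    for core in cores:
        write(f"/sys/devices/system/cpu/cpu{core}/online", "0")
    return "DELETED"


class Bug2065927Reproducer(unittest.TestCase):
    def test_ebusy_fails_delete(self):
        # Every write fails the way the kernel did in the reported traceback.
        fake_write = mock.Mock(
            side_effect=OSError(errno.EBUSY, "Device or resource busy"))
        with self.assertRaises(OSError) as ctx:
            delete_instance_naive([2, 3], write=fake_write)
        self.assertEqual(ctx.exception.errno, errno.EBUSY)
        # The error surfaces on the first core; later cores are never tried.
        fake_write.assert_called_once_with(
            "/sys/devices/system/cpu/cpu2/online", "0")

    def test_second_delete_succeeds(self):
        # First write is busy, later writes succeed: only a retried delete works.
        fake_write = mock.Mock(side_effect=[
            OSError(errno.EBUSY, "Device or resource busy"), None, None])
        with self.assertRaises(OSError):
            delete_instance_naive([2, 3], write=fake_write)
        self.assertEqual(delete_instance_naive([2, 3], write=fake_write), "DELETED")
        self.assertEqual(fake_write.call_count, 3)
```

Run it with python -m unittest against the module that contains it.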
