Ironic VIF unplug in destroy may fail due to node locked by cleaning
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | In Progress | Undecided | Unassigned |
Bug Description
This bug describes a race condition during deletion of a bare metal (Ironic) instance when automated cleaning is enabled in Ironic (which is the default).
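The race can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not nova or ironic code: `FakeNode`, `NodeLocked`, `destroy` and `run` are hypothetical names, the lock stands in for the conductor's exclusive node lock, and `NodeLocked` stands in for the HTTP 409 seen in the logs.

```python
import threading
import time


class NodeLocked(Exception):
    """Stand-in for the HTTP 409 Conflict returned while a node is locked."""


class FakeNode:
    """Hypothetical model of an Ironic node guarded by an exclusive lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.vifs = {"vif-1"}

    def clean(self, hold_seconds, started):
        # The conductor holds the exclusive lock for the whole of cleaning.
        with self._lock:
            started.set()
            time.sleep(hold_seconds)

    def vif_detach(self, vif_id):
        # A request against a locked node fails at once instead of waiting.
        if not self._lock.acquire(blocking=False):
            raise NodeLocked("node is locked by the conductor")
        try:
            self.vifs.discard(vif_id)
        finally:
            self._lock.release()


def destroy(node, max_retries, retry_interval):
    """Retry the VIF detach a bounded number of times, then give up."""
    for _ in range(max_retries):
        try:
            node.vif_detach("vif-1")
            return "deleted"
        except NodeLocked:
            time.sleep(retry_interval)
    return "error"


def run(hold_seconds, max_retries, retry_interval):
    node = FakeNode()
    started = threading.Event()
    cleaner = threading.Thread(target=node.clean, args=(hold_seconds, started))
    cleaner.start()
    started.wait()  # make sure cleaning holds the lock before destroy runs
    result = destroy(node, max_retries, retry_interval)
    cleaner.join()
    return result
```

When the lock is held for longer than the total retry budget, `run` returns `"error"`; when the lock is released in time, it returns `"deleted"`.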
# Steps to reproduce
Because this is a race condition, it is not easy to reproduce, although it has been seen in the wild on several occasions. The easiest way to reproduce it is in a development environment (e.g. devstack), by tweaking some configuration and adding sleeps to particular operations in nova and ironic.
nova.conf configuration: reduce the number of retries for Ironic API polling. The value still needs to be high enough to allow normal operations to complete.
[ironic]
api_max_retries = 10
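To see why this setting makes the race easy to hit, it helps to compare the retry budget with the time the conductor holds the lock. The sketch below assumes retries are spaced by a fixed interval and assumes a value of 2 seconds for that interval; both are assumptions to check against your own nova.conf. `retry_window_seconds` is a hypothetical helper, and the 120 seconds comes from the `time.sleep(120)` added to the conductor in this reproduction.

```python
def retry_window_seconds(api_max_retries, api_retry_interval):
    """Rough upper bound on how long a client keeps retrying a conflict."""
    return api_max_retries * api_retry_interval


# api_retry_interval=2 is an assumed value, not taken from this report.
window = retry_window_seconds(api_max_retries=10, api_retry_interval=2)

# Seconds the conductor holds the node lock in this reproduction.
conductor_hold = 120

# If the retry window is shorter than the lock hold time, every retry
# hits the locked node and the request eventually fails.
retries_exhausted = window < conductor_hold
print(window, retries_exhausted)
```

Under these assumptions the retry window is far shorter than the lock hold time, so the retries run out while the node is still locked.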
sleep #1: nova-compute waits for the node to move to cleaning (nova/virt/
try:
if node.provision_
else:
# NOTE(hshiina): if spawn() fails before ironic starts
# provisioning, instance information should be
# removed from ironic node.
finally:
##### HERE #######
##### HERE #######
# NOTE(mgoddard): We don't need to remove instance info at this
# point since we will have already done it. The destroy will only
# succeed if this method returns without error, so we will end up
# removing the instance info eventually.
sleep #2: ironic conductor holds onto the node lock:
@task_manager.
def do_node_clean(task, clean_steps=None, disable_
"""Internal RPC method to perform cleaning of a node.
:param task: a TaskManager instance with an exclusive lock on its node
:param clean_steps: For a manual clean, the list of clean steps to
:param disable_ramdisk: Whether to skip booting ramdisk for cleaning.
"""
##### HERE #####
import time
time.sleep(120)
##### HERE #####
node = task.node
manual_clean = clean_steps is not None
Next, create and then destroy a bare metal instance.
# Expected results
The node is deleted
# Actual results
The node moves to an ERROR state.
Logs from nova-compute:
2023-05-17 14:30:30.493 7 ERROR ironicclient.
onic server: Node 2666c347-
lient.exception
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.
essage handling: ironicclient.
d. (HTTP 409)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.
# Analysis
This issue is similar to https:/
2023-05-17 14:24:11.791 7 DEBUG nova.virt.
d3-a46d-
[1] https:/
Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/883411