Cleaning fails if conductor thread can not be started

Bug #1635619 reported by Yuriy Zveryanskyy
This bug affects 1 person
Affects  Status   Importance  Assigned to  Milestone
Ironic   Triaged  Medium      Unassigned

Bug Description

We ignore exception.NoFreeConductorWorker in the agent driver and wait for the next heartbeat, but the 'resume' event has already been processed in the conductor's continue_node_clean() method before the worker thread is spawned. As a result, cleaning fails even if the next heartbeat could start the thread via the conductor.
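The ordering problem can be reproduced with a toy model. This is a minimal sketch, assuming a simplified state machine with only "clean wait" and "cleaning" states. ToyConductor, InvalidStateTransition, _do_next_clean_step and simulate_two_heartbeats are hypothetical names, not Ironic's API, and the conductor method is called directly here rather than over RPC.

```python
# Hypothetical, simplified model of the ordering problem in this report.
# Names mirror Ironic's, but this is NOT Ironic's implementation.
import threading


class NoFreeConductorWorker(Exception):
    """Stand-in for Ironic's exception.NoFreeConductorWorker."""


class InvalidStateTransition(Exception):
    """Hypothetical: 'resume' requested from a state that does not allow it."""


class Node:
    def __init__(self):
        self.provision_state = "clean wait"


class ToyConductor:
    def __init__(self, free_workers):
        self.free_workers = free_workers  # threads this conductor may still spawn

    def _spawn_worker(self, func, *args):
        if self.free_workers <= 0:
            raise NoFreeConductorWorker("no free conductor workers")
        self.free_workers -= 1
        thread = threading.Thread(target=func, args=args)
        thread.start()
        return thread

    def continue_node_clean(self, node):
        # 1. The 'resume' event is processed first: clean wait -> cleaning.
        if node.provision_state != "clean wait":
            raise InvalidStateTransition(
                "cannot resume from %r" % node.provision_state)
        node.provision_state = "cleaning"
        # 2. Only then is the worker spawned; if this raises, the node has
        #    already left "clean wait" and nothing moves it back.
        return self._spawn_worker(self._do_next_clean_step, node)

    def _do_next_clean_step(self, node):
        pass  # the real clean step would run here


def simulate_two_heartbeats():
    conductor = ToyConductor(free_workers=0)
    node = Node()
    try:
        conductor.continue_node_clean(node)  # heartbeat 1: no free worker
    except NoFreeConductorWorker:
        pass                                 # ignored; wait for next heartbeat
    conductor.free_workers = 1               # a worker becomes free
    try:
        conductor.continue_node_clean(node)  # heartbeat 2
    except InvalidStateTransition as exc:
        return node.provision_state, str(exc)
    return node.provision_state, None


print(simulate_two_heartbeats())
# ('cleaning', "cannot resume from 'cleaning'")
```

Even though a worker is free by the second heartbeat, the resume is rejected because the first, failed attempt already consumed the state transition.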

summary:
- Node stuck in the CLEANING state if worker thread can not be executed on heartbeat
+ Cleaning fails if conductor thread can not be started
description: updated
Joanna Taryma (jtaryma)
Changed in ironic:
assignee: nobody → Joanna Taryma (jtaryma)
Joanna Taryma (jtaryma) wrote:

The 'pass' for exception.NoFreeConductorWorker has no effect, because that exception is never raised in the agent driver.
NoFreeConductorWorker is raised in the conductor's continue_node_clean(), which is invoked via an RPC cast. Because a cast is asynchronous (fire-and-forget), any exception raised by continue_node_clean() is never propagated back to the agent driver.
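The cast/call distinction can be illustrated with a thread pool standing in for the message bus. This is a hypothetical sketch: call(), cast(), agent_heartbeat() and conductor_continue_node_clean() are illustrative stand-ins, not oslo.messaging's or Ironic's API. It only shows that a fire-and-forget cast returns before the remote method runs, so the caller's except clause can never fire.

```python
# Hypothetical illustration of why an RPC *cast* cannot surface errors to its
# caller. A thread pool stands in for the message bus; call() and cast() are
# simplified stand-ins, not oslo.messaging's API.
from concurrent.futures import ThreadPoolExecutor

_bus = ThreadPoolExecutor(max_workers=1)


class NoFreeConductorWorker(Exception):
    pass


def conductor_continue_node_clean(node_id):
    # Pretend the conductor has no free worker threads.
    raise NoFreeConductorWorker("no free conductor workers for %s" % node_id)


def call(method, *args):
    """Synchronous RPC: waits for the result and re-raises remote errors."""
    return _bus.submit(method, *args).result()


def cast(method, *args):
    """Asynchronous RPC: fire-and-forget, returns None immediately."""
    _bus.submit(method, *args)  # the Future (and any exception) is discarded
    return None


def agent_heartbeat(use_cast=True):
    rpc = cast if use_cast else call
    try:
        rpc(conductor_continue_node_clean, "node-1")
    except NoFreeConductorWorker:
        return "handled"        # only reachable with call()
    return "no error seen"      # with cast(), the exception is lost


print(agent_heartbeat())                # no error seen
print(agent_heartbeat(use_cast=False))  # handled
```

With call() the exception travels back to the caller; with cast() it is stored in a Future that nobody inspects, which is why an except clause around the cast is dead code.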

I think the exception handling code should be removed, as it is misleading.

Going forward: currently there is no means for the agent to learn about a failure in continue_node_clean(). Similar problems may exist in other functions that communicate via RPC. Either all such failures should be handled on the conductor side, or we need to design a way to report failures back to the caller.
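One possible conductor-side approach, sketched under the same toy-model assumptions (hypothetical names, not Ironic's code), is to undo the 'resume' transition when the worker cannot be spawned, so the node stays in "clean wait" and a later heartbeat can retry.

```python
# Hypothetical sketch: handle NoFreeConductorWorker on the conductor side so
# a failed spawn leaves the node resumable. Not Ironic's actual code.
import threading


class NoFreeConductorWorker(Exception):
    pass


class Node:
    def __init__(self):
        self.provision_state = "clean wait"


class ToyConductor:
    def __init__(self, free_workers):
        self.free_workers = free_workers  # threads this conductor may still spawn

    def _spawn_worker(self, func, *args):
        if self.free_workers <= 0:
            raise NoFreeConductorWorker("no free conductor workers")
        self.free_workers -= 1
        thread = threading.Thread(target=func, args=args)
        thread.start()
        return thread

    def continue_node_clean(self, node):
        if node.provision_state != "clean wait":
            return False  # nothing to resume
        previous = node.provision_state
        node.provision_state = "cleaning"  # process 'resume'
        try:
            self._spawn_worker(self._do_next_clean_step, node)
        except NoFreeConductorWorker:
            # Undo the transition so the next heartbeat can try again,
            # instead of leaving the node unable to resume.
            node.provision_state = previous
            return False
        return True

    def _do_next_clean_step(self, node):
        pass  # the real clean step would run here


conductor = ToyConductor(free_workers=0)
node = Node()
first = conductor.continue_node_clean(node)   # no worker: rolled back
conductor.free_workers = 1
second = conductor.continue_node_clean(node)  # retried successfully
print(first, second, node.provision_state)    # False True cleaning
```

An alternative with the same effect would be to check for a free worker before processing the 'resume' event, so the transition is never taken when the spawn is bound to fail.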

Changed in ironic:
importance: Undecided → Medium
status: New → Triaged
Dmitry Tantsur (divius) wrote:

Removing assignee after more than a year of inactivity

Changed in ironic:
assignee: Joanna Taryma (jtaryma) → nobody