baremetal PXE timeout interrupts active deploys

Bug #1208638 reported by Robert Collins
This bug affects 2 people
Affects                    Status      Importance   Assigned to         Milestone
Ironic                     Invalid     Medium       Yuriy Zveryanskyy
OpenStack Compute (nova)   Won't Fix   Medium       Unassigned

Bug Description

When the dd of an image takes an unexpectedly long time (e.g. due to network congestion), the PXE deploy timeout may interrupt the deploy by powering off the node. The deploy is then rescheduled, which exacerbates the problem.

If we monitor dd and check that it is making progress, we could use this as a heartbeat to prevent inappropriate interrupts, and have the timeout look for a period of no progress (rather than just absolute elapsed time).
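
A minimal sketch of the no-progress timeout idea, not code from nova or Ironic. The class name ProgressWatchdog, its methods, and the idle_timeout parameter are hypothetical, and how the byte count is obtained from dd is left out and assumed to be available to the caller. Only an increase in bytes written counts as a heartbeat, so a stalled dd still times out.

```python
import time


class ProgressWatchdog:
    """Expire only after `idle_timeout` seconds without forward progress.

    Hypothetical helper: the caller is assumed to sample the number of
    bytes dd has written so far and pass it to report().
    """

    def __init__(self, idle_timeout, clock=time.monotonic):
        self.idle_timeout = idle_timeout
        self._clock = clock              # injectable so it can be tested
        self._last_bytes = 0
        self._last_progress = clock()    # deploy start counts as progress

    def report(self, bytes_written):
        """Record a sample; only a larger byte count resets the timer."""
        if bytes_written > self._last_bytes:
            self._last_bytes = bytes_written
            self._last_progress = self._clock()

    def expired(self):
        """True once no progress has been seen for longer than idle_timeout."""
        return self._clock() - self._last_progress > self.idle_timeout
```

A deploy loop would call report() each time it samples dd and power off the node only when expired() returns True, so a slow but advancing copy is never interrupted.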

Tags: baremetal
Robert Collins (lifeless) wrote :

Ironic will inherit this issue with the PXE driver, so adding task there.

Changed in ironic:
status: New → Triaged
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (master)

Fix proposed to branch: master
Review: https://review.openstack.org/48198

Changed in ironic:
assignee: nobody → Yuriy Zveryanskyy (yzveryanskyy)
status: Triaged → In Progress
aeva black (tenbrae) wrote :

This doesn't affect Ironic as we don't (currently) have a timeout mechanism for deploys (which is another issue unto itself) and our state tracking is different than Nova's, so once we add operation-timeouts at a higher level, it'll be accounted for.

Changed in ironic:
status: In Progress → Invalid
Joe Gordon (jogo) wrote :

nova baremetal is dead

Changed in nova:
status: Triaged → Won't Fix