Baremetal recreation may fool the heat stack.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | Expired | Medium | Unassigned |
Bug Description
We've seen a problem (reproducible in the lab here) related to baremetal use of the devtest scripts.
The situation is this: a running (virtualised) seed booting a (single-node) undercloud on real tin, which in turn boots a (multiple-node) overcloud.
Running "devtest.sh --trash-my-machine -c" causes the heat stack to get confused.
(We think "-c" is critical because it makes the devtest script move on to the heat stack-create step quickly; without it, the intervening DIB step delays things enough that this bug doesn't get triggered.)
What appears to happen is this: the running undercloud has a running o-c-c (os-collect-config) which is polling for metadata. It gets refreshed metadata from heat as the stack-create happens. o-c-c then runs to completion on the node and posts success to its wait condition in the heat stack on the seed. All of this happens before the seed has a chance to reboot and reimage the node.
It's not clear whether this is fundamentally down to nova baremetal populating the metadata for the new instance too early. Perhaps it should wait until the image loader is able to start imaging the node?
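The ordering problem can be illustrated with a toy event model. This is a sketch only, not TripleO, heat or nova code: `WaitCondition`, `run` and the event names are hypothetical. It assumes a wait condition that completes on the first signal it receives, and replays two orderings: metadata published before the node is reimaged (the stale o-c-c picks it up and signals first) and metadata published only after reimaging starts, as suggested above (only the new node signals).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WaitCondition:
    """Hypothetical stand-in for a heat wait condition that completes on the first signal."""
    signals: List[str] = field(default_factory=list)

    def signal(self, source: str) -> None:
        self.signals.append(source)


def run(publish_before_reimage: bool) -> List[str]:
    """Replay one ordering of events and return who signalled, in order."""
    wc = WaitCondition()
    stale_agent_running = True  # o-c-c on the old, still-running undercloud node
    if publish_before_reimage:
        events = ["publish_metadata", "reimage_node"]
    else:
        events = ["reimage_node", "publish_metadata"]
    for event in events:
        if event == "reimage_node":
            stale_agent_running = False  # the reboot kills the stale agent
        elif event == "publish_metadata" and stale_agent_running:
            # The stale agent picks up the fresh metadata and reports success early.
            wc.signal("stale o-c-c")
    # Once imaged, the new node boots and its own o-c-c signals as well.
    wc.signal("new o-c-c")
    return wc.signals


if __name__ == "__main__":
    print(run(True))   # ['stale o-c-c', 'new o-c-c']
    print(run(False))  # ['new o-c-c']
```

In the first ordering the wait condition is satisfied by the stale node, so the stack considers the deployment done before the real node has even been imaged.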
Changed in tripleo:
status: Confirmed → Triaged
importance: Undecided → Medium
(I don't know whether running without "-c" genuinely avoids this because of the timing difference, or whether we just got lucky.)
This obviously doesn't show up if you run heat stack-delete between runs. However, with partially working underclouds (or overclouds), if the stack-delete wedges, or the seed is unable to complete it for some other reason, the state of the still-running baremetal node may still compromise this process.