Been hitting this a lot and poking at it in devstack today. It seems to be more than simply nodes becoming disassociated from instances on failure, and smells more like something deeper in the scheduler. I'm curious what change broke this, as it was working okay previously.
It's easy to reproduce in devstack: simply enroll multiple VMs (IRONIC_VM_COUNT) and set deploy_callback_timeout to something low. You'll notice that, after the first failure, the instance gets rescheduled to multiple nodes, and to multiple other nodes after the second failure. I've attached a client-side log showing the transitions.
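For anyone trying to reproduce this, a minimal local.conf sketch along those lines (the exact values are illustrative assumptions; any short timeout should trigger the failure path):

```ini
# local.conf fragment (values are illustrative)
[[local|localrc]]
# Enroll several VMs so there are spare nodes to reschedule onto
IRONIC_VM_COUNT=3

[[post-config|$IRONIC_CONF_FILE]]
[conductor]
# Short timeout so the deploy fails quickly and triggers a reschedule
deploy_callback_timeout = 60
```

With this in place, booting a single instance and waiting out the timeout should be enough to watch it bounce across nodes.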