Comment 13 for bug 1326364

Adam Gandelman (gandelman-a) wrote :

Been hitting this a lot and poking at it in devstack today. It seems to be more than simply nodes becoming disassociated from instances on failure, and smells more like something deeper in the scheduler. I'm curious what change broke this, as it was working okay previously.

It's easy to reproduce in devstack: simply enroll multiple VMs (IRONIC_VM_COUNT) and set deploy_callback_timeout to something low. You'll notice that, after the first failure, the instance gets rescheduled to multiple nodes, and to multiple more nodes after the second failure. I've attached a client-side log showing the transitions.
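For anyone trying to reproduce, the relevant settings look roughly like this (exact file paths and the timeout value are assumptions for illustration; IRONIC_VM_COUNT goes in the devstack local config, and deploy_callback_timeout lives in the [conductor] section of ironic.conf):

```ini
# devstack local.conf / localrc: enroll several Ironic VMs
IRONIC_VM_COUNT=3

# /etc/ironic/ironic.conf: make the deploy callback time out quickly
[conductor]
deploy_callback_timeout=60
```

After stacking with these settings, booting an instance and letting the deploy time out should trigger the reschedule behavior described above.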