Comment 3 for bug 1622813

Andrew Wilkins (axwalk) wrote :

OK, I *think* I see what's going on now, finally.

The state lifecycle watcher sends a change when it sees the first machine become Dead. The API client is ready for it, and pulls it across immediately. The provisioner is ready for that, and picks it up immediately.

The state lifecycle watcher then notices the second machine become Dead, and sends that across to the API client, which pulls it into memory. The provisioner is busy destroying the first instance, so doesn't grab it yet.

At this point, the state lifecycle watcher gradually sees each of the remaining machines become Dead, and coalesces their IDs. So it's not until the *third* call that they all get destroyed.

In the immediate term, I think we should update api/watcher to coalesce. Long term, I think we want to change the provisioner to permit multiple concurrent operations. If a provider can't terminate multiple instances concurrently, then it needs to serialise those operations. But we shouldn't hamstring them all.
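
To make the short-term suggestion concrete, here is a minimal sketch of what coalescing on the client side could look like. It is not the actual api/watcher code: the `coalesce` function, the channel shapes, and the choice to deduplicate IDs are all assumptions made for illustration. The idea is only that changes arriving while the consumer is busy get merged into one pending batch, rather than a single change being held while the rest queue up behind it.

```go
package main

import "fmt"

// coalesce is a hypothetical helper, not part of Juju's api/watcher.
// It forwards batches of IDs from in to out. Batches that arrive while
// the consumer is busy are merged into a single pending batch
// (deduplicated, first-seen order kept) instead of being held one at a time.
func coalesce(in <-chan []string, out chan<- []string) {
	defer close(out)
	var pending []string
	seen := make(map[string]bool)
	for {
		// Only offer a send when something is pending; a send on a
		// nil channel is never selected.
		var send chan<- []string
		if len(pending) > 0 {
			send = out
		}
		select {
		case ids, ok := <-in:
			if !ok {
				// Source closed: flush whatever is left, then stop.
				if len(pending) > 0 {
					out <- pending
				}
				return
			}
			for _, id := range ids {
				if !seen[id] {
					seen[id] = true
					pending = append(pending, id)
				}
			}
		case send <- pending:
			pending = nil
			seen = make(map[string]bool)
		}
	}
}

func main() {
	in := make(chan []string)
	out := make(chan []string)
	go coalesce(in, out)

	// Three changes arrive before the (busy) consumer reads anything.
	in <- []string{"1"}
	in <- []string{"2"}
	in <- []string{"3", "2"}
	close(in)

	// The consumer then receives them as one merged batch.
	for batch := range out {
		fmt.Println(batch)
	}
}
```

With something along these lines, a provisioner that is busy destroying the first instance would find all the remaining Dead machines in its next read, rather than just the second one.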