Comment 6 for bug 1389178

Ladislav Smola (lsmola) wrote : Re: heat stack-update failure when scaling resource group

OK, so I think I found the reason.

There is a race condition in signaling/resource creation.

When we run heat stack-update, there are two ComputeAllNodesDeployment events in quick succession:
| ComputeAllNodesDeployment | 2607455f-2f34-4275-95dd-205e5ba3cd1b | state changed | CREATE_IN_PROGRESS | 2014-11-11T09:27:46Z |
| ComputeAllNodesDeployment | aea956f1-aa3a-4ce3-8ae4-7a4c841333f2 | state changed | UPDATE_IN_PROGRESS | 2014-11-11T09:27:45Z |

But the CREATE_IN_PROGRESS event refers to a new ComputeAllNodesDeployment resource.

Now, if the config run starts too quickly, the signal can be sent to the old resource by this script: https://github.com/openstack/tripleo-image-elements/blob/master/elements/os-refresh-config/os-refresh-config/post-configure.d/99-refresh-completed#L5 (the signal references the old ComputeAllNodesDeployment).

The new ComputeAllNodesDeployment then also waits for the signal, but it never gets it and times out.
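
To make the sequence easier to follow, here is a toy model of the race in Python. It is only a sketch of the ordering problem as I understand it, not Heat or os-refresh-config code: the `Deployment` class, the `run_update` function, the shortened IDs and the status strings are all made up for illustration.

```python
# Toy model of the race described above. This is NOT Heat code;
# every name and ID here is hypothetical.


class Deployment:
    """Stand-in for a deployment resource that waits for a signal."""

    def __init__(self, resource_id):
        self.resource_id = resource_id
        self.signalled = False


def run_update(config_starts_early):
    """Simulate one stack-update and return the new resource's outcome."""
    old = Deployment("old-deployment")  # the resource being updated
    new = Deployment("new-deployment")  # the resource created by the update
    resources = {d.resource_id: d for d in (old, new)}

    # If the config run starts too early, the completion signal still
    # references the old resource; otherwise it references the new one.
    target = old.resource_id if config_starts_early else new.resource_id
    resources[target].signalled = True

    # The new resource only completes if it received the signal itself.
    return "COMPLETE" if new.signalled else "TIMED_OUT"
```

Calling `run_update(True)` models the failing case, where the signal lands on the old resource and the new one is left waiting. `run_update(False)` models the ordering where the signal reaches the new resource.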