OS::Heat::WaitCondition resource hanging in the CREATE_IN_PROGRESS status

Bug #1737651 reported by Shi Yan
Affects: OpenStack Heat
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

I am using the latest Pike release of Heat. When I create a stack with multiple servers, the stack creation sometimes (it is not always reproducible) gets stuck in progress because the OS::Heat::WaitCondition resource hangs in CREATE_IN_PROGRESS.

The problem is that the server is actually CREATE_COMPLETE and its corresponding OS::Heat::WaitConditionHandle resource is also CREATE_COMPLETE, but the WaitCondition never receives the status update.
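
For illustration, the mismatch can be seen by listing the resource statuses of the affected stack with the standard client ("$STACK" below is a placeholder for the stack name or UUID from the logs):

    # Show per-resource status; the WaitConditionHandle reports
    # CREATE_COMPLETE while the WaitCondition stays CREATE_IN_PROGRESS.
    # "$STACK" is a placeholder for the stack name or UUID.
    openstack stack resource list "$STACK"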

I did not find anything suspicious in the logs, but the entries below repeat indefinitely.

2017-12-12 12:15:58.312 23029 DEBUG heat.engine.scheduler [req-cf1adf2a-95c1-4d71-9e3c-ac80471d42ed - pt-914 - default default] Task create from HeatWaitCondition "worker-5326d53d-wc-waiter" Stack "SCENATEST-9f359dc7fd7b8669-worker-5326d53d-wqhr5y5zxgi4-0-irw6e7fj7iip" [690808ee-910e-4b27-8e99-e56078d4fce7] running step /usr/lib/python2.7/dist-packages/heat/engine/scheduler.py:214
2017-12-12 12:15:58.333 23029 DEBUG heat.engine.scheduler [req-cf1adf2a-95c1-4d71-9e3c-ac80471d42ed - pt-914 - default default] Task create from HeatWaitCondition "worker-5326d53d-wc-waiter" Stack "SCENATEST-9f359dc7fd7b8669-worker-5326d53d-wqhr5y5zxgi4-0-irw6e7fj7iip" [690808ee-910e-4b27-8e99-e56078d4fce7] sleeping _sleep /usr/lib/python2.7/dist-packages/heat/engine/scheduler.py:155
2017-12-12 12:15:58.561 23029 DEBUG heat.engine.scheduler [req-9e2f0a86-1f1d-4fdb-8eb9-21abb391b9af - - - - -] Task create from TemplateResource "0" Stack "SCENATEST-9f359dc7fd7b8669-worker-5326d53d-wqhr5y5zxgi4" [b4b2c8c7-7d56-47c3-b644-dcc7aede08c1] running step /usr/lib/python2.7/dist-packages/heat/engine/scheduler.py:214
2017-12-12 12:15:58.571 23029 DEBUG heat.engine.scheduler [req-9e2f0a86-1f1d-4fdb-8eb9-21abb391b9af - - - - -] Task create from TemplateResource "0" Stack "SCENATEST-9f359dc7fd7b8669-worker-5326d53d-wqhr5y5zxgi4" [b4b2c8c7-7d56-47c3-b644-dcc7aede08c1] sleeping _sleep /usr/lib/python2.7/dist-packages/heat/engine/scheduler.py:155

Shi Yan (yanshi-403)
description: updated
Rabi Mishra (rabi) wrote :

It looks like your WaitConditionHandle resource is not being signaled, so the WaitCondition cannot reach the CREATE_COMPLETE state. You probably need to share your template, and also check whether you can reach the heat-api service from the server.
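
For example, a test signal can be POSTed from inside the server to the handle's pre-signed signal URL, which is essentially what wc_notify does ("$SIGNAL_URL" is a placeholder; the real URL comes from the WaitConditionHandle resource's attributes):

    # From inside the server: send a success signal to the handle's
    # pre-signed URL ("$SIGNAL_URL" is a placeholder). A 2xx response
    # means the heat-api service is reachable from the server.
    curl -i -X POST -H 'Content-Type: application/json' \
         --data-binary '{"status": "SUCCESS"}' \
         "$SIGNAL_URL"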

Changed in heat:
status: New → Incomplete
Shi Yan (yanshi-403) wrote :

Thanks Rabi for your reply.

Actually, I am running a Sahara cluster, which triggers the stack creation in Heat. The Heat API is fine. When I create the stack, it creates two nested stacks, one for the master and one for the workers. Sometimes one of them works fine and the other hangs in the "in progress" status because the WaitCondition receives no signal.
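
As a sketch with the standard client, the hung nested stack can be located by including nested stacks in the listing:

    # Include nested stacks in the listing and filter for stuck ones
    openstack stack list --nested | grep CREATE_IN_PROGRESS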

I have attached the template.

Shi Yan (yanshi-403) wrote :

I just used the CLI to make the WaitConditionHandle send the signal again, and the WaitCondition status then changed to CREATE_COMPLETE successfully.
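
For reference, the manual re-signal looks like this with the standard client ("$STACK" and "$HANDLE" are placeholders for the actual stack and handle resource names):

    # Re-send the success signal to the WaitConditionHandle by hand.
    # "$STACK" and "$HANDLE" are placeholders for the real names/UUIDs.
    openstack stack resource signal "$STACK" "$HANDLE" \
        --data '{"status": "SUCCESS"}'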

But Rabi, do you think we need some improvement to how the WaitConditionHandle signal is sent?

Rabi Mishra (rabi) wrote :

> But Rabi, do you think we need some improvement to how the WaitConditionHandle signal is sent?

What improvement are you expecting?

The server user_data contains the signal command below, which is executed by cloud-init. If that is not happening, you should check the cloud-init logs.

    … | sudo tee -a /etc/hosts

    while true; do
      wc_notify --insecure --data-binary '{"status": "SUCCESS"}'
      if [ $? -eq 0 ]; then
        break
      fi
      sleep 10
    done
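
If the loop never runs, the usual cloud-init logs on the server are the place to look (the paths below are common defaults and may differ by distribution):

    # Common cloud-init log locations; paths may vary by distribution
    sudo tail -n 100 /var/log/cloud-init.log /var/log/cloud-init-output.log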

Shi Yan (yanshi-403) wrote :

Yes, Rabi. It turned out our Neutron had a problem attaching the port, so the resource never received the signal.
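
A port problem like this usually shows up as a non-ACTIVE port status, which can be checked from the client (assuming the --server filter is available in this client version; "$SERVER" is a placeholder):

    # Check whether the server's ports are ACTIVE or DOWN.
    # "$SERVER" is a placeholder for the server name or UUID.
    openstack port list --server "$SERVER"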

Thanks, and I will mark this bug report as invalid.

Changed in heat:
status: Incomplete → Invalid