Bay status is still CREATE_IN_PROGRESS after stack create failed

Bug #1444368 reported by FenghuaFang
This bug affects 4 people
Affects: Magnum
Status: Fix Released
Importance: High
Assigned to: Eli Qiao

Bug Description

When a bay fails to create, I found that the bay status is still CREATE_IN_PROGRESS.

[stack@localhost magnum(keystone_admin)]$ heat stack-list
+--------------------------------------+----------------------+---------------+----------------------+
| id                                   | stack_name           | stack_status  | creation_time        |
+--------------------------------------+----------------------+---------------+----------------------+
| 6ad0b1f4-ccc9-4155-b63d-9148f6e7ba7e | testbay-qu76axva3bls | CREATE_FAILED | 2015-04-15T08:07:59Z |
+--------------------------------------+----------------------+---------------+----------------------+
[stack@localhost magnum(keystone_admin)]$ magnum bay-list
+--------------------------------------+---------+------------+--------------------+
| uuid                                 | name    | node_count | status             |
+--------------------------------------+---------+------------+--------------------+
| 363a48d6-4e44-46c8-ac9e-2b24bc6abcf7 | testbay | 2          | CREATE_IN_PROGRESS |
+--------------------------------------+---------+------------+--------------------+

These are the error messages on the magnum-conductor screen.

"
2015-04-15 04:41:21.873 ERROR magnum.openstack.common.loopingcall [-] in fixed duration looping call
2015-04-15 04:41:21.873 TRACE magnum.openstack.common.loopingcall Traceback (most recent call last):
2015-04-15 04:41:21.873 TRACE magnum.openstack.common.loopingcall File "/opt/stack/magnum/magnum/openstack/common/loopingcall.py", line 81, in _inner
2015-04-15 04:41:21.873 TRACE magnum.openstack.common.loopingcall self.f(*self.args, **self.kw)
2015-04-15 04:41:21.873 TRACE magnum.openstack.common.loopingcall File "/opt/stack/magnum/magnum/conductor/handlers/bay_k8s_heat.py", line 220, in poll_and_check
2015-04-15 04:41:21.873 TRACE magnum.openstack.common.loopingcall 'status': stack.stack_status})
2015-04-15 04:41:21.873 TRACE magnum.openstack.common.loopingcall KeyError: u'attempts'
2015-04-15 04:41:21.873 TRACE magnum.openstack.common.loopingcall
"
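The traceback above is consistent with a %-style format string that names an `attempts` key while the dictionary passed to it only carries other keys such as `status`. The following is a minimal sketch of that failure mode, not Magnum's actual code: the function name, the message text, and the `id` key are hypothetical, chosen only for illustration.

```python
def format_timeout_message(stack_id, status, attempts=None):
    """Build a log message with %-style named placeholders.

    Hypothetical helper: the template and key names are assumptions,
    not copied from bay_k8s_heat.py.
    """
    template = ("Timed out polling stack %(id)s after %(attempts)s attempts; "
                "stack status is %(status)s")
    values = {'id': stack_id, 'status': status}
    if attempts is not None:
        # Without this key, the % operator below raises KeyError('attempts').
        values['attempts'] = attempts
    return template % values


if __name__ == "__main__":
    try:
        format_timeout_message("6ad0b1f4", "CREATE_FAILED")
    except KeyError as exc:
        print("KeyError:", exc)
    print(format_timeout_message("6ad0b1f4", "CREATE_FAILED", attempts=3))
```

If an exception like this escapes the polling callback, the looping call stops, and any status update that would have followed the log statement never runs, which would leave the bay in CREATE_IN_PROGRESS.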

So I have two suggestions:
1. Would it be better to update the status of the bay to "CREATE_FAILED"?
2. The bay creation is stuck for a very long time in the status "CREATE_IN_PROGRESS". Would it be better to tell the user what magnum is doing now?

affects: kolla → magnum
description: updated
Changed in magnum:
assignee: nobody → Vilobh Meshram (vilobhmm)
Janek Lehr (jjlehr) wrote :

I've seen this as well, but if the --timeout option is used during bay create then the bay shows CREATE_FAILED as it should. It looks like when max_attempts is exceeded the bay status is not updated.
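One way to handle the max_attempts case described here is to treat an exhausted attempt budget as a terminal failure and record it on the bay. This is a minimal sketch under stated assumptions, not Magnum's implementation: the `Bay` class, the `poll_once` function, and the choice to map a timeout to CREATE_FAILED are all hypothetical.

```python
class Bay:
    """Hypothetical stand-in for a bay record; only the status is modelled."""

    def __init__(self):
        self.status = "CREATE_IN_PROGRESS"


def poll_once(bay, stack_status, attempts, max_attempts):
    """Update bay.status from the stack status; return True when polling should stop."""
    if stack_status in ("CREATE_COMPLETE", "CREATE_FAILED"):
        # Terminal stack states are copied straight to the bay.
        bay.status = stack_status
        return True
    if attempts >= max_attempts:
        # Assumption: giving up is reported to the user as a failed create.
        bay.status = "CREATE_FAILED"
        return True
    return False


if __name__ == "__main__":
    bay = Bay()
    for attempt in range(1, 5):
        if poll_once(bay, "CREATE_IN_PROGRESS", attempt, max_attempts=3):
            break
    print(bay.status)
```

The point of the sketch is that the status update happens before the loop exits on every path, so the bay cannot remain in CREATE_IN_PROGRESS once polling has stopped.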

Steven Dake (sdake)
Changed in magnum:
status: New → Confirmed
assignee: Vilobh Meshram (vilobhmm) → nobody
importance: Undecided → High
Adrian Otto (aotto) wrote :

Please indicate steps to reproduce, and include logs of your attempts to help us make this bug actionable.

Changed in magnum:
status: Confirmed → Incomplete
Tom Cammann (tom-cammann) wrote :

It would be good to provide the heat logs for this failure also. However, Janek's suggestion seems very likely, and we need to decide what to do when max_attempts is exceeded or heat falls over. If heat has died in the background or such, there isn't going to be anything accurate we can report.

Changed in magnum:
assignee: nobody → Martin Falatic (martinfalatic)
Martin Falatic (martinfalatic) wrote :

I have not been able to reproduce this issue thus far. If you could provide details of what led to the issue, whether you found a workaround, whether the issue is still affecting you, and any logs or configuration information, that would help me debug this.

Eli Qiao (taget-9) wrote :

I think this can be closed once the periodic task blueprint is finished: https://blueprints.launchpad.net/magnum/+spec/add-periodic-task

Changed in magnum:
assignee: Martin Falatic (martinfalatic) → Eli Qiao (taget-9)
status: Incomplete → Fix Committed
Eli Qiao (taget-9) wrote :
Martin Falatic (martinfalatic) wrote :

Thank you for the update!

lanceyang (332519087-f)
information type: Public → Public Security
information type: Public Security → Private Security
information type: Private Security → Private
information type: Private → Public
Adrian Otto (aotto)
Changed in magnum:
milestone: none → mitaka-2
status: Fix Committed → Fix Released
Adrian Otto (aotto)
Changed in magnum:
milestone: mitaka-2 → mitaka-1