Undeletable stacks due to GreenletExit() error

Bug #1640876 reported by Steven Hardy
This bug affects 1 person
Affects: OpenStack Heat
Status: Invalid
Importance: Medium
Assigned to: Unassigned

Bug Description

I've now hit this twice while running TripleO. The undercloud Heat does not use convergence, and it seems we can hit this when deleting an IN_PROGRESS stack. Specifically, I interrupted a stuck update and got this:

2016-11-10 15:14:55Z [overcloud.ControllerAllNodesValidationDeployment]: UPDATE_COMPLETE state changed
2016-11-10 16:11:29Z [overcloud.AllNodesUpgradeSteps]: CREATE_FAILED CREATE aborted
2016-11-10 16:11:29Z [overcloud]: UPDATE_FAILED Operation cancelled
2016-11-10 16:11:30Z [overcloud]: DELETE_IN_PROGRESS Stack DELETE started
2016-11-10 17:15:18Z [overcloud]: DELETE_FAILED GreenletExit()
2016-11-10 17:15:19Z [overcloud]: DELETE_IN_PROGRESS Stack DELETE started
^C
[stack@instack ~]$
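
To see how long the delete ran before it failed, the event lines above can be parsed with a few lines of Python. This is only a minimal sketch: the regular expression and the names `parse_events` and `time_to_failure` are assumptions made for illustration, based on the event format shown in this report, and are not part of any Heat client.

```python
import re
from datetime import datetime

# Assumed layout, taken from the event output in this report:
#   <date> <time>Z [<resource>]: <STATUS> <reason>
EVENT = re.compile(
    r"^(?P<ts>\S+ \S+Z) \[(?P<resource>[^\]]+)\]: "
    r"(?P<status>[A-Z_]+) (?P<reason>.*)$"
)

SAMPLE = """\
2016-11-10 16:11:29Z [overcloud]: UPDATE_FAILED Operation cancelled
2016-11-10 16:11:30Z [overcloud]: DELETE_IN_PROGRESS Stack DELETE started
2016-11-10 17:15:18Z [overcloud]: DELETE_FAILED GreenletExit()
2016-11-10 17:15:19Z [overcloud]: DELETE_IN_PROGRESS Stack DELETE started
"""


def parse_events(text):
    """Return (timestamp, resource, status, reason) tuples for matching lines."""
    events = []
    for line in text.splitlines():
        match = EVENT.match(line.strip())
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%SZ")
            events.append((ts, match.group("resource"),
                           match.group("status"), match.group("reason")))
    return events


def time_to_failure(events, stack, action="DELETE"):
    """Time between the latest <action>_IN_PROGRESS and the next <action>_FAILED."""
    started = None
    for ts, resource, status, _reason in events:
        if resource != stack:
            continue
        if status == action + "_IN_PROGRESS":
            started = ts
        elif status == action + "_FAILED" and started is not None:
            return ts - started
    return None


if __name__ == "__main__":
    print(time_to_failure(parse_events(SAMPLE), "overcloud"))
```

For the excerpt above, the delete started at 16:11:30 and failed at 17:15:18, so the stack sat in DELETE_IN_PROGRESS for a little over an hour before the GreenletExit() failure was recorded.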

From here I'm stuck: the stack can't be deleted, and I see this in the logs:

2016-11-10 17:15:18.903 17742 DEBUG heat.engine.scheduler [req-f9b6c9d1-030d-48ca-b85c-f664d990dbaa - - - - -] Task destroy cancelled cancel /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:279
2016-11-10 17:15:18.903 17742 DEBUG heat.engine.scheduler [req-f9b6c9d1-030d-48ca-b85c-f664d990dbaa - - - - -] Task destroy cancelled cancel /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:279
2016-11-10 17:15:18.925 17742 DEBUG heat.engine.scheduler [req-f9b6c9d1-030d-48ca-b85c-f664d990dbaa - - - - -] Task destroy cancelled cancel /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:279
2016-11-10 17:15:18.925 17742 DEBUG heat.engine.scheduler [req-f9b6c9d1-030d-48ca-b85c-f664d990dbaa - - - - -] Task destroy cancelled cancel /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:279
2016-11-10 17:15:18.925 17742 DEBUG heat.engine.scheduler [req-f9b6c9d1-030d-48ca-b85c-f664d990dbaa - - - - -] Task destroy cancelled cancel /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:279
2016-11-10 17:15:18.925 17742 INFO heat.engine.stack [req-f9b6c9d1-030d-48ca-b85c-f664d990dbaa - - - - -] Stopped due to GreenletExit() in delete
2016-11-10 17:15:18.931 17742 DEBUG oslo_messaging._drivers.amqpdriver [req-f9b6c9d1-030d-48ca-b85c-f664d990dbaa - - - - -] CAST unique_id: dc60e66db2584019a5f0c0ad3334008b NOTIFY exchange 'heat' topic 'notifications.error' _send /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:435
2016-11-10 17:15:18.939 17742 INFO heat.engine.stack [req-f9b6c9d1-030d-48ca-b85c-f664d990dbaa - - - - -] Stack DELETE FAILED (overcloud*): GreenletExit()
2016-11-10 17:15:18.954 17742 INFO heat.engine.stack [req-f9b6c9d1-030d-48ca-b85c-f664d990dbaa - - - - -] Stopped due to GreenletExit() in delete
2016-11-10 17:15:18.960 17742 DEBUG oslo_messaging._drivers.amqpdriver [req-f9b6c9d1-030d-48ca-b85c-f664d990dbaa - - - - -] CAST unique_id: d67c23bc8d284457b1f8bdac2cc9c7e4 NOTIFY exchange 'heat' topic 'notifications.error' _send /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:435
2016-11-10 17:15:18.968 17742 INFO heat.engine.stack [req-f9b6c9d1-030d-48ca-b85c-f664d990dbaa - - - - -] Stack DELETE FAILED (overcloud): GreenletExit()
2016-11-10 17:15:18.992 17742 DEBUG heat.engine.stack_lock [req-f9b6c9d1-030d-48ca-b85c-f664d990dbaa - - - - -] Engine 89ab2095-6e5b-42b3-89b5-749d570d67f4 released lock on stack 05bf9838-9a94-4094-a745-e5585bcc79e6 release /usr/lib/python2.7/site-packages/heat/engine/stack_lock.py:125
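
When the engine log is long, it can help to pull out just the failure records. The following is a minimal sketch, assuming the log line layout shown above (timestamp, pid, level, module, bracketed request context, message). The regular expressions and the helper names `parse_log` and `failed_actions` are illustrative assumptions, not part of Heat or oslo.log.

```python
import re

# Assumed layout, taken from the log excerpt in this report:
#   <timestamp> <pid> <LEVEL> <module> [<request context>] <message>
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<module>\S+) "
    r"\[(?P<context>[^\]]*)\] (?P<message>.*)$"
)

# Matches messages such as "Stack DELETE FAILED (overcloud): GreenletExit()"
FAILED = re.compile(
    r"^Stack (?P<action>[A-Z]+) FAILED \((?P<name>[^)]+)\): (?P<reason>.*)$"
)

SAMPLE = """\
2016-11-10 17:15:18.925 17742 INFO heat.engine.stack [req-f9b6c9d1-030d-48ca-b85c-f664d990dbaa - - - - -] Stopped due to GreenletExit() in delete
2016-11-10 17:15:18.939 17742 INFO heat.engine.stack [req-f9b6c9d1-030d-48ca-b85c-f664d990dbaa - - - - -] Stack DELETE FAILED (overcloud*): GreenletExit()
2016-11-10 17:15:18.968 17742 INFO heat.engine.stack [req-f9b6c9d1-030d-48ca-b85c-f664d990dbaa - - - - -] Stack DELETE FAILED (overcloud): GreenletExit()
"""


def parse_log(text):
    """Yield one dict per log line that matches the assumed layout."""
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            yield match.groupdict()


def failed_actions(text, module="heat.engine.stack"):
    """Return (timestamp, stack name, reason) for each 'Stack ... FAILED' record."""
    results = []
    for record in parse_log(text):
        if record["module"] != module:
            continue
        failure = FAILED.match(record["message"])
        if failure:
            results.append(
                (record["ts"], failure.group("name"), failure.group("reason"))
            )
    return results


if __name__ == "__main__":
    for ts, name, reason in failed_actions(SAMPLE):
        print(ts, name, reason)
```

Run against the excerpt above, this lists two failure records under the same request ID, one for `overcloud*` and one for `overcloud`, both with the reason `GreenletExit()`.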

Steven Hardy (shardy) wrote :

OK, so here are the steps to reproduce (on TripleO; I don't have a simple Heat reproducer yet):

1. openstack overcloud deploy --templates

2. Apply https://review.openstack.org/#/c/393448/ to your tripleo-heat-templates, but don't install the heat-config ansible hook in the image. The hook is absent by default, so unless you're running a custom image it will already be missing. I'll assume this tree is in /tmp/tripleo-heat-templates below.

3. openstack overcloud deploy --templates /tmp/tripleo-heat-templates/ -e /tmp/tripleo-heat-templates/environments/major-upgrade-steps.yaml

This starts an update, which gets stuck because the hook is missing. (This may be a bug in itself, because I thought heat-config ignored deployments when hooks were missing.)

4. heat stack-delete overcloud

This fails with the GreenletExit error (or at least it does for me), and you get the same error each time you try to delete.

I'm not sure whether my t-h-t patch plays a part in triggering this issue, but both times I hit it I followed the same steps.

Thomas Herve (therve)
Changed in heat:
importance: Undecided → Medium
milestone: none → ocata-2
Steven Hardy (shardy) wrote :

If anyone can reproduce and confirm this, I actually think it's a critical bug.

Steven Hardy (shardy) wrote :

So after the stack delete fails you see this:

2016-11-11 11:12:33.408 17742 ERROR sqlalchemy.pool.QueuePool [req-6987fc6d-2327-4d68-8aab-cea160d6d212 d3434e2cef184c0eb758573569f308af 5e6152a640374da8bdd78043087e2ce8 - - -] Exception during reset or similar
2016-11-11 11:12:33.408 17742 ERROR sqlalchemy.pool.QueuePool Traceback (most recent call last):
2016-11-11 11:12:33.408 17742 ERROR sqlalchemy.pool.QueuePool File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 636, in _finalize_fairy
2016-11-11 11:12:33.408 17742 ERROR sqlalchemy.pool.QueuePool fairy._reset(pool)
2016-11-11 11:12:33.408 17742 ERROR sqlalchemy.pool.QueuePool File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 776, in _reset
2016-11-11 11:12:33.408 17742 ERROR sqlalchemy.pool.QueuePool pool._dialect.do_rollback(self)
2016-11-11 11:12:33.408 17742 ERROR sqlalchemy.pool.QueuePool File "/usr/lib64/python2.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 2526, in do_rollback
2016-11-11 11:12:33.408 17742 ERROR sqlalchemy.pool.QueuePool dbapi_connection.rollback()
2016-11-11 11:12:33.408 17742 ERROR sqlalchemy.pool.QueuePool File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 724, in rollback
2016-11-11 11:12:33.408 17742 ERROR sqlalchemy.pool.QueuePool self._read_ok_packet()
2016-11-11 11:12:33.408 17742 ERROR sqlalchemy.pool.QueuePool File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 700, in _read_ok_packet
2016-11-11 11:12:33.408 17742 ERROR sqlalchemy.pool.QueuePool raise err.OperationalError(2014, "Command Out of Sync")
2016-11-11 11:12:33.408 17742 ERROR sqlalchemy.pool.QueuePool OperationalError: (2014, 'Command Out of Sync')
2016-11-11 11:12:33.408 17742 ERROR sqlalchemy.pool.QueuePool

Steven Hardy (shardy) wrote :

The error looks similar to the one discussed in https://bugs.launchpad.net/heat/+bug/1499669

Steven Hardy (shardy) wrote :

OK, some more data: the deletes fail because they get stuck forever trying to delete the backup stack here:

https://github.com/openstack/heat/blob/master/heat/engine/stack.py#L1728

Commenting this out means the actual stack itself then deletes OK.

I've yet to figure out exactly why that is happening, but I assume things are left in a bad state after the initial error.

Crag Wolfe (cwolfe) wrote :
Rabi Mishra (rabi)
Changed in heat:
milestone: ocata-2 → ocata-3
Rabi Mishra (rabi)
Changed in heat:
milestone: ocata-3 → pike-1
Rico Lin (rico-lin)
Changed in heat:
milestone: pike-1 → pike-2
Rico Lin (rico-lin)
Changed in heat:
milestone: pike-2 → pike-3
Rico Lin (rico-lin)
Changed in heat:
milestone: pike-3 → pike-rc1
Rabi Mishra (rabi)
Changed in heat:
milestone: pike-rc1 → next
Zane Bitter (zaneb) wrote :
Zane Bitter (zaneb)
Changed in heat:
status: New → Invalid