purge_db failed with non-exists raw_template

Bug #1736706 reported by Rico Lin
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Heat
New
Medium
Rico Lin

Bug Description

Found exception in log http://logs.openstack.org/74/510074/6/check/heat-functional-convg-mysql-lbaasv2-py35/8f7dd27/logs/screen-h-eng.txt.gz?#_Dec_04_18_02_39_562224

During purging db it raise with following error

Dec 04 18:02:39.562224 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: ERROR heat.engine.check_resource [None req-873f53de-b70d-4a02-97bf-35e76fa31135 demo None] Unexpected exception in resource check.: heat.common.exception.NotFound: raw template with id 906 not found
Dec 04 18:02:39.562428 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: 2017-12-04 18:02:39.555 16690 TRACE heat.engine.check_resource Traceback (most recent call last):
Dec 04 18:02:39.562614 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: 2017-12-04 18:02:39.555 16690 TRACE heat.engine.check_resource File "/opt/stack/new/heat/heat/engine/check_resource.py", line 321, in check
Dec 04 18:02:39.562760 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: 2017-12-04 18:02:39.555 16690 TRACE heat.engine.check_resource rsrc, stack)
Dec 04 18:02:39.562903 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: 2017-12-04 18:02:39.555 16690 TRACE heat.engine.check_resource File "/opt/stack/new/heat/heat/engine/check_resource.py", line 269, in _initiate_propagate_resource
Dec 04 18:02:39.563076 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: 2017-12-04 18:02:39.555 16690 TRACE heat.engine.check_resource graph_key.rsrc_id, deps, graph_key.is_update)
Dec 04 18:02:39.563258 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: 2017-12-04 18:02:39.555 16690 TRACE heat.engine.check_resource File "/opt/stack/new/heat/heat/engine/check_resource.py", line 357, in check_stack_complete
Dec 04 18:02:39.563401 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: 2017-12-04 18:02:39.555 16690 TRACE heat.engine.check_resource mark_complete, roots, {sender_key: None})
Dec 04 18:02:39.563542 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: 2017-12-04 18:02:39.555 16690 TRACE heat.engine.check_resource File "/opt/stack/new/heat/heat/engine/sync_point.py", line 146, in sync
Dec 04 18:02:39.563738 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: 2017-12-04 18:02:39.555 16690 TRACE heat.engine.check_resource propagate(entity_id, serialize_input_data(input_data))
Dec 04 18:02:39.564724 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: 2017-12-04 18:02:39.555 16690 TRACE heat.engine.check_resource File "/opt/stack/new/heat/heat/engine/check_resource.py", line 353, in mark_complete
Dec 04 18:02:39.564867 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: 2017-12-04 18:02:39.555 16690 TRACE heat.engine.check_resource stack.mark_complete()
Dec 04 18:02:39.565018 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: 2017-12-04 18:02:39.555 16690 TRACE heat.engine.check_resource File "/opt/stack/new/heat/heat/engine/stack.py", line 2077, in mark_complete
Dec 04 18:02:39.565159 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: 2017-12-04 18:02:39.555 16690 TRACE heat.engine.check_resource self.purge_db()
Dec 04 18:02:39.565300 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: 2017-12-04 18:02:39.555 16690 TRACE heat.engine.check_resource File "/opt/stack/new/heat/heat/engine/stack.py", line 2112, in purge_db
Dec 04 18:02:39.565454 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: 2017-12-04 18:02:39.555 16690 TRACE heat.engine.check_resource raw_template_object.RawTemplate.delete(self.context, prev_tmpl_id)
Dec 04 18:02:39.565594 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: 2017-12-04 18:02:39.555 16690 TRACE heat.engine.check_resource File "/opt/stack/new/heat/heat/objects/raw_template.py", line 117, in delete
Dec 04 18:02:39.565750 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: 2017-12-04 18:02:39.555 16690 TRACE heat.engine.check_resource db_api.raw_template_delete(context, template_id)
Dec 04 18:02:39.565898 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: 2017-12-04 18:02:39.555 16690 TRACE heat.engine.check_resource File "/opt/stack/new/heat/heat/db/sqlalchemy/api.py", line 145, in raw_template_delete
Dec 04 18:02:39.566037 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: 2017-12-04 18:02:39.555 16690 TRACE heat.engine.check_resource raw_template = raw_template_get(context, template_id)
Dec 04 18:02:39.566185 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: 2017-12-04 18:02:39.555 16690 TRACE heat.engine.check_resource File "/opt/stack/new/heat/heat/db/sqlalchemy/api.py", line 121, in raw_template_get
Dec 04 18:02:39.566330 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: 2017-12-04 18:02:39.555 16690 TRACE heat.engine.check_resource template_id)
Dec 04 18:02:39.566471 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: 2017-12-04 18:02:39.555 16690 TRACE heat.engine.check_resource heat.common.exception.NotFound: raw template with id 906 not found
Dec 04 18:02:39.566618 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: 2017-12-04 18:02:39.555 16690 TRACE heat.engine.check_resource
Dec 04 18:02:39.566768 ubuntu-xenial-rax-ord-0001201738 heat-engine[14911]: INFO heat.engine.stack [None req-873f53de-b70d-4a02-97bf-35e76fa31135 demo None] Stack DELETE FAILED (CancelUpdateTest-1699518624): raw template with id 906 not found

Rico Lin (rico-lin)
Changed in heat:
milestone: none → queens-3
Revision history for this message
Thomas Herve (therve) wrote :

Just after this error, there is another error:

oslo_db.exception.DBReferenceError: (pymysql.err.IntegrityError) (1452, 'Cannot add or update a child row: a foreign key constraint fails (`heat`.`stack`, CONSTRAINT `stack_ibfk_3` FOREIGN KEY (`user_creds_id`) REFERENCES `user_creds` (`id`))')

So it looks like user_creds got deleted as well. It may be an issue with purge in the end.

Revision history for this message
Thomas Herve (therve) wrote :
Revision history for this message
Rabi Mishra (rabi) wrote :

I think this is probably due to current issue with the replace workflow in convergence where we don't take any lock when doing prepare_for_replace and probably be fixed by https://review.openstack.org/#/c/516459/.

Below is my understanding of what's happening, I may be missing something though.

1. An update for resource (server) is going on and it's stuck in prepare_for_replace (interface detach which normally takes quite a bit of time and we do number of retries for it).

2. Another update (cancel update with rollback) is issued and stack is rolled back with ROLLBACK_COMPLETE, which is quick as there is nothing to do.
http://logs.openstack.org/74/510074/6/check/heat-functional-convg-mysql-lbaasv2-py35/8f7dd27/logs/screen-h-eng.txt.gz#_Dec_04_18_02_30_019806

3. Test cleanup starts the DELETE of the stack.
http://logs.openstack.org/74/510074/6/check/heat-functional-convg-mysql-lbaasv2-py35/8f7dd27/logs/screen-h-eng.txt.gz#_Dec_04_18_02_32_834366

4. In the meantime, the prepare_for_replace finishes from the first update and finds the new traversal (DELETE) and re-triggers it[1].

http://logs.openstack.org/74/510074/6/check/heat-functional-convg-mysql-lbaasv2-py35/8f7dd27/logs/screen-h-eng.txt.gz#_Dec_04_18_02_39_071914

So it seems we end up with 2 stack DELETES (with the same traversal_id it seems) going on in parallel by two workers.

I think the assumption is, when a new traversal hits a resource and the resource is locked for another traversal, it should raise UpdateInProgress and bail out, then the new traversal is re-triggered.

So the first part is not happening here which would probably be taken care of by the fix mentioned above.

[1] https://git.openstack.org/cgit/openstack/heat/tree/heat/engine/check_resource.py#n125

Rico Lin (rico-lin)
Changed in heat:
milestone: queens-3 → rocky-1
Rico Lin (rico-lin)
Changed in heat:
milestone: rocky-1 → rocky-2
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.