Undeletable stack due to DB event integrity error

Bug #1681772 reported by Steven Hardy
This bug affects 1 person
Affects: OpenStack Heat
Status: Fix Released
Importance: Medium
Assigned to: Crag Wolfe

Bug Description

I'm seeing this error:

-AllNodesDeploySteps-v4mqt2qfkhfa): Resource DELETE failed: DBReferenceError: resources.CephStorageGenerateConfigDeployment: (pymysql.err.IntegrityError) (1451, u'Cannot delete or update a parent row: a foreign key constraint fails (`heat`.`event`, CONSTRAINT `ev_rsrc_prop_data_ref` FOREIGN KEY (`rsrc_prop_data_id`) REFERENCES `resource_properties_data` (`id`))') [SQL: u'DELETE FROM resource_properties_data WHERE resource_properties_data.id IN (%(id_1)s, %(id_2)s, %(id_3)s, %(id_4)s, %(id_5)s, %(id_6)s, %(id_7)s, %(id_8)s, %(id_9)s, %(id_10)s, %(id_11)s)'] [parameters: {u'id_2': 1956, u'id_11': 1977, u'id_10': 1913, u'id_3': 1957, u'id_1': 1955, u'id_6': 1970, u'id_7': 1972, u'id_4': 1959, u'id_5': 1969, u'id_8': 1973, u'id_9': 1974}]

I did create a stack and then update to a newer version, but the db_sync is up to date AFAICT:

(undercloud) [stack@undercloud ~]$ sudo heat-manage db_version
2017-04-11 10:33:03.066 7587 WARNING oslo_config.cfg [-] Option "db_backend" from group "DEFAULT" is deprecated. Use option "backend" from group "database".
80
(undercloud) [stack@undercloud ~]$ sudo heat-manage db_sync
2017-04-11 10:33:10.496 7595 WARNING oslo_config.cfg [-] Option "db_backend" from group "DEFAULT" is deprecated. Use option "backend" from group "database".
(undercloud) [stack@undercloud ~]$ sudo heat-manage db_version
2017-04-11 10:33:12.960 7607 WARNING oslo_config.cfg [-] Option "db_backend" from group "DEFAULT" is deprecated. Use option "backend" from group "database".
80

I don't see any recent related migrations or changes to the DB model, but I wanted to raise this to see if other folks are seeing something similar. I don't yet have a minimal reproducer.

Steven Hardy (shardy) wrote :

Hmm, so it seems that sudo heat-manage migrate_properties_data resolves this, so perhaps I missed some recent change. I don't see a Pike release note saying that's mandatory?

Crag Wolfe (cwolfe) wrote :

It shouldn't be mandatory. It isn't yet clear to me why this error is occurring.

Crag Wolfe (cwolfe) wrote :

I suspect that not seeing the issue after running heat-manage migrate_properties_data is luck. The error reported is related to the new-style properties data, which lives in the resource_properties_data table. Since Ocata, all newly created properties data exists only in the resource_properties_data table, though we can still read older pre-Ocata properties data from the legacy resource.properties_data column. Migrating just moves the old properties data.
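To make the read path described above concrete, here is a minimal sketch using plain dictionaries rather than Heat's actual models. The function names and the resource-side `rsrc_prop_data_id` key are assumptions made for illustration; only the `resource_properties_data` table and the legacy `properties_data` column come from the discussion.

```python
# Hypothetical, dict-based stand-ins for database rows; not Heat's real API.

def read_properties_data(resource, rpd_rows):
    """Prefer new-style data; fall back to the legacy column."""
    rpd_id = resource.get("rsrc_prop_data_id")
    if rpd_id is not None:
        return rpd_rows[rpd_id]
    return resource.get("properties_data")


def migrate_properties_data(resource, rpd_rows):
    """Move legacy data into the new-style table, if there is any."""
    if (resource.get("rsrc_prop_data_id") is None
            and resource.get("properties_data") is not None):
        new_id = max(rpd_rows, default=0) + 1
        rpd_rows[new_id] = resource["properties_data"]
        resource["rsrc_prop_data_id"] = new_id
        resource["properties_data"] = None


rpd_rows = {1: {"size": 10}}
new_style = {"rsrc_prop_data_id": 1, "properties_data": None}
legacy = {"rsrc_prop_data_id": None, "properties_data": {"size": 5}}

before = read_properties_data(legacy, rpd_rows)
migrate_properties_data(legacy, rpd_rows)
after = read_properties_data(legacy, rpd_rows)
```

In this model the migration changes where the data is stored but not what a reader sees, which is consistent with the migration not being the real fix for the integrity error.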

There are only two places where we issue DELETE ... IN statements against resource_properties_data. One is when a stack is purged, which I assume is not the scenario here, as this just seems to be a stack delete. The other is when event pruning is triggered. Assuming the latter is the issue here (entirely possible, since every DELETE_IN_PROGRESS and DELETE_COMPLETE results in a new event that can trigger event pruning), the only thing I can think of is s/synchronize_session=False/synchronize_session=True/ here:
https://git.openstack.org/cgit/openstack/heat/tree/heat/db/sqlalchemy/api.py?id=157ede194#n921
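The referential constraint behind the reported error can be reproduced in isolation. The sketch below uses the standard-library sqlite3 module rather than MySQL or Heat's SQLAlchemy code, with a cut-down schema that keeps only the `event.rsrc_prop_data_id` reference; it illustrates the ordering requirement, not Heat's pruning code, and the helper name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE resource_properties_data (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE event ("
    " id INTEGER PRIMARY KEY,"
    " rsrc_prop_data_id INTEGER REFERENCES resource_properties_data (id))"
)
conn.executemany(
    "INSERT INTO resource_properties_data (id) VALUES (?)",
    [(1955,), (1956,)],
)
conn.executemany(
    "INSERT INTO event (id, rsrc_prop_data_id) VALUES (?, ?)",
    [(1, 1955), (2, 1956)],
)
conn.commit()


def delete_properties_data(ids):
    """Hypothetical helper: DELETE ... IN on the parent table."""
    placeholders = ", ".join("?" for _ in ids)
    conn.execute(
        f"DELETE FROM resource_properties_data WHERE id IN ({placeholders})",
        ids,
    )


# Deleting parent rows while events still reference them is rejected.
try:
    delete_properties_data([1955, 1956])
    parent_delete_failed = False
except sqlite3.IntegrityError:
    parent_delete_failed = True

# Once the referencing events are gone, the same delete succeeds.
conn.execute("DELETE FROM event WHERE id IN (1, 2)")
delete_properties_data([1955, 1956])
conn.commit()

remaining = conn.execute(
    "SELECT COUNT(*) FROM resource_properties_data"
).fetchone()[0]
```

The point of the sketch is only that the event rows must actually be gone before the resource_properties_data delete runs; whether `synchronize_session` is what breaks that ordering in Heat is the hypothesis under discussion, not something this example demonstrates.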

On a related note, two parameters affect how often this purge takes place and how many events are purged. The behaviour of these parameters changed a bit with https://review.openstack.org/#/c/400388/
Note that the default for event_purge_batch_size changed from 10 to 200 in that commit.

In general, I'd push for higher numbers for max_events_per_stack and event_purge_batch_size to lessen the number of event purge operations and the event row counting we do during resource actions. That said, this error shouldn't be occurring regardless of the two config values.
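As a rough illustration of why a larger batch size means fewer purge operations, here is a deliberately simplified model. It assumes a purge removes up to event_purge_batch_size of the oldest events whenever a new event arrives and the stack is already at max_events_per_stack; Heat's actual trigger logic may differ, and the value 1000 for max_events_per_stack is only an illustrative choice.

```python
def count_purges(total_events, max_events_per_stack, event_purge_batch_size):
    """Count purge operations in a simplified pruning model (an assumption,
    not Heat's implementation)."""
    stored = 0
    purges = 0
    for _ in range(total_events):
        if stored >= max_events_per_stack:
            # Drop the oldest batch before storing the new event.
            stored -= min(event_purge_batch_size, stored)
            purges += 1
        stored += 1
    return purges


purges_small_batch = count_purges(2000, 1000, 10)   # old default batch size
purges_large_batch = count_purges(2000, 1000, 200)  # new default batch size
print(purges_small_batch, purges_large_batch)  # prints "100 5"
```

Under this model, 2000 events against a limit of 1000 trigger 100 purges with a batch size of 10 but only 5 with a batch size of 200.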

OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/459780

Changed in heat:
assignee: nobody → Crag Wolfe (cwolfe)
status: New → In Progress
Thomas Herve (therve)
Changed in heat:
milestone: none → pike-2
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/459780
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=a6f4c6d2c62d1a545c5f6e7c75e3af1b45724f51
Submitter: Jenkins
Branch: master

commit a6f4c6d2c62d1a545c5f6e7c75e3af1b45724f51
Author: Crag Wolfe <email address hidden>
Date: Tue Apr 25 09:18:25 2017 -0700

    Low-level db delete of events should be synchronous

    Previously, synchronized_session=False was used in the call to prune
    events. This was overly aggressive since it was possible (if rare)
    that, during next db operation to delete resource_properties_data
    rows, some of the referenced events could still have existed resulting
    in a db referential integrity error.

    Change-Id: I5c4cf6a162ff853f84d68e7b203ffa1aae684359
    Closes-Bug: #1681772

Changed in heat:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/heat 9.0.0.0b2

This issue was fixed in the openstack/heat 9.0.0.0b2 development milestone.
