Activity log for bug #2060201

Date Who What changed Old value New value Message
2024-04-04 13:14:29 Robert Franzke bug added bug
2024-04-04 13:15:20 Robert Franzke description
Old value:

Description
===========
The `cinder-manage db prune` command can fail because of an IntegrityError:

```
DBError detected when purging from volumes: (pymysql.err.IntegrityError) (1451, 'Cannot delete or update a parent row: a foreign key constraint fails (`cinder`.`volume_attachment`, CONSTRAINT `volume_attachment_ibfk_1` FOREIGN KEY (`volume_id`) REFERENCES `volumes` (`id`))')\n[SQL: DELETE FROM volumes WHERE volumes.deleted IS true AND volumes.deleted_at < %(deleted_at_1)s]\n[parameters: {'deleted_at_1': datetime.datetime(2024, 3, 5, 7, 8, 56, 775630)}]\n(Background on this error at: https://sqlalche.me/e/14/gkpj)."
```

This can be traced back to this line in the code:
https://opendev.org/openstack/cinder/src/branch/master/cinder/db/sqlalchemy/api.py#L8070

Because deleted_age is defined inside the loop, it is recomputed on every iteration from the current timestamp. This can lead to a race condition if dependent resources were deleted at times close to each other.

Steps to reproduce
==================
Currently none, but an example scenario that should explain the problem.

Example Scenario
================
The timestamp is created anew on every iteration, once per table cleanup.

Imagine a situation where we have these entries in the volume_attachment table:

| id  | volume | deleted | deleted_at          |
| --- | ------ | ------- | ------------------- |
| 1   | 1      | 1       | 2024-03-05 09:00:00 |
| 2   | 2      | 1       | 2024-03-05 09:01:00 |

And these entries in the volumes table:

| id  | deleted | deleted_at          |
| --- | ------- | ------------------- |
| 1   | 1       | 2024-03-05 09:00:00 |
| 2   | 1       | 2024-03-05 09:01:00 |

When the `db prune 1` command is triggered at 2024-03-06 09:00:30, it will first clean up the volume_attachment with the id 1, as it is more than one day old. The volume_attachment with the id 2 won't be touched, as it is less than one day old.
Assuming the cleanup of the volume_attachment table takes one minute, the cleanup of the volumes table will run with a timestamp of 2024-03-06 09:01:30. So it will try to delete both volumes, as both are more than one day old. But as the volume_attachment with the id 2 still references the volume with the id 2, deleting volume 2 is not possible due to the foreign key constraint.

This can be avoided by defining deleted_age once, before entering the loop over the tables.

Expected result
===============
* No IntegrityError, even if some dependent entries were deleted close to one another in time

Actual result
=============
* IntegrityError due to resources that are not cleaned up, because of the race condition

Further comments
================
- I will provide a fix on master by just moving the definition of deleted_age to before entering the loop over the tables.

New value:

Description
===========
The `cinder-manage db prune` command can fail because of an IntegrityError:

```
DBError detected when purging from volumes: (pymysql.err.IntegrityError) (1451, 'Cannot delete or update a parent row: a foreign key constraint fails (`cinder`.`volume_attachment`, CONSTRAINT `volume_attachment_ibfk_1` FOREIGN KEY (`volume_id`) REFERENCES `volumes` (`id`))')\n[SQL: DELETE FROM volumes WHERE volumes.deleted IS true AND volumes.deleted_at < %(deleted_at_1)s]\n[parameters: {'deleted_at_1': datetime.datetime(2024, 3, 5, 7, 8, 56, 775630)}]\n(Background on this error at: https://sqlalche.me/e/14/gkpj)."
```

This can be traced back to this line in the code:
https://opendev.org/openstack/cinder/src/branch/master/cinder/db/sqlalchemy/api.py#L8070

Because deleted_age is defined inside the loop, it is recomputed on every iteration from the current timestamp. This can lead to a race condition if dependent resources were deleted at times close to each other.

Steps to reproduce
==================
Currently none, but an example scenario that should explain the problem.
Example Scenario
================
Imagine a situation where we have these entries in the volume_attachment table:

| id  | volume | deleted | deleted_at          |
| --- | ------ | ------- | ------------------- |
| 1   | 1      | 1       | 2024-03-05 09:00:00 |
| 2   | 2      | 1       | 2024-03-05 09:01:00 |

And these entries in the volumes table:

| id  | deleted | deleted_at          |
| --- | ------- | ------------------- |
| 1   | 1       | 2024-03-05 09:00:00 |
| 2   | 1       | 2024-03-05 09:01:00 |

When the `db prune 1` command is triggered at 2024-03-06 09:00:30, it will first clean up the volume_attachment with the id 1, as it is more than one day old. The volume_attachment with the id 2 won't be touched, as it is less than one day old.

Assuming the cleanup of the volume_attachment table takes one minute, the cleanup of the volumes table will run with a timestamp of 2024-03-06 09:01:30. So it will try to delete both volumes, as both are more than one day old. But as the volume_attachment with the id 2 still references the volume with the id 2, deleting volume 2 is not possible due to the foreign key constraint.

This can be avoided by defining deleted_age once, before entering the loop over the tables.

Expected result
===============
* No IntegrityError, even if some dependent entries were deleted close to one another in time

Actual result
=============
* IntegrityError due to resources that are not cleaned up, because of the race condition

Further comments
================
- I will provide a fix on master by just moving the definition of deleted_age to before entering the loop over the tables.
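
The example scenario can be replayed with a small in-memory simulation. This is only a sketch and not the cinder code: the tables are plain dicts, the clock is simulated, and `prune`, `make_tables`, `run`, `cutoff_once` and the local `IntegrityError` class are hypothetical names introduced for illustration. It assumes, as in the scenario, that the child table is purged before the parent table and that each table takes one minute.

```python
from datetime import datetime, timedelta


class IntegrityError(Exception):
    """Stand-in for the foreign key violation raised by the database."""


def make_tables():
    """Return the example rows: attachments (child) and volumes (parent)."""
    attachments = {
        1: {"volume_id": 1, "deleted_at": datetime(2024, 3, 5, 9, 0, 0)},
        2: {"volume_id": 2, "deleted_at": datetime(2024, 3, 5, 9, 1, 0)},
    }
    volumes = {
        1: datetime(2024, 3, 5, 9, 0, 0),
        2: datetime(2024, 3, 5, 9, 1, 0),
    }
    return attachments, volumes


def prune(attachments, volumes, age_in_days, start, table_duration, cutoff_once):
    """Purge soft-deleted rows, child table first, then the parent table.

    `now` is a simulated clock that advances by `table_duration` per table.
    """
    now = start
    cutoff = now - timedelta(days=age_in_days)
    for table in ("volume_attachment", "volumes"):
        if not cutoff_once:
            # Behaviour described in the report: cutoff recomputed per table.
            cutoff = now - timedelta(days=age_in_days)
        if table == "volume_attachment":
            old = [k for k, row in attachments.items() if row["deleted_at"] < cutoff]
            for key in old:
                del attachments[key]
        else:
            doomed = [k for k, ts in volumes.items() if ts < cutoff]
            # Emulate the foreign key: a referenced volume cannot be deleted.
            referenced = {row["volume_id"] for row in attachments.values()}
            blocked = sorted(referenced.intersection(doomed))
            if blocked:
                raise IntegrityError(f"volumes {blocked} still referenced")
            for key in doomed:
                del volumes[key]
        now += table_duration


def run(cutoff_once):
    """Replay the scenario and report what happened as a string."""
    attachments, volumes = make_tables()
    try:
        prune(
            attachments,
            volumes,
            age_in_days=1,
            start=datetime(2024, 3, 6, 9, 0, 30),
            table_duration=timedelta(minutes=1),
            cutoff_once=cutoff_once,
        )
    except IntegrityError as exc:
        return f"IntegrityError: {exc}"
    return f"ok, remaining volumes: {sorted(volumes)}"


print(run(cutoff_once=False))  # IntegrityError: volumes [2] still referenced
print(run(cutoff_once=True))   # ok, remaining volumes: [2]
```

With the cutoff computed once, volume 2 and its attachment are both left for a later run, which matches the expected result above.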
2024-04-04 14:19:18 Maxim Korezkij bug added subscriber Maxim Korezkij