2024-04-04 13:15:20 |
Robert Franzke |
description |
Description
===========
The `cinder-manage db prune` command can fail with an IntegrityError:
```
DBError detected when purging from volumes: (pymysql.err.IntegrityError) (1451, 'Cannot delete or update a parent row: a foreign key constraint fails (`cinder`.`volume_attachment`, CONSTRAINT `volume_attachment_ibfk_1` FOREIGN KEY (`volume_id`) REFERENCES `volumes` (`id`))')
[SQL: DELETE FROM volumes WHERE volumes.deleted IS true AND volumes.deleted_at < %(deleted_at_1)s]
[parameters: {'deleted_at_1': datetime.datetime(2024, 3, 5, 7, 8, 56, 775630)}]
(Background on this error at: https://sqlalche.me/e/14/gkpj).
```
This can be traced back to this line in the code: https://opendev.org/openstack/cinder/src/branch/master/cinder/db/sqlalchemy/api.py#L8070
Because `deleted_age` is computed inside the loop over the tables, it is recalculated from the current timestamp on every iteration, so each table is pruned against a slightly later cutoff than the one before it.
This can lead to a race condition when dependent resources (for example a volume and its attachment) were deleted at nearly the same time.
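To illustrate the difference, here is a minimal Python sketch of the two loop shapes. The names `cutoffs_inside_loop`, `cutoffs_before_loop` and `fake_clock` are hypothetical and not part of cinder; the fake clock simply advances one minute per read to stand in for the time spent pruning each table, and the start time is taken from the example scenario.

```python
import datetime as dt
from typing import Callable, Dict, List

Clock = Callable[[], dt.datetime]


def fake_clock(start: dt.datetime, step: dt.timedelta) -> Clock:
    """Hypothetical clock that advances by `step` every time it is read,
    standing in for the time spent pruning each table."""
    current = [start]

    def now() -> dt.datetime:
        value = current[0]
        current[0] = value + step
        return value

    return now


def cutoffs_inside_loop(tables: List[str], age_in_days: int,
                        now: Clock) -> Dict[str, dt.datetime]:
    """Current pattern: the cutoff is recomputed for every table."""
    cutoffs = {}
    for table in tables:
        deleted_age = now() - dt.timedelta(days=age_in_days)
        cutoffs[table] = deleted_age
    return cutoffs


def cutoffs_before_loop(tables: List[str], age_in_days: int,
                        now: Clock) -> Dict[str, dt.datetime]:
    """Proposed pattern: one cutoff, computed once and shared by all tables."""
    deleted_age = now() - dt.timedelta(days=age_in_days)
    cutoffs = {}
    for table in tables:
        cutoffs[table] = deleted_age
    return cutoffs


tables = ["volume_attachment", "volumes"]
start = dt.datetime(2024, 3, 6, 9, 0, 30)
one_minute = dt.timedelta(minutes=1)
print(cutoffs_inside_loop(tables, 1, fake_clock(start, one_minute)))
print(cutoffs_before_loop(tables, 1, fake_clock(start, one_minute)))
```

With the inside-loop version the volumes cutoff is 2024-03-05 09:01:30, one minute later than the volume_attachment cutoff of 2024-03-05 09:00:30; with the before-loop version both tables share 2024-03-05 09:00:30.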
Steps to reproduce
==================
None currently, but the example scenario below should explain the problem.
Example Scenario
================
Because the cutoff timestamp is recomputed for each table, the tables can be pruned against different cutoffs.
Imagine the following entries in the volume_attachment table:
| id | volume_id | deleted | deleted_at |
| --- | --------- | ------- | ------------------- |
| 1 | 1 | 1 | 2024-03-05 09:00:00 |
| 2 | 2 | 1 | 2024-03-05 09:01:00 |
And the following entries in the volumes table:
| id | deleted | deleted_at |
| --- | ------- | ------------------- |
| 1 | 1 | 2024-03-05 09:00:00 |
| 2 | 1 | 2024-03-05 09:01:00 |
When the `db prune 1` command is triggered at 2024-03-06 09:00:30, the cutoff is 2024-03-05 09:00:30. The volume_attachment table is pruned first, so the attachment with id 1 is deleted, as it is more than one day old.
The attachment with id 2 is not touched, as it is less than one day old.
Assuming that the cleanup of volume_attachment takes one minute, the volumes table is pruned at 2024-03-06 09:01:30, with a recomputed cutoff of 2024-03-05 09:01:30.
Relative to that cutoff, both volumes are more than one day old, so the command tries to delete both.
But because volume_attachment 2 still references volume 2, deleting volume 2 fails due to the foreign key constraint.
This could easily be avoided by computing `deleted_age` once, before entering the loop over the tables, so that every table is pruned against the same cutoff.
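The scenario can be reproduced in miniature with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the schema is a hypothetical reduction of the two cinder tables to the columns used above, SQLite stands in for MySQL (it raises `sqlite3.IntegrityError` instead of error 1451, and only enforces foreign keys after `PRAGMA foreign_keys = ON`), and `build_db` and `prune` are illustrative helpers, not cinder code.

```python
import datetime as dt
import sqlite3
from typing import Dict

# Hypothetical reduction of the two tables to the rows from the scenario.
VOLUMES = [(1, 1, "2024-03-05 09:00:00"), (2, 1, "2024-03-05 09:01:00")]
ATTACHMENTS = [(1, 1, 1, "2024-03-05 09:00:00"), (2, 2, 1, "2024-03-05 09:01:00")]


def build_db() -> sqlite3.Connection:
    """Create an in-memory database holding the rows from the scenario."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE volumes ("
        "id INTEGER PRIMARY KEY, deleted INTEGER, deleted_at TEXT)"
    )
    conn.execute(
        "CREATE TABLE volume_attachment ("
        "id INTEGER PRIMARY KEY, "
        "volume_id INTEGER REFERENCES volumes (id), "
        "deleted INTEGER, deleted_at TEXT)"
    )
    conn.executemany("INSERT INTO volumes VALUES (?, ?, ?)", VOLUMES)
    conn.executemany(
        "INSERT INTO volume_attachment VALUES (?, ?, ?, ?)", ATTACHMENTS
    )
    conn.commit()
    return conn


def prune(conn: sqlite3.Connection, cutoffs: Dict[str, dt.datetime]) -> None:
    """Delete soft-deleted rows older than each table's cutoff,
    child table first, committing after each table."""
    for table in ("volume_attachment", "volumes"):
        # Timestamps are stored as 'YYYY-MM-DD HH:MM:SS' text, so string
        # comparison orders them chronologically.
        cutoff = cutoffs[table].isoformat(sep=" ")
        with conn:  # commits on success, rolls back on error
            conn.execute(
                f"DELETE FROM {table} WHERE deleted = 1 AND deleted_at < ?",
                (cutoff,),
            )


# Cutoffs that drift by one minute between tables, as in the scenario.
drifting = {"volume_attachment": dt.datetime(2024, 3, 5, 9, 0, 30),
            "volumes": dt.datetime(2024, 3, 5, 9, 1, 30)}
try:
    prune(build_db(), drifting)
except sqlite3.IntegrityError as exc:
    print("IntegrityError:", exc)
```

Passing the same cutoff (2024-03-05 09:00:30) for both tables instead deletes attachment 1 and volume 1, leaves attachment 2 and volume 2 in place, and raises no error.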
Expected result
===============
* No IntegrityError, even if dependent entries were deleted close to one another in time
Actual result
=============
* An IntegrityError, because rows left behind by the race condition still reference the rows being deleted
Further comments
================
- I will provide a fix on master that simply moves the definition of `deleted_age` before the loop over the tables.