cinder-manage db prune race condition
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Cinder | New | Undecided | Unassigned | |
Bug Description
Description
===========
The `cinder-manage db prune` command can fail with an IntegrityError:
```
DBError detected when purging from volumes: (pymysql.
```
This can be traced back to this line in the code: https:/
Because `deleted_age` is defined inside the loop over the tables, it is recomputed from the current timestamp on every iteration.
This can lead to a race condition when dependent resources were deleted within a short time of each other.
Steps to reproduce
==================
Currently none, but the example scenario below should explain the problem.
Example Scenario
================
Imagine a situation where we have these entries in the volume_attachment table:
| id | volume | deleted | deleted_at |
| --- | ------ | ------- | ------------------- |
| 1 | 1 | 1 | 2024-03-05 09:00:00 |
| 2 | 2 | 1 | 2024-03-05 09:01:00 |
And these entries in the volumes table:
| id | deleted | deleted_at |
| --- | ------- | ------------------- |
| 1 | 1 | 2024-03-05 09:00:00 |
| 2 | 1 | 2024-03-05 09:01:00 |
When the `db prune 1` command is triggered at 2024-03-06 09:00:30, it will first clean up the volume_attachment with the id 1, as it is more than one day old.
The volume_attachment with the id 2 won't be touched, as it is less than one day old.
Assume that the cleanup of the volume_attachment table takes one minute; the cleanup of the volumes table will then be triggered with the timestamp 2024-03-06 09:01:30.
So it will try to delete both volumes, as both are now more than one day old.
But as the volume_attachment with the id 2 references the volume with the id 2, the deletion of the volume 2 is not possible due to the foreign key constraint.
This could be easily circumvented by defining the deleted_age before entering the loop over the tables.
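The scenario above can be replayed with a small, self-contained simulation. This is only a sketch and not Cinder's actual code: `make_tables`, `purge`, `step` and `cutoff_once` are hypothetical names, the tables are plain Python lists, and `IntegrityError` is a local stand-in for the database error. It assumes the child table is purged before the parent and that purging one table takes `step` of wall-clock time.

```python
from datetime import datetime, timedelta


class IntegrityError(Exception):
    """Local stand-in for the foreign key violation raised by the database."""


def make_tables():
    # Hypothetical in-memory copies of the rows from the example scenario.
    return {
        "volume_attachment": [
            {"id": 1, "volume": 1, "deleted_at": datetime(2024, 3, 5, 9, 0, 0)},
            {"id": 2, "volume": 2, "deleted_at": datetime(2024, 3, 5, 9, 1, 0)},
        ],
        "volumes": [
            {"id": 1, "deleted_at": datetime(2024, 3, 5, 9, 0, 0)},
            {"id": 2, "deleted_at": datetime(2024, 3, 5, 9, 1, 0)},
        ],
    }


def purge(tables, age_in_days, start, step, cutoff_once):
    """Purge soft-deleted rows, child table first.

    `step` models how long the purge of one table takes.
    With cutoff_once=False the cutoff is recomputed per table (the reported
    behaviour); with cutoff_once=True it is computed once up front.
    """
    now = start
    cutoff = now - timedelta(days=age_in_days)
    for name in ("volume_attachment", "volumes"):
        if not cutoff_once:
            # Recomputed on every iteration, so the cutoff drifts forward.
            cutoff = now - timedelta(days=age_in_days)
        doomed = [row for row in tables[name] if row["deleted_at"] < cutoff]
        if name == "volumes":
            # Emulate the foreign key from volume_attachment to volumes.
            referenced = {a["volume"] for a in tables["volume_attachment"]}
            for row in doomed:
                if row["id"] in referenced:
                    raise IntegrityError(
                        f"volume {row['id']} is still referenced"
                    )
        tables[name] = [row for row in tables[name] if row not in doomed]
        now += step  # time passes while this table is being purged
    return tables


if __name__ == "__main__":
    start = datetime(2024, 3, 6, 9, 0, 30)
    step = timedelta(minutes=1)
    try:
        purge(make_tables(), 1, start, step, cutoff_once=False)
    except IntegrityError as exc:
        print("cutoff per table:", exc)
    fixed = purge(make_tables(), 1, start, step, cutoff_once=True)
    print("cutoff once, remaining volumes:", [v["id"] for v in fixed["volumes"]])
```

With a single cutoff, volume 2 and its attachment are simply left for a later prune run instead of triggering the foreign key error.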
Expected result
===============
* No IntegrityError, even if dependent entries were deleted close to one another in time
Actual result
=============
* IntegrityError caused by dependent resources that have not been cleaned up yet, because of the race condition
Further comments
================
- I will provide a fix on master by just moving the definition of deleted_age to before entering the loop over the tables.
Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/915064