"cinder-manage db purge" fails if live volumes refer to deleted services

Bug #2066318 reported by Paul Goins
Affects                  Status       Importance  Assigned to  Milestone
Cinder                   New          Undecided   Unassigned
OpenStack Cinder Charm   Incomplete   Undecided   Unassigned

Bug Description

I'm not sure if this is an issue that typically happens in cinder by itself; it may be related to how the cinder Juju charm manages things. However, I observed this today.

During a run of "sudo cinder-manage db purge 90", I get this error:

2024-05-21 22:57:00.325 479170 INFO cinder.db.sqlalchemy.api [req-d7ef0aa0-d564-4f98-9e9c-3e063c39bcdd - - - - -] Purging deleted rows older than age=90 days from table=services
2024-05-21 22:57:00.328 479170 ERROR cinder.db.sqlalchemy.api [req-d7ef0aa0-d564-4f98-9e9c-3e063c39bcdd - - - - -] DBError detected when purging from services: (pymysql.err.IntegrityError) (1451, 'Cannot delete or update a parent row: a foreign key constraint fails (`cinder`.`volumes`, CONSTRAINT `volumes_ibfk_3` FOREIGN KEY (`service_uuid`) REFERENCES `services` (`uuid`))')
[SQL: DELETE FROM services WHERE services.deleted IS true AND services.deleted_at < %(deleted_at_1)s]
[parameters: {'deleted_at_1': datetime.datetime(2024, 2, 21, 22, 57, 0, 325958)}]
(Background on this error at: https://sqlalche.me/e/14/gkpj).: oslo_db.exception.DBReferenceError: (pymysql.err.IntegrityError) (1451, 'Cannot delete or update a parent row: a foreign key constraint fails (`cinder`.`volumes`, CONSTRAINT `volumes_ibfk_3` FOREIGN KEY (`service_uuid`) REFERENCES `services` (`uuid`))')
Purge command failed, check cinder-manage logs for more details.

Indeed, it seems that the services should *not* be deleted, as I still have live volumes which refer to the deleted services:

mysql> SELECT Count(*) FROM cinder.volumes v LEFT JOIN cinder.services s ON v.service_uuid = s.uuid WHERE v.deleted=0 AND s.deleted IS TRUE;
+----------+
| Count(*) |
+----------+
| 899 |
+----------+
1 row in set (0.00 sec)
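The failure mechanism can be reproduced in miniature. The sketch below is an assumption-laden stand-in: it uses sqlite3 instead of MySQL and a drastically simplified two-column version of the services/volumes schema, keeping only the `service_uuid` foreign key that corresponds to cinder's `volumes_ibfk_3` constraint. A soft-deleted service that a live volume still references makes the purge's hard DELETE fail, just as in the log above.

```python
import sqlite3

# Hypothetical, simplified schema: only the columns needed to show the
# foreign-key interaction (sqlite3 stand-in for the real MySQL tables).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # sqlite enforces FKs only when enabled
conn.execute("CREATE TABLE services (uuid TEXT PRIMARY KEY, deleted INTEGER)")
conn.execute(
    "CREATE TABLE volumes ("
    " id TEXT PRIMARY KEY,"
    " deleted INTEGER,"
    " service_uuid TEXT REFERENCES services (uuid))"
)

# A soft-deleted service (deleted=1) that a live volume (deleted=0) still points at.
conn.execute("INSERT INTO services VALUES ('svc-1', 1)")
conn.execute("INSERT INTO volumes VALUES ('vol-1', 0, 'svc-1')")

try:
    # What the purge effectively attempts: hard-delete soft-deleted services.
    conn.execute("DELETE FROM services WHERE deleted = 1")
    purge_failed = False
except sqlite3.IntegrityError as exc:
    purge_failed = True
    print("purge fails:", exc)
```

The DELETE raises an IntegrityError because the parent row is still referenced, which is the sqlite analogue of MySQL error 1451 seen in the traceback.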

We sometimes end up with extra cinder services showing up in "openstack volume service list", and these are typically "cleaned up" by running the Juju action "juju run-action --wait cinder/leader remove-services". However, I suspect this "cleanup" action is what leaves live volumes pointing at soft-deleted services. As a side effect, while "cinder-manage db purge" does at least partially run, I am not confident it has cleaned up all the tables it could.
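One conceivable remediation (a sketch only, not an endorsed fix, again using a sqlite3 stand-in with a hypothetical simplified schema) would be to restore any soft-deleted service that a live volume still references before purging, so the hard DELETE only removes genuinely unreferenced rows:

```python
import sqlite3

# Simplified stand-in schema, as above (assumption: sqlite3 instead of MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE services (uuid TEXT PRIMARY KEY, deleted INTEGER)")
conn.execute(
    "CREATE TABLE volumes ("
    " id TEXT PRIMARY KEY, deleted INTEGER,"
    " service_uuid TEXT REFERENCES services (uuid))"
)
conn.execute("INSERT INTO services VALUES ('svc-live', 1)")  # still referenced
conn.execute("INSERT INTO services VALUES ('svc-gone', 1)")  # truly orphaned
conn.execute("INSERT INTO volumes VALUES ('vol-1', 0, 'svc-live')")

# Un-soft-delete any service that a live volume still references...
conn.execute(
    "UPDATE services SET deleted = 0"
    " WHERE deleted = 1 AND uuid IN"
    " (SELECT service_uuid FROM volumes WHERE deleted = 0)"
)
# ...so the purge's hard delete now succeeds, removing only 'svc-gone'.
conn.execute("DELETE FROM services WHERE deleted = 1")
remaining = [row[0] for row in conn.execute("SELECT uuid FROM services")]
print(remaining)  # ['svc-live']
```

Whether restoring the service rows or repointing the volumes is the right call is exactly the question this bug raises for cinder vs. the charm's remove-services action.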

As I write this, I suspect this is most likely a bug for charm-cinder rather than cinder proper, but I'll file it under both for the sake of review.

Cinder version in use: 2:20.3.1-0ubuntu1.1, from jammy-updates/main.
OS: Ubuntu 22.04 Jammy
Cinder charm in use: yoga/stable channel, revision 656

Revision history for this message
Billy Olsen (billy-olsen) wrote :

@Paul - can you provide some more commentary about the additional cinder services that show up and the circumstances around this?

Changed in charm-cinder:
status: New → Incomplete
Revision history for this message
Paul Goins (vultaire) wrote :

@Billy:

We've often hit situations where we get alerts regarding cinder services not being online. I'm not sure of all the causes - an obvious cause would be if a cinder unit were redeployed to a new machine or LXD, but it may happen in other cases as well - perhaps in case of upgrades or service restarts, but I am somewhat speculating here.

Here is one example from one of our internal clouds:

$ cinder service-list
WARNING:cinderclient.shell:API version 3.66 requested,
WARNING:cinderclient.shell:downgrading to 3.60 based on server support.
+------------------+--------------------------+------+---------+-------+----------------------------+---------+-----------------+---------------+
| Binary | Host | Zone | Status | State | Updated_at | Cluster | Disabled Reason | Backend State |
+------------------+--------------------------+------+---------+-------+----------------------------+---------+-----------------+---------------+
| cinder-scheduler | cinder | nova | enabled | up | 2024-02-12T12:33:12.000000 | - | - | |
| cinder-volume | cinder@cinder-ceph-flash | nova | enabled | down | 2024-02-12T11:07:54.000000 | - | - | up |
| cinder-volume | cinder@cinder-ceph-flash | nova | enabled | down | 2024-02-12T11:08:53.000000 | - | - | up |
| cinder-volume | cinder@cinder-ceph-flash | nova | enabled | up | 2024-02-12T12:33:11.000000 | - | - | up |
| cinder-volume | cinder@cinder-ceph-hdd | nova | enabled | down | 2024-02-12T11:07:56.000000 | - | - | up |
| cinder-volume | cinder@cinder-ceph-hdd | nova | enabled | down | 2024-02-12T11:09:01.000000 | - | - | up |
| cinder-volume | cinder@cinder-ceph-hdd | nova | enabled | up | 2024-02-12T12:33:09.000000 | - | - | up |
+------------------+--------------------------+------+---------+-------+----------------------------+---------+-----------------+---------------+

This action was run to clean up the old records: juju run-action cinder/leader remove-services --wait

And the resulting cinder service status was:

$ cinder service-list
WARNING:cinderclient.shell:API version 3.66 requested,
WARNING:cinderclient.shell:downgrading to 3.60 based on server support.
+------------------+--------------------------+------+---------+-------+----------------------------+---------+-----------------+---------------+
| Binary | Host | Zone | Status | State | Updated_at | Cluster | Disabled Reason | Backend State |
+------------------+--------------------------+------+---------+-------+----------------------------+---------+-----------------+---------------+
| cinder-scheduler | cinder | nova | enabled | up | 2024-02-12T12:35:42.000000 | - | - | |
| cinder-volume | cinder@cinder-ceph-flash | nova | enabled | up | 2024-02-12T12:35:35.000000 | - | - | up |
| cinder...

