DB sync fails during upgrade from 2023.2 to 2024.1

Bug #2070475 reported by Andrew Bonney
This bug affects 1 person
Affects: Cinder
Status: New
Importance: High
Assigned to: Unassigned

Bug Description

When upgrading Cinder from 539d8725258932e8a655370fa3ffc7f2eefa85b9 (2023.2) to 0ff4262fba803152e94e32e0dc8e4a2e56fcb0f5 (2024.1) we are observing a failure during the DB sync step 'Make use_quota non nullable'.

The 'cinder-manage db sync' command exits with code '1' and no output. The logs report the following traceback: https://paste.openstack.org/show/824537/

It appears the command 'ALTER TABLE volumes MODIFY use_quota BOOL NOT NULL DEFAULT true;' does not work against the existing definition '`use_quota` tinyint(1) DEFAULT NULL' given the existing data held in the column which appears to be a mix of '0', '1' and 'NULL'.

The traceback reports an issue with row 1 which has a column value of NULL.

We're running MariaDB version 10.11.6 in this case.
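
The failure mode described above can be reproduced in miniature. The following is only a sketch using Python's built-in sqlite3 module as a stand-in for MariaDB. SQLite has no 'ALTER TABLE ... MODIFY', so the stricter definition is emulated by copying rows into a hypothetical 'volumes_strict' table; the two-column schema is a simplification and not Cinder's real one. It only illustrates that existing NULL values are incompatible with a NOT NULL column, regardless of the DEFAULT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-in for the existing definition: use_quota may be NULL.
conn.execute(
    "CREATE TABLE volumes (id INTEGER PRIMARY KEY, use_quota BOOLEAN DEFAULT NULL)"
)
# A mix of NULL, 0 and 1, as observed in the report.
conn.executemany(
    "INSERT INTO volumes (use_quota) VALUES (?)", [(None,), (0,), (1,)]
)

# Hypothetical target table carrying the stricter definition.
conn.execute(
    "CREATE TABLE volumes_strict ("
    "id INTEGER PRIMARY KEY, use_quota BOOLEAN NOT NULL DEFAULT 1)"
)


def copy_to_strict(connection):
    """Try to move existing rows under the NOT NULL definition.

    Returns None on success, or the database error message on failure.
    """
    try:
        connection.execute(
            "INSERT INTO volumes_strict (id, use_quota) "
            "SELECT id, use_quota FROM volumes"
        )
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


error = copy_to_strict(conn)
print(error)
```

The DEFAULT only applies to rows inserted without a value; it does not rewrite NULLs that are already stored, which is why the existing rows have to be fixed or removed first.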

Rajat Dhasmana (whoami-rajat) wrote :

Hi Andrew,

Ideally we shouldn't have a value of NULL in the use_quota field given we have online data migrations replacing NULL with 0 and 1 values[1].

My question would be, did you run the online data migrations before running the db sync?

[1] https://github.com/openstack/cinder/blob/stable/2023.2/cinder/db/sqlalchemy/api.py#L8690

Andrew Bonney (andrewbonney) wrote :

Hi,
Online data migrations should have been run after each upgrade by the OpenStack-Ansible role. I've just run them manually, with the following result:

(cinder-28.1.0) root@infra2-cinder-api-container-7c0b382d:~# cinder-manage db online_data_migrations
Running batches of 50 until complete.
+------------------------------------------+----------------+-------------+
| Migration                                |   Total Needed |   Completed |
|------------------------------------------+----------------+-------------|
| snapshot_use_quota_online_data_migration |              0 |           0 |
| volume_use_quota_online_data_migration   |              0 |           0 |
+------------------------------------------+----------------+-------------+

However, looking at the database I see the following:

MariaDB [cinder]> select count(*) from volumes where use_quota is null;
+----------+
| count(*) |
+----------+
|     3311 |
+----------+
1 row in set (0.054 sec)

MariaDB [cinder]> select count(*) from snapshots where use_quota is null;
+----------+
| count(*) |
+----------+
|     1014 |
+----------+
1 row in set (0.006 sec)

I've tried a couple of tests in the code. Although the DB reports NULL counts in the thousands, the migration query (at the location linked above) returns only 76 volumes and 7 snapshots in total once its 'use_quota' filter is removed.

Andrew Bonney (andrewbonney) wrote :

A further test suggests it is ignoring deleted volumes/snapshots in the migration.
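
The mismatch between the migration's "0 needed" and the raw SQL counts can be illustrated with a small sketch. It assumes a soft-delete scheme where a 'deleted' flag marks removed rows, and uses Python's sqlite3 module with a heavily simplified, hypothetical 'volumes' schema; the helper name 'count_null_use_quota' is made up for illustration and is not part of Cinder.

```python
import sqlite3


def count_null_use_quota(conn, table, include_deleted):
    """Count rows with use_quota IS NULL, optionally hiding soft-deleted rows.

    The table name is interpolated directly, so only pass trusted names.
    """
    query = f"SELECT COUNT(*) FROM {table} WHERE use_quota IS NULL"
    if not include_deleted:
        # Mimics a query layer that silently filters out soft-deleted rows.
        query += " AND deleted = 0"
    return conn.execute(query).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE volumes ("
    "id INTEGER PRIMARY KEY, deleted BOOLEAN NOT NULL, "
    "use_quota BOOLEAN DEFAULT NULL)"
)
# Live rows were already migrated; soft-deleted rows still hold NULL.
conn.executemany(
    "INSERT INTO volumes (deleted, use_quota) VALUES (?, ?)",
    [(0, 1), (0, 0), (1, None), (1, None), (1, None)],
)

visible = count_null_use_quota(conn, "volumes", include_deleted=False)
total = count_null_use_quota(conn, "volumes", include_deleted=True)
print(f"NULL use_quota rows seen when deleted rows are filtered out: {visible}")
print(f"NULL use_quota rows actually present in the table: {total}")
```

A migration that only looks through the filtered view considers its work finished, while a schema change applied to the whole table still encounters the NULLs left on deleted rows.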

Rajat Dhasmana (whoami-rajat) wrote :

Hi Andrew,

You are right: we don't pass read_deleted='yes'[1] to the model query[2], so the use_quota field is never updated on deleted records.
I think the only way to proceed with the upgrade is to purge the deleted records first:
``cinder-manage db purge 0``

[1] https://github.com/openstack/cinder/blob/stable/2023.2/cinder/db/sqlalchemy/api.py#L317-L318
[2] https://github.com/openstack/cinder/blob/stable/2023.2/cinder/db/sqlalchemy/api.py#L8682-L8684

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (stable/2023.2)

Related fix proposed to branch: stable/2023.2
Review: https://review.opendev.org/c/openstack/cinder/+/923635

Eric Harney (eharney)
Changed in cinder:
importance: Undecided → High