Online data migrations fail to execute correctly when upgrading from 2023.1
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Triaged | Medium | Unassigned |
Bug Description
Description
===========
When executing online_
The issue is that nova-manage runs the migration with a default limit (--max-count) of 50 records. Many records cannot be migrated (those with no existing node ID), and the query keeps returning them on every iteration. Eventually all 50 records returned by the query are ones that cannot be migrated, so the iteration migrates nothing and nova-manage exits as if the migration had completed, even though many relevant records remain to be migrated.
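The failure mode above can be reproduced with a small Python simulation. This is a sketch under stated assumptions, not Nova's code: it assumes the migration looks only at the first max_count rows (ordered by id) that still lack a compute_id, and that nova-manage treats a batch that migrates nothing as completion. The names `Instance`, `populate_compute_id`, `run_until_no_progress`, `make_instances` and `remaining` are hypothetical, and making every fourth record unmigratable is purely illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instance:
    id: int
    node: Optional[str]               # None: record cannot be migrated
    compute_id: Optional[int] = None  # None: not yet migrated


def populate_compute_id(instances: List[Instance], max_count: int) -> Tuple[int, int]:
    """Hypothetical migration step; returns (found, done).

    Like a LIMIT query ordered by id, it only sees the first ``max_count``
    rows still lacking a compute_id, and migrates those that have a node.
    """
    batch = [i for i in instances if i.compute_id is None][:max_count]
    done = 0
    for inst in batch:
        if inst.node is not None:
            inst.compute_id = 1  # placeholder for the real compute node lookup
            done += 1
    return len(batch), done


def run_until_no_progress(instances: List[Instance], max_count: int) -> int:
    """Repeat batches until one migrates nothing; return total migrated."""
    total = 0
    while True:
        _found, done = populate_compute_id(instances, max_count)
        total += done
        if done == 0:  # treated as "migration complete"
            return total


def make_instances(n: int = 400) -> List[Instance]:
    # Hypothetical data set: every 4th record has no node and can never be migrated.
    return [Instance(i, None if i % 4 == 0 else "node1") for i in range(n)]


def remaining(instances: List[Instance]) -> int:
    """Count eligible records that were left un-migrated."""
    return sum(1 for i in instances if i.node is not None and i.compute_id is None)


if __name__ == "__main__":
    rows = make_instances()
    print(run_until_no_progress(rows, max_count=50), remaining(rows))   # 147 153
    print(run_until_no_progress(rows, max_count=100), remaining(rows))  # 150 3
```

With max_count=50 the loop stops once the 50 lowest-id unmigrated rows are all unmigratable, leaving 153 eligible rows behind. Raising max_count to 100 lets it proceed further, which mirrors the behaviour reported when --max-count is increased.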
A secondary issue is that, because every iteration has to skip over the same irrelevant records, the migration method is executed hundreds or thousands of times more than necessary, which takes a long time. In the deployment I have just upgraded, the migration ran for upwards of 15 minutes before exiting.
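One common way to avoid re-reading the same skipped rows is keyset (cursor) pagination: remember the last id seen and query only rows after it. The sketch below compares that with re-querying from the start on every batch. It uses no Nova code; `Instance`, `migrate`, `run_requery`, `run_keyset` and the one-in-four unmigratable data set are hypothetical, and keyset pagination is offered as a possible approach, not as the fix adopted upstream.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    id: int
    node: Optional[str]               # None: record cannot be migrated
    compute_id: Optional[int] = None  # None: not yet migrated


def make_instances(n: int = 400) -> List[Instance]:
    # Hypothetical data set: every 4th record has no node and can never be migrated.
    return [Instance(i, None if i % 4 == 0 else "node1") for i in range(n)]


def remaining(instances: List[Instance]) -> int:
    """Count eligible records that were left un-migrated."""
    return sum(1 for i in instances if i.node is not None and i.compute_id is None)


def migrate(batch: List[Instance]) -> int:
    """Fill compute_id for rows that have a node; return how many were migrated."""
    done = 0
    for inst in batch:
        if inst.node is not None:
            inst.compute_id = 1  # placeholder for the real compute node lookup
            done += 1
    return done


def run_requery(instances: List[Instance], max_count: int) -> int:
    """Restart the query from the lowest id every batch; return rows read."""
    rows_read = 0
    while True:
        batch = [i for i in instances if i.compute_id is None][:max_count]
        rows_read += len(batch)
        if migrate(batch) == 0:  # treated as "migration complete"
            return rows_read


def run_keyset(instances: List[Instance], max_count: int) -> int:
    """Resume after the last id seen, so skipped rows are read only once."""
    rows_read = 0
    last_id = -1
    while True:
        batch = [i for i in instances
                 if i.compute_id is None and i.id > last_id][:max_count]
        if not batch:  # stop on an empty batch, not on a batch with no progress
            return rows_read
        rows_read += len(batch)
        migrate(batch)
        last_id = batch[-1].id
```

On the 400-row data set, `run_keyset(make_instances(), 50)` reads each row exactly once and migrates all 300 eligible records, while `run_requery` keeps re-reading the unmigratable rows at the head of the result set and stops with 153 eligible records left. Ending on an empty batch, rather than on a batch that migrated nothing, is what lets the cursor version run to the end.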
My suspicion is that the query in https:/
Steps to reproduce
==================
Use a moderately sized database (a few tens of thousands of records), then:
* Perform an upgrade from 2023.1 to 2023.2
* Execute 'nova-manage db online_
Expected result
===============
Migrations complete in a reasonable time, with all relevant records migrated.
Actual result
=============
nova-manage eventually exits, reporting apparent success, but many records remain un-migrated.
In the two deployments we have upgraded so far, the following records appear relevant and should have been migrated, but were not (one result per deployment):
MariaDB [nova]> select count(*) from instances where compute_id is null and node is not null and host is not null;
+----------+
| count(*) |
+----------+
| 29147 |
+----------+
1 row in set (0.045 sec)
MariaDB [nova]> select count(*) from instances where compute_id is null and node is not null and host is not null;
+----------+
| count(*) |
+----------+
| 22622 |
+----------+
1 row in set (0.048 sec)
Environment
===========
Nova 45a926156c863b4
Libvirt+KVM
Ceph
Neutron+LXB
Logs & Configs
==============
During nova-manage execution, log messages such as the following are printed:
50 rows matched query populate_
50 rows matched query populate_
50 rows matched query populate_
50 rows matched query populate_
...
50 rows matched query populate_
50 rows matched query populate_
50 rows matched query populate_
50 rows matched query populate_
50 rows matched query populate_
50 rows matched query populate_
50 rows matched query populate_
50 rows matched query populate_
+------
| Migration | Total Needed | Completed |
+------
| fill_virtual_
| migrate_empty_ratio | 0 | 0 |
| migrate_
| migrate_
| migration_
| populate_dev_uuids | 0 | 0 |
| populate_
| populate_
| populate_
| populate_user_id | 0 | 0 |
| populate_uuids | 0 | 0 |
+------
Note the repeated migration counts, indicating that the first 44, then 49 records are irrelevant for migration (triggering https:/
Running the migration again after it has completed as above shows:
50 rows matched query populate_
+------
| Migration | Total Needed | Completed |
+------
| fill_virtual_
| migrate_empty_ratio | 0 | 0 |
| migrate_
| migrate_
| migration_
| populate_dev_uuids | 0 | 0 |
| populate_
| populate_
| populate_
| populate_user_id | 0 | 0 |
| populate_uuids | 0 | 0 |
+------
If you then increase the --max-count parameter, further migrations will proceed:
100 rows matched query populate_
+------
| Migration | Total Needed | Completed |
+------
| fill_virtual_
| migrate_empty_ratio | 0 | 0 |
| migrate_
| migrate_
| migration_
| populate_dev_uuids | 0 | 0 |
| populate_
| populate_
| populate_
| populate_user_id | 0 | 0 |
| populate_uuids | 0 | 0 |
+------
Our deployment databases contain the following records, which would likely trigger this issue:
MariaDB [nova]> select count(*) from instances where node is null;
+----------+
| count(*) |
+----------+
| 251 |
+----------+
1 row in set (0.029 sec)
MariaDB [nova]> select count(*) from instances where node is null;
+----------+
| count(*) |
+----------+
| 141 |
+----------+
1 row in set (0.039 sec)
description: updated (twice)
Changed in nova: status New → Triaged, importance Undecided → Medium