Database online data migration fails due to missing request spec marker

Bug #1793419 reported by Jack Ding
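+--------------------------+---------------+------------+----------------+-----------+
| Affects                  | Status        | Importance | Assigned to    | Milestone |
+--------------------------+---------------+------------+----------------+-----------+
| OpenStack Compute (nova) | Fix Released  | Low        | Jack Ding      |           |
| Pike                     | Fix Committed | Low        | Matt Riedemann |           |
| Queens                   | Fix Committed | Low        | Matt Riedemann |           |
| Rocky                    | Fix Committed | Low        | Lee Yarwood    |           |
+--------------------------+---------------+------------+----------------+-----------+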

Bug Description

Description
===========
During upgrade we run the nova online data migration, which goes through the list of instances and creates a request spec record in the db if one does not exist. As the online migrations are batched, the request spec migration leaves a marker record in the request_specs table to indicate the last instance uuid that was processed. The next batch continues processing from that instance.

In our upgrade test, we hit a scenario where the marker instance from the online migration that was run during the Mitaka->Newton upgrade had been deleted and purged from the db by the time we ran the Newton->Pike upgrade. This caused the online migration to fail because the marker instance couldn't be found.
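
The batching described above can be illustrated with a minimal sketch. It uses in-memory lists and dicts in place of the nova database, and every name in it (list_instances, migrate_batch, state) is hypothetical and not taken from nova's code.

```python
# Hypothetical in-memory stand-ins; not nova's actual code or schema.
def list_instances(instances, limit, marker=None):
    """Return up to `limit` instance UUIDs that come after `marker`."""
    start = 0
    if marker is not None:
        # Raises ValueError if the marker instance no longer exists.
        start = instances.index(marker) + 1
    return instances[start:start + limit]


def migrate_batch(instances, request_specs, state, limit):
    """Create missing request specs for one batch and record the marker."""
    batch = list_instances(instances, limit, state.get("marker"))
    for uuid in batch:
        # Only create a request spec if one does not exist yet.
        request_specs.setdefault(uuid, {"instance_uuid": uuid})
    if batch:
        # Remember the last processed instance as the resume point.
        state["marker"] = batch[-1]
    return len(batch)


instances = ["uuid-1", "uuid-2", "uuid-3"]
request_specs = {}
state = {}
first = migrate_batch(instances, request_specs, state, limit=2)
second = migrate_batch(instances, request_specs, state, limit=2)
```

In this sketch the second call resumes after the stored marker, so it only processes the one remaining instance.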

Steps to reproduce
==================
- Run the online data migration on an installed Newton load:
  nova-manage db online_data_migrations
- Delete the instance referenced by the marker (instance_uuid 00000000-0000-0000-0000-000000000000).
- Purge the db:
  nova-manage db purge
- Upgrade to Pike.

Expected result
===============
Upgrade successful with no exceptions.

Actual result
=============
Exceptions occur during the upgrade because the marker is missing, and the upgrade fails.
Error attempting to run <function migrate_instances_add_request_spec at 0x5151050>
14 rows matched query service_uuids_online_data_migration, 14 migrated
13 rows matched query migrate_quota_limits_to_api_db, 13 migrated
Error attempting to run <function migrate_instances_add_request_spec at 0x5151050>
+---------------------------------------------+--------------+-----------+
| Migration                                   | Total Needed | Completed |
+---------------------------------------------+--------------+-----------+
| delete_build_requests_with_no_instance_uuid |            0 |         0 |
| migrate_aggregate_reset_autoincrement       |            0 |         0 |
| migrate_aggregates                          |            0 |         0 |
| migrate_flavor_reset_autoincrement          |            0 |         0 |
| migrate_flavors                             |            0 |         0 |
| migrate_instance_groups_to_api_db           |            0 |         0 |
| migrate_instance_keypairs                   |            0 |         0 |
| migrate_instances_add_request_spec          |            0 |         0 |
| migrate_keypairs_to_api_db                  |            0 |         0 |
| migrate_quota_classes_to_api_db             |            0 |         0 |
| migrate_quota_limits_to_api_db              |            0 |         0 |
| service_uuids_online_data_migration         |            0 |         0 |
+---------------------------------------------+--------------+-----------+

Tags: upgrade
Jack Ding (jackding)
description: updated
Matt Riedemann (mriedem) wrote :

> In our upgrade test, we hit a scenario where the marker instance from the online migration that was run during the Mitaka->Newton upgrade had been deleted and purged from the db

Why/how was the instance marker record deleted? Since it's not owned by any real project/user someone shouldn't be able to list/show it in the API (except maybe an admin when listing servers for all tenants?).

tags: added: upgrade
Matt Riedemann (mriedem) wrote :

Do you have more details from the failure? If the marker instance was deleted, I'd expect the migration to just start over from the beginning.

Matt Riedemann (mriedem) wrote :

OK, I think I see: _get_marker_for_migrate_instances returns the marker because there is still a request_specs table entry with the marker instance_uuid (we didn't use to clean up request specs on db archive/purge, but now we do). So when listing instances we passed a marker for an instance that could no longer be found, which raised MarkerNotFound and failed the migration.
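
The failure mode in this comment can be reproduced in miniature. The sketch below is an assumption-laden illustration: the MarkerNotFound class is a local stand-in for the exception named above, and list_instances is a hypothetical helper over an in-memory list, not nova's instance listing code.

```python
class MarkerNotFound(Exception):
    """Local stand-in for the MarkerNotFound exception named above."""


def list_instances(instances, limit, marker=None):
    """Hypothetical paged listing: return instances after `marker`."""
    start = 0
    if marker is not None:
        if marker not in instances:
            # The marker points at an instance that no longer exists.
            raise MarkerNotFound(marker)
        start = instances.index(marker) + 1
    return instances[start:start + limit]


instances = ["uuid-1", "uuid-2", "uuid-3"]
stale_marker = "uuid-2"          # marker recorded by an earlier run
instances.remove(stale_marker)   # instance deleted and purged since then

try:
    list_instances(instances, limit=50, marker=stale_marker)
    failed = False
except MarkerNotFound:
    # Without handling, this is where the migration run fails.
    failed = True
```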

Changed in nova:
importance: Undecided → Low
status: New → Triaged
Matt Riedemann (mriedem) wrote :

BTW, this is the change that started deleting request_specs when archiving deleted instances:

https://review.openstack.org/#/q/I483701a55576c245d091ff086b32081b392f746e

Jack Ding (jackding)
Changed in nova:
assignee: nobody → Jack Ding (jackding)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/605164

Changed in nova:
status: Triaged → In Progress
Changed in nova:
assignee: Jack Ding (jackding) → Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → Jack Ding (jackding)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/605164
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ff03b157b930de23e5912802cbfbc86889c869c2
Submitter: Zuul
Branch: master

commit ff03b157b930de23e5912802cbfbc86889c869c2
Author: Jack Ding <email address hidden>
Date: Tue Sep 25 13:20:25 2018 -0400

    Handle missing marker during online data migration

    During upgrade the instance used by the request spec marker could be
    deleted and purged between sessions. This would cause the database
    online data migration to fail as the marker instance couldn't be found.

    Fix by handling the MarkerNotFound exception and re-trying without the
    marker. This will go through all the instances and reset the marker when
    done.

    Closes-Bug: #1793419
    Change-Id: If96e3d038346f16cc93209bccf3db028bacfe59b
    Signed-off-by: Jack Ding <email address hidden>
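
The approach in this commit message (handle MarkerNotFound and retry without the marker) can be sketched as follows. This is not the merged patch: MarkerNotFound is a local stand-in class, and list_instances and list_with_fallback are hypothetical helpers over an in-memory list.

```python
class MarkerNotFound(Exception):
    """Local stand-in for the MarkerNotFound exception."""


def list_instances(instances, limit, marker=None):
    """Hypothetical paged listing: return instances after `marker`."""
    start = 0
    if marker is not None:
        if marker not in instances:
            raise MarkerNotFound(marker)
        start = instances.index(marker) + 1
    return instances[start:start + limit]


def list_with_fallback(instances, limit, marker):
    """List from `marker`, restarting from the beginning if it is gone."""
    try:
        return list_instances(instances, limit, marker)
    except MarkerNotFound:
        # The marker instance was deleted and purged: retry with no marker
        # so every remaining instance is visited again.
        return list_instances(instances, limit, marker=None)


remaining = ["uuid-1", "uuid-3"]
restarted = list_with_fallback(remaining, 50, "uuid-2")  # stale marker
resumed = list_with_fallback(remaining, 50, "uuid-1")    # valid marker
```

Restarting from the beginning costs an extra pass over instances that were already handled, but instances that already have a request spec need no new record, so the retry is safe to repeat.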

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/608572

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/608572
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=57b46754ff67c52c49e10d3ae176515cdcda30e3
Submitter: Zuul
Branch: stable/rocky

commit 57b46754ff67c52c49e10d3ae176515cdcda30e3
Author: Jack Ding <email address hidden>
Date: Tue Sep 25 13:20:25 2018 -0400

    Handle missing marker during online data migration

    During upgrade the instance used by the request spec marker could be
    deleted and purged between sessions. This would cause the database
    online data migration to fail as the marker instance couldn't be found.

    Fix by handling the MarkerNotFound exception and re-trying without the
    marker. This will go through all the instances and reset the marker when
    done.

    Closes-Bug: #1793419
    Change-Id: If96e3d038346f16cc93209bccf3db028bacfe59b
    Signed-off-by: Jack Ding <email address hidden>
    (cherry picked from commit ff03b157b930de23e5912802cbfbc86889c869c2)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/610974

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/611343

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.3

This issue was fixed in the openstack/nova 18.0.3 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/610974
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=164b1ba301f580edbfa5a52a80b51f49f2a28fea
Submitter: Zuul
Branch: stable/queens

commit 164b1ba301f580edbfa5a52a80b51f49f2a28fea
Author: Jack Ding <email address hidden>
Date: Tue Sep 25 13:20:25 2018 -0400

    Handle missing marker during online data migration

    During upgrade the instance used by the request spec marker could be
    deleted and purged between sessions. This would cause the database
    online data migration to fail as the marker instance couldn't be found.

    Fix by handling the MarkerNotFound exception and re-trying without the
    marker. This will go through all the instances and reset the marker when
    done.

    Closes-Bug: #1793419
    Change-Id: If96e3d038346f16cc93209bccf3db028bacfe59b
    Signed-off-by: Jack Ding <email address hidden>
    (cherry picked from commit ff03b157b930de23e5912802cbfbc86889c869c2)
    (cherry picked from commit 57b46754ff67c52c49e10d3ae176515cdcda30e3)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.9

This issue was fixed in the openstack/nova 17.0.9 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/611343
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=456e5439f58967fc03c7c13e11fe0e15bea84486
Submitter: Zuul
Branch: stable/pike

commit 456e5439f58967fc03c7c13e11fe0e15bea84486
Author: Jack Ding <email address hidden>
Date: Tue Sep 25 13:20:25 2018 -0400

    Handle missing marker during online data migration

    During upgrade the instance used by the request spec marker could be
    deleted and purged between sessions. This would cause the database
    online data migration to fail as the marker instance couldn't be found.

    Fix by handling the MarkerNotFound exception and re-trying without the
    marker. This will go through all the instances and reset the marker when
    done.

    Closes-Bug: #1793419
    Change-Id: If96e3d038346f16cc93209bccf3db028bacfe59b
    Signed-off-by: Jack Ding <email address hidden>
    (cherry picked from commit ff03b157b930de23e5912802cbfbc86889c869c2)
    (cherry picked from commit 57b46754ff67c52c49e10d3ae176515cdcda30e3)
    (cherry picked from commit 164b1ba301f580edbfa5a52a80b51f49f2a28fea)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.1.8

This issue was fixed in the openstack/nova 16.1.8 release.
