online_data_migrations fail in rocky+

Bug #1790701 reported by Matthew Thode
This bug affects 3 people
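+--------------------------+---------------+------------+----------------+-----------+
| Affects                  | Status        | Importance | Assigned to    | Milestone |
+--------------------------+---------------+------------+----------------+-----------+
| OpenStack Compute (nova) | Fix Released  | Critical   | Matt Riedemann |           |
| Rocky                    | Fix Committed | Critical   | Matt Riedemann |           |
+--------------------------+---------------+------------+----------------+-----------+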

Bug Description

# nova-manage --debug db online_data_migrations
Running batches of 50 until complete
2018-09-04 13:39:12.628 15100 ERROR oslo_db.sqlalchemy.exc_filters [req-73eac47f-96f5-4393-8104-80d64e4f281f - - - - -] DBAPIError exception wrapped from (psycopg2.ProgrammingError) relation "projects" does not exist
LINE 2: FROM projects
             ^
 [SQL: 'SELECT projects.id \nFROM projects \nWHERE projects.external_id = %(external_id_1)s'] [parameters: {'external_id_1': '00000000-0000-0000-0000-000000000000'}] (Background on this error at: http://sqlalche.me/e/f405): psycopg2.ProgrammingError: relation "projects" does not exist
LINE 2: FROM projects
             ^
2018-09-04 13:39:12.628 15100 ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call last):
2018-09-04 13:39:12.628 15100 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib64/python3.5/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
2018-09-04 13:39:12.628 15100 ERROR oslo_db.sqlalchemy.exc_filters context)
2018-09-04 13:39:12.628 15100 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib64/python3.5/site-packages/sqlalchemy/engine/default.py", line 507, in do_execute
2018-09-04 13:39:12.628 15100 ERROR oslo_db.sqlalchemy.exc_filters cursor.execute(statement, parameters)
2018-09-04 13:39:12.628 15100 ERROR oslo_db.sqlalchemy.exc_filters psycopg2.ProgrammingError: relation "projects" does not exist
2018-09-04 13:39:12.628 15100 ERROR oslo_db.sqlalchemy.exc_filters LINE 2: FROM projects
2018-09-04 13:39:12.628 15100 ERROR oslo_db.sqlalchemy.exc_filters ^
2018-09-04 13:39:12.628 15100 ERROR oslo_db.sqlalchemy.exc_filters
2018-09-04 13:39:12.628 15100 ERROR oslo_db.sqlalchemy.exc_filters
Error attempting to run <function create_incomplete_consumers at 0x7fe62f7d19d8>
+---------------------------------------------+--------------+-----------+
| Migration                                   | Total Needed | Completed |
+---------------------------------------------+--------------+-----------+
| create_incomplete_consumers                 | 0            | 0         |
| delete_build_requests_with_no_instance_uuid | 0            | 0         |
| migrate_instances_add_request_spec          | 0            | 0         |
| migrate_keypairs_to_api_db                  | 0            | 0         |
| migrate_quota_classes_to_api_db             | 0            | 0         |
| migrate_quota_limits_to_api_db              | 0            | 0         |
| migration_migrate_to_uuid                   | 0            | 0         |
| populate_missing_availability_zones         | 0            | 0         |
| populate_queued_for_delete                  | 0            | 0         |
| populate_uuids                              | 0            | 0         |
| service_uuids_online_data_migration         | 0            | 0         |
+---------------------------------------------+--------------+-----------+

Matthew Thode (prometheanfire) wrote :

still exits 0 though

Matt Riedemann (mriedem) wrote :

Looks like this was the regression:

https://review.openstack.org/#/c/541435/

Before that change, the placement_context_manager was configured in the sqlalchemy DB API code. Now it is configured only in a few select places, and the online_data_migrations code is not one of them.

This is also noticeable in devstack:

http://logs.openstack.org/08/599208/2/check/tempest-full/608d60a/controller/logs/devstacklog.txt.gz#_2018-09-02_15_04_31_949

But because we try/except Exception in online_data_migrations we ignored the failure.
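
The effect of that blanket try/except can be sketched in a few lines. This is a simplified stand-in, not nova's actual `_run_migration` code: `run_migrations`, `broken_migration`, `working_migration`, and the `fail_on_error` flag are hypothetical names chosen for illustration. It shows how catching `Exception` and continuing lets the command report success, and how recording the failure would let the caller return a non-zero status instead.

```python
def broken_migration(batch_size):
    # Stand-in for a migration whose query fails at the database layer.
    raise RuntimeError('relation "projects" does not exist')


def working_migration(batch_size):
    # Returns (rows found, rows migrated) for this batch.
    return 0, 0


def run_migrations(migrations, batch_size=50, fail_on_error=False):
    """Run each migration once and return a process exit code."""
    had_error = False
    for migration in migrations:
        try:
            found, done = migration(batch_size)
        except Exception as exc:
            # A blanket handler: the error is printed, then the loop moves on.
            print("Error attempting to run %s: %s" % (migration.__name__, exc))
            had_error = True
            continue
        print("%s: %d needed, %d completed" % (migration.__name__, found, done))
    # Without fail_on_error the failure never reaches the exit code.
    return 1 if (fail_on_error and had_error) else 0


lenient = run_migrations([broken_migration, working_migration])
strict = run_migrations([broken_migration, working_migration], fail_on_error=True)
print(lenient, strict)
```

With `fail_on_error` left at its default, the call returns 0 even though one migration raised, which matches the "still exits 0" observation earlier in this report.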

Changed in nova:
status: New → Triaged
importance: Undecided → Critical
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/599744

Chris Dent (cdent) wrote :

Note that this also points out that there's a placement db online migration in the nova online migrations (using code from nova/api/openstack/placement/objects) which we're going to need to address at some point soonish.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/599822

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/599822
Reason: Squashed into https://review.openstack.org/#/c/599744/

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/600085

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/600464

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/599744
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=63c10d2d536c3ffed68ccfb6bd393f317111d903
Submitter: Zuul
Branch: master

commit 63c10d2d536c3ffed68ccfb6bd393f317111d903
Author: Matt Riedemann <email address hidden>
Date: Tue Sep 4 14:53:25 2018 -0400

    Configure placement DB context manager for nova-manage/status

    The create_incomplete_consumers online data migration was added in
    Rocky via change Id609789ef6b4a4c745550cde80dd49cabe03869a. That
    relies on hitting some tables in the API (or placement) database.
    The DB API code for that migration relies on a placement context
    manager which looks like it was regressed with change
    I2fff528060ec52a4a2e26a6484bdf18359b95f77 (also in Rocky). This
    results in a DB error trying to query the projects table but
    because of a generic try/except in _run_migration, the failure
    was missed in CI testing.

    Similarly, the nova-status upgrade check "_check_resource_providers"
    routine also uses the placement DB API context manager to count the
    number of compute resource providers in the API (or placement) DB,
    which is returning 0 because it's not using the proper DB connection.
    This was not caught in the nova-status CLI tests because they use
    the DatabaseFixture which *does* configure the global placement DB
    API context manager.

    This adds the configuration of the global placement DB API context
    manager so we can properly query the placement-related tables.
    The blanket problematic try/except from _run_migration is left
    as-is in this change but will be addressed in a separate patch.

    Integration testing of this fix is being performed with devstack:

      https://review.openstack.org/599847/

    Change-Id: I9d97b7a904e2b7d15c763e2a067cc5909cc6c9c5
    Closes-Bug: #1790701
    Closes-Bug: #1790721
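
The failure mode described in this commit message can be reproduced in miniature. The sketch below is an illustration built on the standard-library `sqlite3` module, not nova or oslo.db code: `ContextManager`, `configure`, `query`, and the two in-memory databases are hypothetical stand-ins, and the sketch assumes that an unconfigured manager ends up talking to a database that lacks the `projects` table. Once the manager is configured with the connection that holds the table, the same query succeeds.

```python
import sqlite3


class ContextManager:
    """Hypothetical stand-in for a global DB context manager."""

    def __init__(self, default_connection):
        self._connection = default_connection

    def configure(self, connection):
        # Point the manager at the database that actually holds the tables.
        self._connection = connection

    def query(self, sql, params=()):
        return self._connection.execute(sql, params).fetchall()


# Two separate databases: only the API database has a projects table.
main_db = sqlite3.connect(":memory:")
api_db = sqlite3.connect(":memory:")
api_db.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, external_id TEXT)")
api_db.execute(
    "INSERT INTO projects (external_id) VALUES (?)",
    ("00000000-0000-0000-0000-000000000000",),
)

placement_context_manager = ContextManager(default_connection=main_db)
SQL = "SELECT id FROM projects WHERE external_id = ?"
ARGS = ("00000000-0000-0000-0000-000000000000",)

try:
    placement_context_manager.query(SQL, ARGS)
    unconfigured_error = None
except sqlite3.OperationalError as exc:
    # SQLite's counterpart of the 'relation "projects" does not exist' error.
    unconfigured_error = str(exc)

# The fix in spirit: configure the manager before the command runs any query.
placement_context_manager.configure(api_db)
rows = placement_context_manager.query(SQL, ARGS)
print(unconfigured_error, rows)
```

The design point is that configuration has to happen in every entry point that uses the manager. A test fixture that configures it globally, as the commit message notes for DatabaseFixture, can hide a command-line path that never does.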

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/600464
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7f25c3e6072c2f892de740038e7d076b24fc09f7
Submitter: Zuul
Branch: stable/rocky

commit 7f25c3e6072c2f892de740038e7d076b24fc09f7
Author: Matt Riedemann <email address hidden>
Date: Tue Sep 4 14:53:25 2018 -0400

    Configure placement DB context manager for nova-manage/status

    The create_incomplete_consumers online data migration was added in
    Rocky via change Id609789ef6b4a4c745550cde80dd49cabe03869a. That
    relies on hitting some tables in the API (or placement) database.
    The DB API code for that migration relies on a placement context
    manager which looks like it was regressed with change
    I2fff528060ec52a4a2e26a6484bdf18359b95f77 (also in Rocky). This
    results in a DB error trying to query the projects table but
    because of a generic try/except in _run_migration, the failure
    was missed in CI testing.

    Similarly, the nova-status upgrade check "_check_resource_providers"
    routine also uses the placement DB API context manager to count the
    number of compute resource providers in the API (or placement) DB,
    which is returning 0 because it's not using the proper DB connection.
    This was not caught in the nova-status CLI tests because they use
    the DatabaseFixture which *does* configure the global placement DB
    API context manager.

    This adds the configuration of the global placement DB API context
    manager so we can properly query the placement-related tables.
    The blanket problematic try/except from _run_migration is left
    as-is in this change but will be addressed in a separate patch.

    Integration testing of this fix is being performed with devstack:

      https://review.openstack.org/599847/

    Change-Id: I9d97b7a904e2b7d15c763e2a067cc5909cc6c9c5
    Closes-Bug: #1790701
    Closes-Bug: #1790721
    (cherry picked from commit 63c10d2d536c3ffed68ccfb6bd393f317111d903)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.1

This issue was fixed in the openstack/nova 18.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/600085
Reason: https://review.openstack.org/#/c/608091/

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.
