nova-status upgrade check shows warnings when it shouldn't

Bug #1790721 reported by Matthew Thode on 2018-09-04
This bug affects 3 people
Affects | Importance | Assigned to
OpenStack Compute (nova) | High | Matt Riedemann
Ocata | High | Matt Riedemann
Pike | High | Matt Riedemann
Queens | High | Matt Riedemann
Rocky | High | Matt Riedemann

Bug Description

# nova-status upgrade check
+-------------------------------------------------------------------+
| Upgrade Check Results |
+-------------------------------------------------------------------+
| Check: Cells v2 |
| Result: Success |
| Details: None |
+-------------------------------------------------------------------+
| Check: Placement API |
| Result: Success |
| Details: None |
+-------------------------------------------------------------------+
| Check: Resource Providers |
| Result: Warning |
| Details: There are no compute resource providers in the Placement |
| service but there are 2 compute nodes in the deployment. |
| This means no compute nodes are reporting into the |
| Placement service and need to be upgraded and/or fixed. |
| See |
| https://docs.openstack.org/nova/latest/user/placement.html |
| for more details. |
+-------------------------------------------------------------------+
| Check: Ironic Flavor Migration |
| Result: Success |
| Details: None |
+-------------------------------------------------------------------+
| Check: API Service Version |
| Result: Success |
| Details: None |
+-------------------------------------------------------------------+
| Check: Request Spec Migration |
| Result: Success |
| Details: None |
+-------------------------------------------------------------------+

nova hypervisor-list
+--------------------------------------+-----------------------------+-------+---------+
| ID | Hypervisor hostname | State | Status |
+--------------------------------------+-----------------------------+-------+---------+
| UUID1 | node03.NOPE | up | enabled |
| UUID2 | node02.NOPE | up | enabled |
+--------------------------------------+-----------------------------+-------+---------+

openstack resource provider list
+--------------------------------------+-----------------------------+------------+
| uuid | name | generation |
+--------------------------------------+-----------------------------+------------+
| UUID1 | node02.NOPE | 76 |
| UUID2 | node03.NOPE | 34 |
+--------------------------------------+-----------------------------+------------+

Matt Riedemann (mriedem) wrote :

I suspect this is a similar regression to bug 1790701: the count of resource providers uses the placement DB API context manager from https://review.openstack.org/#/c/541435/, but that context manager is not properly configured for the API DB, so it's likely just hitting the cell0 DB, where there are no resource providers.

tags: added: nova-status upgrade
Changed in nova:
status: New → Confirmed
importance: Undecided → High
Matt Riedemann (mriedem) wrote :

Here is a devstack patch that runs nova-status upgrade check which should show this same warning and actually make devstack fail:

https://review.openstack.org/599847

Matt Riedemann (mriedem) wrote :

The devstack change didn't fail because when nova-status upgrade check runs it's not seeing any compute nodes in the cell databases:

http://logs.openstack.org/47/599847/1/check/tempest-full-py3/caf3557/controller/logs/devstacklog.txt.gz#_2018-09-04_22_05_45_875

2018-09-04 22:05:45.875 | +-------------------------------------------------------------------+
2018-09-04 22:05:45.875 | | Check: Resource Providers |
2018-09-04 22:05:45.875 | | Result: Success |
2018-09-04 22:05:45.875 | | Details: There are no compute resource providers in the Placement |
2018-09-04 22:05:45.875 | | service nor are there compute nodes in the database. |
2018-09-04 22:05:45.875 | | Remember to configure new compute nodes to report into the |
2018-09-04 22:05:45.876 | | Placement service. See |
2018-09-04 22:05:45.876 | | https://docs.openstack.org/nova/latest/user/placement.html |
2018-09-04 22:05:45.876 | | for more details. |
2018-09-04 22:05:45.876 | +-------------------------------------------------------------------+
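
The two results seen so far (Warning on the reporter's deployment, Success in devstack) are consistent with a check that, once it finds zero compute resource providers, decides purely on the compute node count. A minimal sketch of that decision follows; the function name is hypothetical and this is not the code in nova/cmd/status.py, only an illustration of the outcomes shown in the outputs above.

```python
def classify_when_no_providers(num_computes):
    """Hypothetical helper: outcome when Placement reports 0 compute providers.

    Only the two cases visible in this bug report are modelled.
    """
    if num_computes > 0:
        # Compute nodes exist but none appear in Placement.
        return "Warning"
    # Neither providers nor compute nodes were found.
    return "Success"


# Reporter's deployment: 2 compute nodes counted, 0 providers seen.
print(classify_when_no_providers(2))
# devstack run: 0 compute nodes counted, 0 providers seen.
print(classify_when_no_providers(0))
```

Under this reading, both outputs start from the same wrong provider count of 0; they differ only in how many compute nodes the check counted.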

Matt Riedemann (mriedem) wrote :

Ah I think I know what's happening, the else block on this for loop is busted:

https://github.com/openstack/nova/blob/5c6b22559d538e63543c38471e60aeb5c7867e8f/nova/cmd/status.py#L297

And that overwrites the num_computes value with 0 because there are no compute nodes in the cell0 database.
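
For readers unfamiliar with Python's for/else, the pitfall can be reproduced in a few lines. This is a simplified sketch, not the code from nova/cmd/status.py: the databases are modelled as plain lists, and count_compute_nodes and buggy_total are hypothetical stand-ins.

```python
def count_compute_nodes(db):
    """Hypothetical stand-in: a 'database' is just a list of node names."""
    return len(db)


def buggy_total(cell_dbs, default_db):
    num_computes = 0
    for cell_db in cell_dbs:
        num_computes += count_compute_nodes(cell_db)
    else:
        # A for loop's else clause runs whenever the loop ends without
        # `break`, including after iterating over real cells, so the
        # running total is thrown away here.
        num_computes = count_compute_nodes(default_db)
    return num_computes


cells = [["node02", "node03"]]  # one cell holding two compute nodes
cell0 = []                      # cell0 holds no compute nodes
print(buggy_total(cells, cell0))  # prints 0, not 2
```

The else clause was presumably meant to cover only the "no cell mappings" case, but because the loop never breaks it also runs after a normal iteration.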

Fix proposed to branch: master
Review: https://review.openstack.org/599875

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Confirmed → In Progress
Matt Riedemann (mriedem) on 2018-09-05
tags: added: placement

Reviewed: https://review.openstack.org/599875
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=dcd421ae9e6f0391fea06c9d20949267354c3b3c
Submitter: Zuul
Branch: master

commit dcd421ae9e6f0391fea06c9d20949267354c3b3c
Author: Matt Riedemann <email address hidden>
Date: Tue Sep 4 20:57:33 2018 -0400

    Fix nova-status "_check_resource_providers" check

    The way in which this check counted compute nodes was
    broken because of an incorrect for/else condition. If
    the check is run with a nova.conf like we have in
    devstack, where the API database is configured but
    the [database]/connection is pointing at cell0, where
    there are no compute nodes, the check passes saying
    there are no compute nodes even if there are compute
    nodes found in the cell databases (in the for loop).
    This is because the else executes because the for loop
    doesn't break, and then _count_compute_nodes returns 0
    for cell0 and overwrites the num_computes variable.

    This fixes the issue by checking if we have cell mappings
    before running the loop, else we hit the else block as
    was originally intended.

    Change-Id: I1a706d028a9ca894348a19b7b3df1ea673e4ec90
    Partial-Bug: #1790721
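
The approach described in the commit message, checking for cell mappings before looping and only otherwise falling back, might look roughly like the sketch below. It is an illustration under simplifying assumptions, not the actual patch: databases are modelled as lists and both function names are hypothetical.

```python
def count_compute_nodes(db):
    """Hypothetical stand-in: a 'database' is just a list of node names."""
    return len(db)


def total_compute_nodes(cell_dbs, default_db):
    if cell_dbs:
        # Cell mappings exist: sum the compute nodes across all cells.
        return sum(count_compute_nodes(db) for db in cell_dbs)
    # No cell mappings: fall back to the configured database only.
    return count_compute_nodes(default_db)


cells = [["node02", "node03"]]  # one cell holding two compute nodes
cell0 = []                      # cell0 holds no compute nodes
print(total_compute_nodes(cells, cell0))  # prints 2
```

Using an explicit if/else instead of for/else makes the fallback run only when there is nothing to iterate over.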

Reviewed: https://review.openstack.org/599744
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=63c10d2d536c3ffed68ccfb6bd393f317111d903
Submitter: Zuul
Branch: master

commit 63c10d2d536c3ffed68ccfb6bd393f317111d903
Author: Matt Riedemann <email address hidden>
Date: Tue Sep 4 14:53:25 2018 -0400

    Configure placement DB context manager for nova-manage/status

    The create_incomplete_consumers online data migration was added in
    Rocky via change Id609789ef6b4a4c745550cde80dd49cabe03869a. That
    relies on hitting some tables in the API (or placement) database.
    The DB API code for that migration relies on a placement context
    manager which looks like it was regressed with change
    I2fff528060ec52a4a2e26a6484bdf18359b95f77 (also in Rocky). This
    results in a DB error trying to query the projects table but
    because of a generic try/except in _run_migration, the failure
    was missed in CI testing.

    Similarly, the nova-status upgrade check "_check_resource_providers"
    routine also uses the placement DB API context manager to count the
    number of compute resource providers in the API (or placement) DB,
    which is returning 0 because it's not using the proper DB connection.
    This was not caught in the nova-status CLI tests because they use
    the DatabaseFixture which *does* configure the global placement DB
    API context manager.

    This adds the configuration of the global placement DB API context
    manager so we can properly query the placement-related tables.
    The blanket problematic try/except from _run_migration is left
    as-is in this change but will be addressed in a separate patch.

    Integration testing of this fix is being performed with devstack:

      https://review.openstack.org/599847/

    Change-Id: I9d97b7a904e2b7d15c763e2a067cc5909cc6c9c5
    Closes-Bug: #1790701
    Closes-Bug: #1790721

Changed in nova:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/600098
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f588a0b6c337368596bc8732141e864f56f72212
Submitter: Zuul
Branch: stable/rocky

commit f588a0b6c337368596bc8732141e864f56f72212
Author: Matt Riedemann <email address hidden>
Date: Tue Sep 4 20:57:33 2018 -0400

    Fix nova-status "_check_resource_providers" check

    The way in which this check counted compute nodes was
    broken because of an incorrect for/else condition. If
    the check is run with a nova.conf like we have in
    devstack, where the API database is configured but
    the [database]/connection is pointing at cell0, where
    there are no compute nodes, the check passes saying
    there are no compute nodes even if there are compute
    nodes found in the cell databases (in the for loop).
    This is because the else executes because the for loop
    doesn't break, and then _count_compute_nodes returns 0
    for cell0 and overwrites the num_computes variable.

    This fixes the issue by checking if we have cell mappings
    before running the loop, else we hit the else block as
    was originally intended.

    Change-Id: I1a706d028a9ca894348a19b7b3df1ea673e4ec90
    Partial-Bug: #1790721
    (cherry picked from commit dcd421ae9e6f0391fea06c9d20949267354c3b3c)

tags: added: in-stable-rocky

Reviewed: https://review.openstack.org/600464
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7f25c3e6072c2f892de740038e7d076b24fc09f7
Submitter: Zuul
Branch: stable/rocky

commit 7f25c3e6072c2f892de740038e7d076b24fc09f7
Author: Matt Riedemann <email address hidden>
Date: Tue Sep 4 14:53:25 2018 -0400

    Configure placement DB context manager for nova-manage/status

    The create_incomplete_consumers online data migration was added in
    Rocky via change Id609789ef6b4a4c745550cde80dd49cabe03869a. That
    relies on hitting some tables in the API (or placement) database.
    The DB API code for that migration relies on a placement context
    manager which looks like it was regressed with change
    I2fff528060ec52a4a2e26a6484bdf18359b95f77 (also in Rocky). This
    results in a DB error trying to query the projects table but
    because of a generic try/except in _run_migration, the failure
    was missed in CI testing.

    Similarly, the nova-status upgrade check "_check_resource_providers"
    routine also uses the placement DB API context manager to count the
    number of compute resource providers in the API (or placement) DB,
    which is returning 0 because it's not using the proper DB connection.
    This was not caught in the nova-status CLI tests because they use
    the DatabaseFixture which *does* configure the global placement DB
    API context manager.

    This adds the configuration of the global placement DB API context
    manager so we can properly query the placement-related tables.
    The blanket problematic try/except from _run_migration is left
    as-is in this change but will be addressed in a separate patch.

    Integration testing of this fix is being performed with devstack:

      https://review.openstack.org/599847/

    Change-Id: I9d97b7a904e2b7d15c763e2a067cc5909cc6c9c5
    Closes-Bug: #1790701
    Closes-Bug: #1790721
    (cherry picked from commit 63c10d2d536c3ffed68ccfb6bd393f317111d903)

This issue was fixed in the openstack/nova 18.0.1 release.

Reviewed: https://review.openstack.org/600101
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6a806de71b945509c8c5d3de53ff7ba5aa75bf87
Submitter: Zuul
Branch: stable/queens

commit 6a806de71b945509c8c5d3de53ff7ba5aa75bf87
Author: Matt Riedemann <email address hidden>
Date: Tue Sep 4 20:57:33 2018 -0400

    Fix nova-status "_check_resource_providers" check

    The way in which this check counted compute nodes was
    broken because of an incorrect for/else condition. If
    the check is run with a nova.conf like we have in
    devstack, where the API database is configured but
    the [database]/connection is pointing at cell0, where
    there are no compute nodes, the check passes saying
    there are no compute nodes even if there are compute
    nodes found in the cell databases (in the for loop).
    This is because the else executes because the for loop
    doesn't break, and then _count_compute_nodes returns 0
    for cell0 and overwrites the num_computes variable.

    This fixes the issue by checking if we have cell mappings
    before running the loop, else we hit the else block as
    was originally intended.

    Change-Id: I1a706d028a9ca894348a19b7b3df1ea673e4ec90
    Partial-Bug: #1790721
    (cherry picked from commit dcd421ae9e6f0391fea06c9d20949267354c3b3c)
    (cherry picked from commit f588a0b6c337368596bc8732141e864f56f72212)

tags: added: in-stable-queens

Reviewed: https://review.openstack.org/600113
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3f562226112b99d2044dd05a2e15b9752d5c9c1d
Submitter: Zuul
Branch: stable/pike

commit 3f562226112b99d2044dd05a2e15b9752d5c9c1d
Author: Matt Riedemann <email address hidden>
Date: Tue Sep 4 20:57:33 2018 -0400

    Fix nova-status "_check_resource_providers" check

    The way in which this check counted compute nodes was
    broken because of an incorrect for/else condition. If
    the check is run with a nova.conf like we have in
    devstack, where the API database is configured but
    the [database]/connection is pointing at cell0, where
    there are no compute nodes, the check passes saying
    there are no compute nodes even if there are compute
    nodes found in the cell databases (in the for loop).
    This is because the else executes because the for loop
    doesn't break, and then _count_compute_nodes returns 0
    for cell0 and overwrites the num_computes variable.

    This fixes the issue by checking if we have cell mappings
    before running the loop, else we hit the else block as
    was originally intended.

    Change-Id: I1a706d028a9ca894348a19b7b3df1ea673e4ec90
    Partial-Bug: #1790721
    (cherry picked from commit dcd421ae9e6f0391fea06c9d20949267354c3b3c)
    (cherry picked from commit f588a0b6c337368596bc8732141e864f56f72212)
    (cherry picked from commit 6a806de71b945509c8c5d3de53ff7ba5aa75bf87)

tags: added: in-stable-pike

Reviewed: https://review.openstack.org/600119
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ff8cb33283bbce474426f20f21675ecc1449b4d7
Submitter: Zuul
Branch: stable/ocata

commit ff8cb33283bbce474426f20f21675ecc1449b4d7
Author: Matt Riedemann <email address hidden>
Date: Tue Sep 4 20:57:33 2018 -0400

    Fix nova-status "_check_resource_providers" check

    The way in which this check counted compute nodes was
    broken because of an incorrect for/else condition. If
    the check is run with a nova.conf like we have in
    devstack, where the API database is configured but
    the [database]/connection is pointing at cell0, where
    there are no compute nodes, the check passes saying
    there are no compute nodes even if there are compute
    nodes found in the cell databases (in the for loop).
    This is because the else executes because the for loop
    doesn't break, and then _count_compute_nodes returns 0
    for cell0 and overwrites the num_computes variable.

    This fixes the issue by checking if we have cell mappings
    before running the loop, else we hit the else block as
    was originally intended.

    Conflicts:
          nova/cmd/status.py

    NOTE(mriedem): The conflict is due to not having change
    I35206e665f2c81531a2269dd66f8c5c0df834245 in Ocata.

    Change-Id: I1a706d028a9ca894348a19b7b3df1ea673e4ec90
    Partial-Bug: #1790721
    (cherry picked from commit dcd421ae9e6f0391fea06c9d20949267354c3b3c)
    (cherry picked from commit f588a0b6c337368596bc8732141e864f56f72212)
    (cherry picked from commit 6a806de71b945509c8c5d3de53ff7ba5aa75bf87)
    (cherry picked from commit 3f562226112b99d2044dd05a2e15b9752d5c9c1d)

tags: added: in-stable-ocata

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.
