"create" instance action not created when instance is buried in cell0

Bug #1852458 reported by Matt Riedemann
This bug affects 2 people
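+--------------------------+--------------+------------+------------------+-----------+
| Affects                  | Status       | Importance | Assigned to      | Milestone |
+--------------------------+--------------+------------+------------------+-----------+
| OpenStack Compute (nova) | Fix Released | Low        | Matt Riedemann   |           |
| Ocata                    | Triaged      | Low        | Unassigned       |           |
| Pike                     | Triaged      | Low        | Unassigned       |           |
| Queens                   | Triaged      | Low        | Unassigned       |           |
| Rocky                    | Triaged      | Low        | Qiu Fossen       |           |
| Stein                    | Fix Released | Low        | Stephen Finucane |           |
| Train                    | Fix Released | Low        | Matt Riedemann   |           |
+--------------------------+--------------+------------+------------------+-----------+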

Bug Description

Before cell0 was introduced, the API would create the "create" instance action for each instance in the nova cell database before casting to conductor for scheduling:

https://github.com/openstack/nova/blob/mitaka-eol/nova/compute/api.py#L1180

Note that when scheduling failed, conductor did not "complete" the action with a failure event:

https://github.com/openstack/nova/blob/mitaka-eol/nova/conductor/manager.py#L374

But at least the action was created.

Since cell0 was introduced, if scheduling fails, the instance is buried in the cell0 database but no instance action is created. To illustrate, I disabled the single nova-compute service on my devstack host and created a server, which failed with NoValidHost:

$ openstack server show build-fail1 -f value -c fault
{u'message': u'No valid host was found. ', u'code': 500, u'created': u'2019-11-13T15:57:13Z'}

When listing instance actions, I expected to see a "create" action, but there were none:

$ nova instance-action-list 008a7d52-dd83-4f52-a720-b3cfcc498259
+--------+------------+---------+------------+------------+
| Action | Request_ID | Message | Start_Time | Updated_At |
+--------+------------+---------+------------+------------+
+--------+------------+---------+------------+------------+

This is because the "create" action is only created when the instance is scheduled to a specific cell:

https://github.com/openstack/nova/blob/20.0.0/nova/conductor/manager.py#L1460

Solution:

The ComputeTaskManager._bury_in_cell0 method should also create a "create" action in cell0, as it already does for the instance BDMs and tags.

This regression goes back to Ocata: https://review.opendev.org/#/c/319379/
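
The behavior described above, and the proposed solution, can be sketched with a small in-memory toy model. This is not nova code: CellDB, bury_in_cell0, and schedule_and_build are hypothetical stand-ins written only to illustrate where the "create" action does and does not get recorded.

```python
# Toy model only: every name here is a hypothetical stand-in, not a nova object.
from dataclasses import dataclass, field


@dataclass
class CellDB:
    """In-memory stand-in for a single cell database."""
    name: str
    instances: dict = field(default_factory=dict)
    actions: dict = field(default_factory=dict)  # instance uuid -> action names

    def create_instance(self, uuid):
        self.instances[uuid] = {"uuid": uuid}

    def create_action(self, uuid, action):
        self.actions.setdefault(uuid, []).append(action)


def bury_in_cell0(cell0, uuid, record_action):
    """Store an unschedulable instance in cell0.

    record_action=False mimics the reported behavior;
    record_action=True mimics the proposed solution.
    """
    cell0.create_instance(uuid)
    if record_action:
        cell0.create_action(uuid, "create")


def schedule_and_build(cell, cell0, uuid, host_available, record_action_on_bury):
    """Create the instance in a real cell, or bury it in cell0 on failure."""
    if not host_available:
        bury_in_cell0(cell0, uuid, record_action_on_bury)
        return cell0
    cell.create_instance(uuid)
    cell.create_action(uuid, "create")  # only recorded on the success path
    return cell


cell0 = CellDB("cell0")
cell1 = CellDB("cell1")
schedule_and_build(cell1, cell0, "ok-uuid", True, False)
schedule_and_build(cell1, cell0, "buggy-uuid", False, False)
schedule_and_build(cell1, cell0, "fixed-uuid", False, True)

print(cell1.actions.get("ok-uuid", []))     # ['create']
print(cell0.actions.get("buggy-uuid", []))  # []
print(cell0.actions.get("fixed-uuid", []))  # ['create']
```

In this sketch the buried instance exists in cell0 in both failure cases, but only the variant that records the action while burying gives the user anything to list afterwards.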

Matt Riedemann (mriedem) wrote :

Note that stable/rocky is currently the oldest non-extended-maintenance branch upstream, so it might not be worth fixing this on branches older than Rocky, given how latent this bug is.

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/694165

Changed in nova:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/694165
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f2608c91175411ec7c2604035adb39306d7e607e
Submitter: Zuul
Branch: master

commit f2608c91175411ec7c2604035adb39306d7e607e
Author: Matt Riedemann <email address hidden>
Date: Wed Nov 13 15:03:27 2019 -0500

    Create instance action when burying in cell0

    Change I8742071b55f018f864f5a382de20075a5b444a79 in Ocata
    moved the creation of the instance record from the API to
    conductor. As a result, the "create" instance action was
    only being created in conductor when the instance is created
    in a non-cell0 database. This is a regression because before
    that change when a server create would fail during scheduling
    you could still list instance actions for the server and see
    the "create" action but that was lost once we started burying
    those instances in cell0.

    This fixes the bug by creating the "create" action in the cell0
    database when burying an instance there. It goes a step further
    and also creates and finishes an event so the overall action
    message shows up as "Error" with the details about where the
    failure occurred in the event traceback.

    A short release note is added since a new action event is
    added here (conductor_schedule_and_build_instances) rather than
    re-use some kind of event that we could generate from the
    compute service, e.g. compute__do_build_and_run_instance.

    Change-Id: I1e9431e739adfbcfc1ca34b87e826a516a4b18e2
    Closes-Bug: #1852458
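
The "creates and finishes an event" step in the commit message above can be illustrated with a minimal sketch. The Action and Event classes and the NoValidHost exception below are hypothetical toy stand-ins, not nova's objects. Only the action name "create", the event name conductor_schedule_and_build_instances, and the "Error" message come from the commit message.

```python
# Toy model only: Action, Event and NoValidHost are hypothetical stand-ins.
import traceback
from dataclasses import dataclass, field
from typing import Optional


class NoValidHost(Exception):
    """Stand-in for the scheduling failure."""


@dataclass
class Event:
    name: str
    result: Optional[str] = None
    traceback: Optional[str] = None


@dataclass
class Action:
    name: str
    message: Optional[str] = None
    events: list = field(default_factory=list)

    def start_event(self, name):
        event = Event(name)
        self.events.append(event)
        return event

    def finish_event(self, event, exc=None):
        """Finish an event; a failure also marks the whole action as Error."""
        if exc is None:
            event.result = "Success"
            return
        event.result = "Error"
        event.traceback = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__))
        self.message = "Error"


action = Action("create")
event = action.start_event("conductor_schedule_and_build_instances")
try:
    raise NoValidHost("No valid host was found.")
except NoValidHost as exc:
    action.finish_event(event, exc)

print(action.message)  # Error
print(event.result)    # Error
```

The point of finishing the event with the exception attached is that the failure details live on the event, while the action itself only carries the summary message.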

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/701279

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/train)

Reviewed: https://review.opendev.org/701279
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6484d9ff5b03f7b7d8e9ba296f7f32d2e54fcc11
Submitter: Zuul
Branch: stable/train

commit 6484d9ff5b03f7b7d8e9ba296f7f32d2e54fcc11
Author: Matt Riedemann <email address hidden>
Date: Wed Nov 13 15:03:27 2019 -0500

    Create instance action when burying in cell0

    Change I8742071b55f018f864f5a382de20075a5b444a79 in Ocata
    moved the creation of the instance record from the API to
    conductor. As a result, the "create" instance action was
    only being created in conductor when the instance is created
    in a non-cell0 database. This is a regression because before
    that change when a server create would fail during scheduling
    you could still list instance actions for the server and see
    the "create" action but that was lost once we started burying
    those instances in cell0.

    This fixes the bug by creating the "create" action in the cell0
    database when burying an instance there. It goes a step further
    and also creates and finishes an event so the overall action
    message shows up as "Error" with the details about where the
    failure occurred in the event traceback.

    A short release note is added since a new action event is
    added here (conductor_schedule_and_build_instances) rather than
    re-use some kind of event that we could generate from the
    compute service, e.g. compute__do_build_and_run_instance.

    NOTE(mriedem): A couple of helper method calls in the test had
    to be updated since change I8c96b337f32148f8f5899c9b87af331b1fa41424
    is not in Train.

    Change-Id: I1e9431e739adfbcfc1ca34b87e826a516a4b18e2
    Closes-Bug: #1852458
    (cherry picked from commit f2608c91175411ec7c2604035adb39306d7e607e)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/729531

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/729531
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7ff71e6b87bd0ea2ed5bc4ac8dd8a2b6fc188ff4
Submitter: Zuul
Branch: stable/stein

commit 7ff71e6b87bd0ea2ed5bc4ac8dd8a2b6fc188ff4
Author: Matt Riedemann <email address hidden>
Date: Wed Nov 13 15:03:27 2019 -0500

    Create instance action when burying in cell0

    Change I8742071b55f018f864f5a382de20075a5b444a79 in Ocata
    moved the creation of the instance record from the API to
    conductor. As a result, the "create" instance action was
    only being created in conductor when the instance is created
    in a non-cell0 database. This is a regression because before
    that change when a server create would fail during scheduling
    you could still list instance actions for the server and see
    the "create" action but that was lost once we started burying
    those instances in cell0.

    This fixes the bug by creating the "create" action in the cell0
    database when burying an instance there. It goes a step further
    and also creates and finishes an event so the overall action
    message shows up as "Error" with the details about where the
    failure occurred in the event traceback.

    A short release note is added since a new action event is
    added here (conductor_schedule_and_build_instances) rather than
    re-use some kind of event that we could generate from the
    compute service, e.g. compute__do_build_and_run_instance.

    Change-Id: I1e9431e739adfbcfc1ca34b87e826a516a4b18e2
    Closes-Bug: #1852458
    (cherry picked from commit f2608c91175411ec7c2604035adb39306d7e607e)
    (cherry picked from commit 6484d9ff5b03f7b7d8e9ba296f7f32d2e54fcc11)
