forced-down vs service disable is not documented well in the compute API reference

Bug #1691871 reported by Matt Riedemann
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: melanie witt

Bug Description

People are forcing services such as nova-compute down for routine planned maintenance and upgrades of their compute hosts, but that is not what force-down is intended for. Planned maintenance of a nova-compute service should instead disable the service, which takes it out of scheduling decisions, as discussed in the ops guide here:

https://docs.openstack.org/ops-guide/ops-maintenance-compute.html#planned-maintenance
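
For illustration, the following minimal sketch builds (but does not send) the compute API request that disables a service for planned maintenance and the one that re-enables it afterwards. It assumes compute API microversion 2.53 or later, where a service is updated by UUID through `PUT /os-services/{service_id}`; the helper names are hypothetical.

```python
# Assumption: compute API microversion 2.53+, where services are addressed by UUID.
HEADERS = {"OpenStack-API-Version": "compute 2.53"}


def disable_service_request(service_id, reason):
    """Hypothetical helper: take a compute service out of scheduling decisions.

    The service keeps running and heartbeating; the scheduler simply skips
    it while it is disabled.
    """
    return {
        "method": "PUT",
        "path": f"/os-services/{service_id}",
        "headers": dict(HEADERS),
        "body": {"status": "disabled", "disabled_reason": reason},
    }


def enable_service_request(service_id):
    """Hypothetical helper: return the service to scheduling after maintenance."""
    return {
        "method": "PUT",
        "path": f"/os-services/{service_id}",
        "headers": dict(HEADERS),
        "body": {"status": "enabled"},
    }
```

Neither body touches `forced_down`; whatever HTTP client actually sends the request would serialize the body as JSON.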

As described in the spec which added the force-down feature:

https://specs.openstack.org/openstack/nova-specs/specs/liberty/implemented/mark-host-down.html

Force-down is really meant for an external monitoring tool that detects a host is about to fail (for example, because of hardware faults). The external service then forces the compute service down, bypassing the service group API heartbeat checks, and performs an evacuation.
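
A minimal sketch of the force-down request such a monitoring tool might send, assuming compute API microversion 2.53 or later (`PUT /os-services/{service_id}` with a `forced_down` boolean in the body). The helper names are hypothetical and nothing is sent over the network.

```python
# Assumption: compute API microversion 2.53+, where services are addressed by UUID.
HEADERS = {"OpenStack-API-Version": "compute 2.53"}


def force_down_request(service_id):
    """Hypothetical helper: mark a failing host's service down immediately,
    without waiting for its heartbeat to expire, so instances can be evacuated."""
    return {
        "method": "PUT",
        "path": f"/os-services/{service_id}",
        "headers": dict(HEADERS),
        "body": {"forced_down": True},
    }


def unforce_down_request(service_id):
    """Hypothetical helper: clear the flag once the host has been repaired."""
    return {
        "method": "PUT",
        "path": f"/os-services/{service_id}",
        "headers": dict(HEADERS),
        "body": {"forced_down": False},
    }
```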

The forced-down flag is checked during the evacuate API flow.
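
The sketch below models why the flag matters for evacuation. It is not nova's code: `Service`, `service_is_up`, `check_can_evacuate`, and `ComputeServiceInUse` are simplified stand-ins, and the 60-second threshold is an assumed value standing in for nova's configurable service down time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed heartbeat threshold, standing in for nova's configurable down time.
SERVICE_DOWN_TIME = timedelta(seconds=60)


@dataclass
class Service:
    host: str
    last_heartbeat: datetime
    forced_down: bool = False
    disabled: bool = False  # affects scheduling only, not up/down status


class ComputeServiceInUse(Exception):
    """Raised when evacuation is attempted from a host that still looks up."""


def service_is_up(service, now):
    # A forced-down service is reported down immediately, regardless of
    # how recent its last heartbeat is.
    if service.forced_down:
        return False
    return now - service.last_heartbeat <= SERVICE_DOWN_TIME


def check_can_evacuate(service, now):
    # Evacuation is refused while the source host's service is considered up.
    if service_is_up(service, now):
        raise ComputeServiceInUse(service.host)
```

A host that is merely disabled still heartbeats, so in this model `service_is_up` stays true and evacuation is refused; forcing the service down lets evacuation proceed without waiting for the heartbeat to age past the threshold.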

Forcing a host down for routine upgrades can be problematic because forced-down services are excluded from the minimum service version checks:

https://github.com/openstack/nova/blob/master/nova/objects/service.py#L307
https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L490

So if you force a Mitaka nova-compute service down and upgrade the rest of your computes to Newton, then setting the Mitaka service back to forced_down=False, or simply restarting the Mitaka nova-compute service, will fail with a ServiceTooOld exception. The only ways out are (1) modifying the flag directly in the database, or (2) upgrading that compute to Newton (in this example) and restarting it.
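
The failure mode can be modeled in a few lines. This is a simplified sketch, not nova's implementation: `Compute`, `minimum_version`, and `rejoin` are hypothetical, the version numbers are made up (only their ordering matters), and returning 0 when no service qualifies is an assumption of the sketch.

```python
from dataclasses import dataclass

# Hypothetical service version numbers; only their ordering matters.
MITAKA, NEWTON = 10, 20


class ServiceTooOld(Exception):
    """Raised when a service is older than the deployment's minimum version."""


@dataclass
class Compute:
    host: str
    version: int
    forced_down: bool = False


def minimum_version(computes):
    # Forced-down services are skipped when computing the minimum version.
    versions = [c.version for c in computes if not c.forced_down]
    return min(versions) if versions else 0


def rejoin(compute, computes):
    """Model of un-forcing (or restarting) a compute service.

    At check time the service's own record is still flagged forced-down, so
    it does not pull the minimum down; only the upgraded computes count.
    """
    minver = minimum_version(computes)
    if minver > compute.version:
        raise ServiceTooOld(
            f"{compute.host}: version {compute.version} is older than minimum {minver}"
        )
    compute.forced_down = False
```

With one forced-down Mitaka compute and two Newton computes, `rejoin` raises `ServiceTooOld`; setting the old compute's `version` to `NEWTON` first (way out (2) above) lets it succeed.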

We should add information about this to the compute API reference so that operators better understand what forced-down and service disable each mean, and in which cases to use each.

Changed in nova:
assignee: nobody → Takashi NATSUME (natsume-takashi)
status: Confirmed → In Progress
Sean Dague (sdague) wrote :

Automatically discovered version liberty in description. If this is incorrect, please update the description to include 'nova version: ...'

tags: added: openstack-version.liberty
Sean Dague (sdague) wrote :

There are no currently open reviews on this bug, changing the status back to the previous state and unassigning. If there are active reviews related to this bug, please include links in comments.

Changed in nova:
status: In Progress → Confirmed
assignee: Takashi NATSUME (natsume-takashi) → nobody
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/492533

Changed in nova:
assignee: nobody → Sean Dague (sdague)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/492533

Changed in nova:
assignee: Sean Dague (sdague) → melanie witt (melwitt)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/492533
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8835198b8d09e9a69ea83741fdb1579a98019b51
Submitter: Zuul
Branch: master

commit 8835198b8d09e9a69ea83741fdb1579a98019b51
Author: Sean Dague <email address hidden>
Date: Thu Aug 10 09:34:13 2017 -0400

    Update api-guide and api-ref to be clear about forced-down

    Closes-Bug: #1691871
    Related-Bug: #1784826

    Change-Id: Ifc6f1549d88a1b7d9f6e25c962c8a15dd8e180fb

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.
