Placement is not aware of disabled compute nodes

Bug #1805984 reported by Belmiro Moreira on 2018-11-30
This bug affects 1 person
Affects: OpenStack Compute (nova)
Importance: Wishlist
Assigned to: Matt Riedemann

Bug Description

Placement doesn't know whether a resource provider (in this case a compute node) is disabled. Disabled nodes are only filtered out afterwards, by the scheduler's "ComputeFilter".

However, when the "max_placement_results" option is used to limit the number of placement results, placement may return only "disabled" allocation candidates. The creation of new VMs then ends in ERROR because the scheduler finds "No Valid Host".
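The failure mode can be illustrated with a small, self-contained simulation. This is only a sketch and not nova code: the `Host` class and the functions `placement_candidates`, `compute_filter` and `schedule` are hypothetical stand-ins, and placement is simplified to "return the first N providers".

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    disabled: bool


def placement_candidates(hosts, limit):
    # Stand-in for placement: it has no notion of "disabled", so it
    # returns up to `limit` hosts regardless of their service status.
    return hosts[:limit]


def compute_filter(candidates):
    # Stand-in for the scheduler's ComputeFilter: drop disabled hosts.
    return [h for h in candidates if not h.disabled]


def schedule(hosts, limit):
    # The limit is applied before the disabled hosts are filtered out.
    survivors = compute_filter(placement_candidates(hosts, limit))
    if not survivors:
        raise LookupError("No valid host was found")
    return survivors[0]


hosts = [Host("cn1", True), Host("cn2", True), Host("cn3", False)]
```

With these three hosts, `schedule(hosts, limit=3)` picks `cn3`, while `schedule(hosts, limit=2)` raises `LookupError` even though an enabled host exists, because both candidates returned by the simulated placement are disabled.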

There are several use cases in which an operator may want to disable nodes to avoid the creation of new VMs.

Related to: https://bugs.launchpad.net/nova/+bug/1708958

sean mooney (sean-k-mooney) wrote :

Knowledge of the status of OpenStack services is currently not within the scope of what placement tracks.
It is undesirable, however, to get a No Valid Host from the nova API due to the presence of downed compute nodes in the placement response.

I have triaged this as wishlist, as I think this is functioning as designed, but it is an unfortunate edge case that would
be better addressed by a spec or a specless blueprint.

I think it's unlikely that you would have enough hosts marked as down for this to typically be an issue, but in large deployments
this becomes more likely.

Changed in nova:
importance: Undecided → Wishlist
status: New → Triaged
Matt Riedemann (mriedem) wrote :
tags: added: cells scheduler

Fix proposed to branch: master
Review: https://review.opendev.org/654596

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Triaged → In Progress
Changed in nova:
assignee: Matt Riedemann (mriedem) → Eric Fried (efried)
Eric Fried (efried) on 2019-06-29
summary: - Placement is not aware of disable compute nodes
+ Placement is not aware of disabled compute nodes
Changed in nova:
assignee: Eric Fried (efried) → Matt Riedemann (mriedem)

Reviewed: https://review.opendev.org/654596
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3f0605c28987f46cf4a05af0140e2e5de7d5ad0a
Submitter: Zuul
Branch: master

commit 3f0605c28987f46cf4a05af0140e2e5de7d5ad0a
Author: Matt Riedemann <email address hidden>
Date: Tue Jul 2 18:57:38 2019 -0400

    Sync COMPUTE_STATUS_DISABLED from API

    This adds the os-services API change which will
    call the compute service when the service's disabled
    value changes to sync the COMPUTE_STATUS_DISABLED trait
    on the compute node resource providers managed by the
    updated compute service.

    If the compute service is down or not yet upgraded to
    the service version from change
    Ia95de2f23f12b002b2189c9294ec190569a628ab then the
    API will not call the service. In this case the change
    from I3005b46221ac3c0e559e1072131a7e4846c9867c will
    sync the trait when the compute service is restarted.

    Since the compute service could be running the ironic
    driver and managing hundreds or over 1000 compute nodes,
    the set_host_enabled RPC call now uses the long_rpc_timeout
    configuration option.

    A functional test is added which covers the 2.53+
    PUT /os-services/{service_id} API and pre-2.53 os-services
    APIs for enabling/disabling and forcing down a service.
    The functional test also covers the sync-on-restart behavior
    from change I3005b46221ac3c0e559e1072131a7e4846c9867c.
    The scheduler pre-filter added in change
    I317cabbe49a337848325f96df79d478fd65811d9 is also tested
    as part of the functional test.

    Closes-Bug: #1805984

    Implements blueprint pre-filter-disabled-computes

    Change-Id: If32bca070185937ef83f689b7163d965a89ec10a
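
To illustrate what the pre-filter mentioned in the commit message amounts to, the sketch below builds a placement query that excludes providers carrying the COMPUTE_STATUS_DISABLED trait. It assumes the forbidden-trait syntax of the placement API (`required=!TRAIT`, available from placement microversion 1.22); the helper name `allocation_candidates_query` and the resource amounts are hypothetical, and this is not the code from the change itself.

```python
from urllib.parse import urlencode

DISABLED_TRAIT = "COMPUTE_STATUS_DISABLED"


def allocation_candidates_query(resources, limit,
                                forbidden_traits=(DISABLED_TRAIT,)):
    # Hypothetical helper: build the GET /allocation_candidates query.
    params = {
        "resources": ",".join(
            f"{rc}:{amount}" for rc, amount in sorted(resources.items())
        ),
        "limit": limit,
    }
    if forbidden_traits:
        # A leading "!" marks a trait as forbidden, so providers that
        # carry it are excluded before the limit is applied.
        params["required"] = ",".join(f"!{t}" for t in forbidden_traits)
    # Keep ":", "," and "!" readable instead of percent-encoding them.
    return "/allocation_candidates?" + urlencode(params, safe=":,!")


query = allocation_candidates_query({"VCPU": 1, "MEMORY_MB": 512}, limit=10)
```

Because the trait is excluded in the placement query, the limit set by "max_placement_results" would only count providers that are not marked as disabled, which is the situation the original report asks for.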

Changed in nova:
status: In Progress → Fix Released

This issue was fixed in the openstack/nova 20.0.0.0rc1 release candidate.
