disabling a compute service does not disable the resource provider

Bug #1708958 reported by Chris Dent
Affects: OpenStack Compute (nova)
Status: Won't Fix
Importance: Wishlist
Assigned to: Unassigned

Bug Description

If you make a multi node devstack (nova master as of August 6th, 2017), or otherwise have multiple compute nodes, all of those compute nodes will create resource providers and relevant inventory.

Later, if you disable one of the compute nodes with nova service-disable {service id}, that nova-compute service will be disabled, but the associated resource provider will still exist, with legitimate inventory, in the placement service.

This means that /allocation_candidates or /resource_providers will return results that include the disabled compute node, even though it cannot accept new workloads, so those results are bogus.
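The mismatch described above can be sketched with a toy model (the classes and data here are illustrative, not nova's actual objects): disabling the service flips a flag on the service record only, so placement still lists the node as a candidate with full inventory.

```python
# Hypothetical, simplified model of the reported behaviour: disabling a
# compute service touches only the service record, while the resource
# provider (and its inventory) in placement is left intact.

class ComputeService:
    def __init__(self, host):
        self.host = host
        self.disabled = False

class ResourceProvider:
    def __init__(self, host, inventory):
        self.host = host
        self.inventory = inventory  # e.g. {"VCPU": 8}

services = {h: ComputeService(h) for h in ("node1", "node2")}
providers = {h: ResourceProvider(h, {"VCPU": 8}) for h in ("node1", "node2")}

# "nova service-disable node2" only flips the service flag...
services["node2"].disabled = True

# ...so placement still reports node2 as a candidate with full inventory.
candidates = sorted(rp.host for rp in providers.values())
print(candidates)  # ['node1', 'node2']
```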

It's not clear what the right behaviour is here. Should the rp of the disabled service be deleted? Have its inventory truncated? If there are other hosts available that satisfy the request, things go forward as desired, so there's not a functional bug here, but the data in placement is incorrect, which is undesirable.

(On a related note, if you delete a compute node's resource provider from the placement service and don't restart the associated nova-compute, the _ensure_resource_provider method does _not_ create the resource provider anew because the _resource_providers dict still contains the uuid. This might be expected behaviour but it surprised me while I was messing around.)
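The caching surprise in the parenthetical above can be sketched as follows. This is a simplified, hypothetical stand-in for the report client's behaviour, not nova's actual code: a local dict keyed by provider UUID short-circuits re-creation, so a provider deleted out-of-band in placement is not recreated until the cache is cleared (e.g. by restarting nova-compute).

```python
# Hypothetical sketch of the described caching behaviour. Class and
# method names are illustrative only.

class ReportClientSketch:
    def __init__(self):
        self._resource_providers = {}   # uuid -> provider record (cache)
        self.created = []               # records the "API calls" we would make

    def _create_in_placement(self, uuid):
        self.created.append(uuid)
        return {"uuid": uuid}

    def ensure_resource_provider(self, uuid):
        # Cache hit: return immediately, even if the provider was
        # deleted out-of-band in the placement service.
        if uuid in self._resource_providers:
            return self._resource_providers[uuid]
        rp = self._create_in_placement(uuid)
        self._resource_providers[uuid] = rp
        return rp

client = ReportClientSketch()
client.ensure_resource_provider("rp-1")   # cache miss: creates the provider
client.ensure_resource_provider("rp-1")   # cache hit: no second create
print(len(client.created))  # 1
```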

Tags: placement
Revision history for this message
Sylvain Bauza (sylvain-bauza) wrote :

Honestly, I'm not sure Placement should consider the state of the compute services rather than just the freshness of their data.

I'm okay with saying that inventory data that is old enough (I leave the notion of "old enough" undefined) shouldn't be considered valid for placement operations, but I see the "disabled" state as a purely nova concern.

Also, we have ComputeFilter for exactly this purpose. If Placement returns a list that includes disabled RPs, ComputeFilter just kicks those out of the set we should allocate from. Not a big deal to me.
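The post-filtering described here can be sketched in a few lines. The data shapes are made up for illustration; the real ComputeFilter works on host state objects, but the idea is the same: placement may return disabled hosts, and a nova-side filter drops them before allocation.

```python
# Rough sketch of ComputeFilter-style post-filtering: hosts whose
# compute service is disabled are dropped from placement's candidates.

def compute_filter(hosts, disabled):
    """Keep only hosts whose compute service is enabled."""
    return [h for h in hosts if h not in disabled]

placement_candidates = ["node1", "node2", "node3"]
disabled_services = {"node2"}

print(compute_filter(placement_candidates, disabled_services))
# ['node1', 'node3']
```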

Accordingly, leaving this at "Low", as I'm not even sure it's a real bug we want to fix; it's more a feature request about how Placement should handle data staleness.

Changed in nova:
status: New → Confirmed
Revision history for this message
Chris Dent (cdent) wrote :

Gibi rightly points out that when a hypervisor is disabled, any VMs on it are still in use and still have allocations, so we can't kill the RP or its inventory, especially if we want allocations to be used for measuring quota usage.

twisted webs

Revision history for this message
Sylvain Bauza (sylvain-bauza) wrote :

So, after thinking about this, I think it's okay to discuss how to return fewer RPs to the scheduler when they are stale. That said, I don't think it's really a bug at the moment, just an optimization, so I'm setting the bug report to Wishlist.

Changed in nova:
importance: Low → Wishlist
Revision history for this message
Jay Pipes (jaypipes) wrote :

Right, this is actually exactly how the system is intended to work. The scheduler calls placement to determine the providers that have capacity for some workload. Placement doesn't care, or need to know, about the state of a resource provider's communication link (i.e. the compute service); it's irrelevant. The scheduler calls the servicegroup API to check whether the compute service associated with a compute node provider (or providers, in the case of Ironic) is disabled, and removes that provider from consideration. This is by design. It's a separation-of-concerns thing.

Revision history for this message
Chris Dent (cdent) wrote :

Just so it's clear: I get that things are working as intended on the nova side, but my concern was that if some other system is using the placement data as a source of truth about available resources _and_ simultaneously all the available hypervisors are disabled, that view of "truth" isn't very truthy.

That may not be a problem, but it does seem weird.

Chris Dent (cdent)
Changed in nova:
status: Confirmed → Won't Fix