Spurious warnings in compute logs while building/unshelving an instance: Instance cf1dc8a6-48fe-42fd-90a7-d352c58e1454 is not being actively managed by this compute host but has allocations referencing this compute host: {u'resources': {u'VCPU': 1, u'MEMORY_MB': 64}}. Skipping heal of allocation because we do not know what to do.

Bug #1838541 reported by Matt Riedemann on 2019-07-31
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Medium
Matt Riedemann
Queens
Medium
Unassigned
Rocky
Medium
Unassigned
Stein
Medium
Matt Riedemann

Bug Description

This warning log from the ResourceTracker is logged quite a bit in CI runs:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22is%20not%20being%20actively%20managed%20by%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d

2601 hits in 7 days.

Looking at one of these the warning shows up while spawning the instance during an unshelve operation. This is a possible race for the rt.instance_claim call because the instance.host/node are set here:

https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L208

before the instance would be added to the rt.tracked_instances dict started here:

https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L217

If the update_available_resource periodic task runs between those times, we'll call _remove_deleted_instances_allocations with the instance and it will have allocations on the node, created by the scheduler, but may not be in tracked_instances yet so we don't short-circuit here:

https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L1339

And hit the log condition here:

https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L1397

We should probably downgrade that warning to DEBUG if the instance task_state is set since clearly the instance is undergoing some state transition. We should log the task_state and only log the message as a warning if the instance does not have a task_state set but is also not tracked on the host.

Matt Riedemann (mriedem) wrote :

Technically this goes back to Pike but I'm not sure we care about fixing it there at this point since Pike is in Extended Maintenance mode upstream. Someone can backport it to stable/pike if they care to.

Fix proposed to branch: master
Review: https://review.opendev.org/673873

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Triaged → In Progress

Reviewed: https://review.opendev.org/673873
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ad4ef5af8bf7dc34caef21b0062ef9cc504bc216
Submitter: Zuul
Branch: master

commit ad4ef5af8bf7dc34caef21b0062ef9cc504bc216
Author: Matt Riedemann <email address hidden>
Date: Wed Jul 31 12:41:00 2019 -0400

    rt: soften warning case in _remove_deleted_instances_allocations

    During an instance_claim during initial server create or unshelve,
    the instance.host/node values will be set before the resource
    tracker has the instance in the tracked_instances dict. If
    _remove_deleted_instances_allocations is called with the instance
    before it's being tracked, a warning like this is logged:

      Jul 31 13:12:57.455904 ubuntu-bionic-rax-iad-0009534722
      nova-compute[31951]: WARNING nova.compute.resource_tracker
      [None req-d6f2ae97-d8f7-46f6-8974-b42aeb58302d None None]
      Instance 227c23cd-aeb2-4b5a-b001-21bd920a5e77 is not being actively
      managed by this compute host but has allocations referencing this
      compute host: {u'resources': {u'MEMORY_MB': 64, u'VCPU': 1,
      u'DISK_GB': 1}}. Skipping heal of allocation because we do not know
      what to do.

    This shows up quite frequently in CI runs (see the bug report for
    a logstash query) which means it should not be a warning.

    This change checks the instance task_state and if set then we only
    log a debug message rather than the warning since we can assume we
    are racing and the task will correct itself upon completion.

    Change-Id: I6db8bea6761b68c39e6332d4698d1f8312863758
    Closes-Bug: #1838541

Changed in nova:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers