Spurious warnings in compute logs while building/unshelving an instance: Instance cf1dc8a6-48fe-42fd-90a7-d352c58e1454 is not being actively managed by this compute host but has allocations referencing this compute host: {u'resources': {u'VCPU': 1, u'MEMORY_MB': 64}}. Skipping heal of allocation because we do not know what to do.

Bug #1838541 reported by Matt Riedemann
Affects                   Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released   Medium      Matt Riedemann
Queens                    Confirmed      Medium      Unassigned
Rocky                     Confirmed      Medium      Unassigned
Stein                     Fix Committed  Medium      Matt Riedemann

Bug Description

This warning from the ResourceTracker is logged quite a bit in CI runs:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22is%20not%20being%20actively%20managed%20by%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d

2601 hits in 7 days.

Looking at one of these, the warning shows up while spawning the instance during an unshelve operation. This is a possible race in the rt.instance_claim call, because the instance.host/node are set here:

https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L208

before the instance has been added to the rt.tracked_instances dict, which happens in the code starting here:

https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L217
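
Roughly, the ordering looks like this (a toy sketch, not the actual nova code; only the instance_claim name, the host/node assignment and the tracked_instances dict correspond to the linked lines):

    class TrackerSketch(object):
        """Toy stand-in for the ResourceTracker to show the race window."""

        def __init__(self, host):
            self.host = host
            self.tracked_instances = {}  # instance uuid -> instance

        def instance_claim(self, instance, nodename):
            # Roughly the first linked line: the instance is pointed at this
            # host/node (and saved to the database in the real code).
            instance['host'] = self.host
            instance['node'] = nodename
            # ... claim processing happens in between, so there is a window
            # in which the instance has host/node set but nothing on this
            # host is tracking it yet ...
            # Roughly the second linked line: only now does tracking start.
            self.tracked_instances[instance['uuid']] = instance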

If the update_available_resource periodic task runs between those two points, _remove_deleted_instances_allocations is called with the instance; the instance will already have allocations on the node (created by the scheduler) but may not yet be in tracked_instances, so we don't short-circuit here:

https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L1339

And instead we hit the log condition here:

https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L1397

We should probably downgrade that warning to DEBUG when the instance has a task_state set, since the instance is clearly undergoing some state transition. We should log the task_state, and only log the message as a warning if the instance has no task_state set but is still not tracked on the host.
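
A minimal sketch of that check, written as a standalone function rather than as the actual _remove_deleted_instances_allocations code (the instance, alloc and tracked_instances arguments are simplified stand-ins for the real objects):

    import logging

    LOG = logging.getLogger(__name__)

    def report_untracked_allocation(instance, alloc, tracked_instances):
        """Decide how loudly to report an allocation for an untracked instance."""
        if instance.uuid in tracked_instances or instance.deleted:
            # Tracked and deleted instances are handled elsewhere in the
            # resource tracker; nothing to report here.
            return
        if instance.task_state is not None:
            # The instance is mid-operation (e.g. spawning during an
            # unshelve), so assume we raced with instance_claim and keep the
            # message at DEBUG.
            LOG.debug('Instance %s has allocations against this compute host '
                      'but is not tracked yet; task_state=%s, assuming the '
                      'in-progress operation will resolve this.',
                      instance.uuid, instance.task_state)
            return
        # Not tracked, not deleted and no task_state: this really is
        # unexplained, so it stays a warning.
        LOG.warning('Instance %s is not being actively managed by this '
                    'compute host but has allocations referencing this '
                    'compute host: %s. Skipping heal of allocation because '
                    'we do not know what to do.', instance.uuid, alloc)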

Revision history for this message
Matt Riedemann (mriedem) wrote :

Technically this goes back to Pike but I'm not sure we care about fixing it there at this point since Pike is in Extended Maintenance mode upstream. Someone can backport it to stable/pike if they care to.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/673873

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/673873
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ad4ef5af8bf7dc34caef21b0062ef9cc504bc216
Submitter: Zuul
Branch: master

commit ad4ef5af8bf7dc34caef21b0062ef9cc504bc216
Author: Matt Riedemann <email address hidden>
Date: Wed Jul 31 12:41:00 2019 -0400

    rt: soften warning case in _remove_deleted_instances_allocations

    During an instance_claim during initial server create or unshelve,
    the instance.host/node values will be set before the resource
    tracker has the instance in the tracked_instances dict. If
    _remove_deleted_instances_allocations is called with the instance
    before it's being tracked, a warning like this is logged:

      Jul 31 13:12:57.455904 ubuntu-bionic-rax-iad-0009534722
      nova-compute[31951]: WARNING nova.compute.resource_tracker
      [None req-d6f2ae97-d8f7-46f6-8974-b42aeb58302d None None]
      Instance 227c23cd-aeb2-4b5a-b001-21bd920a5e77 is not being actively
      managed by this compute host but has allocations referencing this
      compute host: {u'resources': {u'MEMORY_MB': 64, u'VCPU': 1,
      u'DISK_GB': 1}}. Skipping heal of allocation because we do not know
      what to do.

    This shows up quite frequently in CI runs (see the bug report for
    a logstash query) which means it should not be a warning.

    This change checks the instance task_state and if set then we only
    log a debug message rather than the warning since we can assume we
    are racing and the task will correct itself upon completion.

    Change-Id: I6db8bea6761b68c39e6332d4698d1f8312863758
    Closes-Bug: #1838541

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/679519

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 20.0.0.0rc1

This issue was fixed in the openstack/nova 20.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/679519
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7e074ad0307c267ee9bb8e006394393160a705c9
Submitter: Zuul
Branch: stable/stein

commit 7e074ad0307c267ee9bb8e006394393160a705c9
Author: Matt Riedemann <email address hidden>
Date: Wed Jul 31 12:41:00 2019 -0400

    rt: soften warning case in _remove_deleted_instances_allocations

    During an instance_claim during initial server create or unshelve,
    the instance.host/node values will be set before the resource
    tracker has the instance in the tracked_instances dict. If
    _remove_deleted_instances_allocations is called with the instance
    before it's being tracked, a warning like this is logged:

      Jul 31 13:12:57.455904 ubuntu-bionic-rax-iad-0009534722
      nova-compute[31951]: WARNING nova.compute.resource_tracker
      [None req-d6f2ae97-d8f7-46f6-8974-b42aeb58302d None None]
      Instance 227c23cd-aeb2-4b5a-b001-21bd920a5e77 is not being actively
      managed by this compute host but has allocations referencing this
      compute host: {u'resources': {u'MEMORY_MB': 64, u'VCPU': 1,
      u'DISK_GB': 1}}. Skipping heal of allocation because we do not know
      what to do.

    This shows up quite frequently in CI runs (see the bug report for
    a logstash query) which means it should not be a warning.

    This change checks the instance task_state and if set then we only
    log a debug message rather than the warning since we can assume we
    are racing and the task will correct itself upon completion.

    Change-Id: I6db8bea6761b68c39e6332d4698d1f8312863758
    Closes-Bug: #1838541
    (cherry picked from commit ad4ef5af8bf7dc34caef21b0062ef9cc504bc216)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.1.0

This issue was fixed in the openstack/nova 19.1.0 release.
