Unshelve results in duplicated resource deallocation

Bug #1587386 reported by Stephen Finucane
This bug affects 1 person
Affects                   Status        Importance  Assigned to       Milestone
OpenStack Compute (nova)  Fix Released  Undecided   Stephen Finucane
Mitaka                    Fix Released  Undecided   Stephen Finucane

Bug Description

Description
===========

Shelve/unshelve operations fail when using "NFV flavors", that is, flavors that request a guest NUMA topology, dedicated (pinned) CPUs and large pages. This was initially reported on the mailing list:

http://lists.openstack.org/pipermail/openstack-dev/2016-May/095631.html

Steps to reproduce
==================

1. Create a flavor with 'hw:numa_nodes=2', 'hw:cpu_policy=dedicated' and 'hw:mem_page_size=large'
2. Configure Tempest to use this new flavor
3. Run Tempest tests

Expected result
===============

All tests will pass.

Actual result
=============

The shelve/unshelve Tempest tests always fail with a timeout exception
similar to the following (from [1]):

    Traceback (most recent call last):
      File "tempest/api/compute/base.py", line 166, in server_check_teardown
        cls.server_id, 'ACTIVE')
      File "tempest/common/waiters.py", line 95, in wait_for_server_status
        raise exceptions.TimeoutException(message)
    2016-05-22 22:25:30.697 13974 ERROR tempest.api.compute.base TimeoutException: Request timed out
    Details: (ServerActionsTestJSON:tearDown) Server cae6fd47-0968-4922-a03e-3f2872e4eb52 failed to reach ACTIVE status and task state "None" within the required time (196 s). Current status: SHELVED_OFFLOADED. Current task state: None.
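The Tempest failure is a polling timeout. The teardown waits for the server to report status ACTIVE with no task state, but the server stays in SHELVED_OFFLOADED. Below is a minimal sketch of such a polling waiter. It assumes a hypothetical `get_server` callable that returns a `(status, task_state)` tuple. It is not Tempest's actual `wait_for_server_status`, and the injectable `clock`/`sleep` parameters exist only to make the sketch testable.

```python
import time


class TimeoutException(Exception):
    """Raised when the server does not reach the wanted state in time."""


def wait_for_server_status(get_server, server_id, status, timeout=196,
                           interval=1.0, clock=time.monotonic,
                           sleep=time.sleep):
    """Poll until get_server(server_id) returns (status, None).

    get_server is a hypothetical callable returning (status, task_state).
    """
    start = clock()
    while True:
        current_status, task_state = get_server(server_id)
        # Success only when the status matches AND no task is in progress.
        if current_status == status and task_state is None:
            return
        if clock() - start >= timeout:
            raise TimeoutException(
                'Server %s failed to reach %s status and task state "None" '
                'within the required time (%s s). Current status: %s. '
                'Current task state: %s.'
                % (server_id, status, timeout, current_status, task_state))
        sleep(interval)
```

A server that never leaves SHELVED_OFFLOADED can never satisfy the success condition, so a waiter of this shape can only end by raising, as seen in the log.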

The following errors are raised in the compute logs:

    Traceback (most recent call last):
      File "/opt/stack/new/nova/nova/compute/manager.py", line 4230, in _unshelve_instance
        with rt.instance_claim(context, instance, limits):
      File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner
        return f(*args, **kwargs)
      File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 151, in instance_claim
        self._update_usage_from_instance(context, instance_ref)
      File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 827, in _update_usage_from_instance
        self._update_usage(instance, sign=sign)
      File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 666, in _update_usage
        self.compute_node, usage, free)
      File "/opt/stack/new/nova/nova/virt/hardware.py", line 1482, in get_host_numa_usage_from_instance
        host_numa_topology, instance_numa_topology, free=free))
      File "/opt/stack/new/nova/nova/virt/hardware.py", line 1348, in numa_usage_from_instances
        newcell.unpin_cpus(pinned_cpus)
      File "/opt/stack/new/nova/nova/objects/numa.py", line 94, in unpin_cpus
        pinned=list(self.pinned_cpus))
    CPUPinningInvalid: Cannot pin/unpin cpus [6] from the following pinned set [0, 2, 4]

[1] http://intel-openstack-ci-logs.ovh/86/319686/1/check/tempest-dsvm-full-nfv/b463722/testr_results.html.gz
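The compute-log error is raised when the resource tracker tries to unpin a host CPU that is not currently pinned. This is consistent with the double deallocation described in the fix. The following is a simplified, hypothetical model of that bookkeeping. The class name and the exact validation rules are assumptions, not nova's `nova.objects.numa` code. It shows that releasing the same CPUs twice fails with the same message as the log above.

```python
class CPUPinningInvalid(Exception):
    """Simplified stand-in for the exception named in the compute log."""


class HostNUMACellSketch:
    """Hypothetical model of the pinned-CPU bookkeeping on one host NUMA cell."""

    def __init__(self, pinned_cpus=()):
        self.pinned_cpus = set(pinned_cpus)

    def _invalid(self, cpus):
        return CPUPinningInvalid(
            'Cannot pin/unpin cpus %s from the following pinned set %s'
            % (sorted(cpus), sorted(self.pinned_cpus)))

    def pin_cpus(self, cpus):
        cpus = set(cpus)
        # A CPU may be pinned by only one instance at a time.
        if cpus & self.pinned_cpus:
            raise self._invalid(cpus)
        self.pinned_cpus |= cpus

    def unpin_cpus(self, cpus):
        cpus = set(cpus)
        # Only CPUs that are currently pinned can be released.
        if not cpus <= self.pinned_cpus:
            raise self._invalid(cpus)
        self.pinned_cpus -= cpus


if __name__ == '__main__':
    cell = HostNUMACellSketch(pinned_cpus=[0, 2, 4])
    cell.pin_cpus([6])
    cell.unpin_cpus([6])      # first release succeeds
    try:
        cell.unpin_cpus([6])  # second release of the same CPU fails
    except CPUPinningInvalid as exc:
        print(exc)  # Cannot pin/unpin cpus [6] from the following pinned set [0, 2, 4]
```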

Environment
===========

1. Exact version of OpenStack you are running: commit '25fdf64'.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/323269

Changed in nova:
assignee: nobody → Stephen Finucane (sfinucan)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/323269
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f1320a7c2debf127a93773046adffb80563fd20b
Submitter: Jenkins
Branch: master

commit f1320a7c2debf127a93773046adffb80563fd20b
Author: Stephen Finucane <email address hidden>
Date: Mon May 30 16:03:35 2016 +0100

    Evaluate 'task_state' in resource (de)allocation

    There are two types of VM states associated with shelving. The first,
    'shelved', indicates that the VM has been powered off but the resources
    remain allocated on the hypervisor. The second, 'shelved_offloaded',
    indicates that the VM has been powered off and the resources freed.
    When "unshelving" VMs in the latter state, the VM state does not change
    from 'shelved_offloaded' until some time after the VM has been
    "unshelved".

    Change I83a5f06 allowed resources to be deallocated from instances
    when those instances were set to the 'shelved_offloaded' state. However,
    the resource (de)allocation code path assumes any VM with a state of
    'shelved_offloaded' should have resources deallocated from it, rather
    than allocated to it. As the VM state has not changed when this code
    path is executed, resources are incorrectly deallocated from the
    instance twice.

    Enhance the aforementioned check to account for task state in addition to
    VM state. This ensures a VM that's still in 'shelved_offloaded' state,
    but is in fact being unshelved, does not trigger deallocation.

    Change-Id: Ie2e7b91937fc3d61bb1197fffc3549bebc65e8aa
    Signed-off-by: Stephen Finucane <email address hidden>
    Resolves-bug: #1587386
    Related-bug: #1545675
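The check described in the commit message can be sketched as follows. This is a hypothetical illustration, not the merged nova code, and several details are assumptions:

- The function names are invented.
- The state strings are assumed to match nova's `vm_states`/`task_states` values.
- The list of "removable" VM states is assumed.
- Which task states the real change treats as "being unshelved" is assumed.

The sign convention mirrors the `_update_usage(instance, sign=sign)` call in the traceback above, with -1 assumed to mean "free".

```python
# Hypothetical sketch; state strings assumed to match nova's vm_states/task_states.
DELETED = 'deleted'
SHELVED_OFFLOADED = 'shelved_offloaded'

# VM states whose resources should be released from the host (assumed list).
ALLOW_RESOURCE_REMOVAL = (DELETED, SHELVED_OFFLOADED)

# Task states assumed to mean "this instance is being unshelved".
UNSHELVE_TASK_STATES = ('unshelving', 'spawning')

FREE, CLAIM = -1, 1


def usage_sign_before_fix(vm_state):
    """Buggy logic: decides from the VM state alone."""
    return FREE if vm_state in ALLOW_RESOURCE_REMOVAL else CLAIM


def usage_sign_after_fix(vm_state, task_state):
    """Fixed logic: an instance still in 'shelved_offloaded' that is being
    unshelved has its resources claimed rather than freed."""
    if (vm_state in ALLOW_RESOURCE_REMOVAL
            and task_state not in UNSHELVE_TASK_STATES):
        return FREE
    return CLAIM
```

Consider an instance whose VM state is still 'shelved_offloaded' while an unshelve task is running. The old check returns -1 and frees resources that were never claimed on this host. The new check returns +1 and claims them.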

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/337107

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/mitaka)

Reviewed: https://review.openstack.org/337107
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2703a3d80bcbd49eaafaae624289d00c521b5192
Submitter: Jenkins
Branch: stable/mitaka

commit 2703a3d80bcbd49eaafaae624289d00c521b5192
Author: Stephen Finucane <email address hidden>
Date: Mon May 30 16:03:35 2016 +0100

    (cherry picked from commit f1320a7c2debf127a93773046adffb80563fd20b)
