Failed to compute_task_build_instances: local variable 'sibling_set' referenced before assignment

Bug #1821733 reported by Stephen Finucane
Affects                     Status         Importance  Assigned to        Milestone
OpenStack Compute (nova)    Invalid        Undecided   Unassigned
Queens                      Fix Committed  Medium      Stephen Finucane

Bug Description

Reproduced from rhbz#1686511 (https://bugzilla.redhat.com/show_bug.cgi?id=1686511)

When spawning an OpenStack instance, the following error is received:

    2019-03-07 08:07:38.499 3124 WARNING nova.scheduler.utils [req-e577cf31-7a58-420f-8ba5-3f962569ab08 0c90c8d8b42c42e883d2135cc733cac4 8b869a98a43e4fc48001e0ff6d149fe6 - - -] Failed to compute_task_build_instances: local variable 'sibling_set' referenced before assignment
    Traceback (most recent call last):

      File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 133, in _process_incoming
        res = self.dispatcher.dispatch(message)

      File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 150, in dispatch
        return self._do_dispatch(endpoint, method, ctxt, args)

      File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 121, in _do_dispatch
        result = func(ctxt, **new_args)

      File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 199, in inner
        return func(*args, **kwargs)

      File "/usr/lib/python2.7/site-packages/nova/scheduler/manager.py", line 104, in select_destinations
        dests = self.driver.select_destinations(ctxt, spec_obj)

      File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 53, in select_destinations
        selected_hosts = self._schedule(context, spec_obj)

      File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 113, in _schedule
        spec_obj, index=num)

      File "/usr/lib/python2.7/site-packages/nova/scheduler/host_manager.py", line 576, in get_filtered_hosts
        hosts, spec_obj, index)

      File "/usr/lib/python2.7/site-packages/nova/filters.py", line 89, in get_filtered_objects
        list_objs = list(objs)

      File "/usr/lib/python2.7/site-packages/nova/filters.py", line 44, in filter_all
        if self._filter_one(obj, spec_obj):

      File "/usr/lib/python2.7/site-packages/nova/scheduler/filters/__init__.py", line 44, in _filter_one
        return self.host_passes(obj, spec)

      File "/usr/lib/python2.7/site-packages/nova/scheduler/filters/numa_topology_filter.py", line 123, in host_passes
        pci_stats=host_state.pci_stats))

      File "/usr/lib/python2.7/site-packages/nova/virt/hardware.py", line 1297, in numa_fit_instance_to_host
        host_cell, instance_cell, limits)

      File "/usr/lib/python2.7/site-packages/nova/virt/hardware.py", line 906, in _numa_fit_instance_cell
        host_cell, instance_cell)

      File "/usr/lib/python2.7/site-packages/nova/virt/hardware.py", line 854, in _numa_fit_instance_cell_with_pinning
        max(map(len, host_cell.siblings)))

      File "/usr/lib/python2.7/site-packages/nova/virt/hardware.py", line 805, in _pack_instance_onto_cores
        itertools.chain(*sibling_set)))

    UnboundLocalError: local variable 'sibling_set' referenced before assignment

    2019-03-07 08:07:38.500 3124 WARNING nova.scheduler.utils [req-e577cf31-7a58-420f-8ba5-3f962569ab08 0c90c8d8b42c42e883d2135cc733cac4 8b869a98a43e4fc48001e0ff6d149fe6 - - -] [instance: 5bca186a-5a36-4b0f-8b7a-f2f3bc168b29] Setting instance to ERROR state.

This issue appears to be caused by:

https://github.com/openstack/nova/blob/da9f9c962fe00dbfc9c8fe9c47e964816d67b773/nova/virt/hardware.py#L875

This normally works because loop variables in Python remain available outside the scope of the loop:

    >>> for x in range(5):
    ...     pass
    ...
    >>> print(x)
    4

and because there is usually something in sibling_sets. However, this is presumably failing for this user because there are no free cores at all on the given host, which is likely the result of a race condition between the nova-scheduler and nova-compute services.
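
For illustration, here is a minimal, runnable sketch of the failing pattern (the names _pack_instance_onto_cores, sibling_sets and sibling_set mirror the Nova code, but the body is heavily simplified and hypothetical): when there are no free sibling sets at all, the loop body never runs, the loop variable is never bound, and the later reference raises UnboundLocalError.

    import itertools

    def _pack_instance_onto_cores(sibling_sets):
        # Heavily simplified, hypothetical sketch of the pattern in
        # nova/virt/hardware.py; the real function does far more work.
        for sibling_set in sibling_sets:
            pass  # ... evaluate candidate cores in this sibling set ...

        # If sibling_sets was empty, the loop never ran and 'sibling_set'
        # was never assigned, so this line raises UnboundLocalError.
        return list(itertools.chain(*sibling_set))

    _pack_instance_onto_cores([{frozenset([0, 1]), frozenset([2, 3])}])  # fine
    _pack_instance_onto_cores([])  # UnboundLocalError: local variable 'sibling_set' ...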

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/647831

Matt Riedemann (mriedem)
Changed in nova:
status: New → Invalid
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/647831
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=50a0c6a4b98a3ef1d893b0bb204a4a4868017cbb
Submitter: Zuul
Branch: stable/queens

commit 50a0c6a4b98a3ef1d893b0bb204a4a4868017cbb
Author: Stephen Finucane <email address hidden>
Date: Tue Mar 26 13:32:56 2019 +0000

    [Stable Only] hardware: Handle races during pinning

    Due to how we do claiming of pinned CPUs and related NUMA "things", it's
    possible for claims to race. This raciness is usually not an issue since
    pinning will fail for the losing instance, which will just get
    rescheduled. This does mean that it's possible for an instance to land
    on a host with no CPUs at all though and this edge case is triggering a
    nasty bug made possible by Python's unusual scoping rules around for
    loops.

        >>> x = 5
        >>> for y in range(x):
        ...     pass
        ...
        >>> print(y)
        4

    'y' would be considered out of scope in the above for most other
    languages (JS and its even dumber scoping rules aside, I guess) and it
    leaves us with situations where the variable might never exist, i.e. the
    bug at hand:

        >>> x = 0
        >>> for y in range(x):
        ...     pass
        ...
        >>> print(y)
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        NameError: name 'y' is not defined

    Resolve this by adding a check to handle the "no CPUs at all" case and
    fail quickly, but also remove the reliance on this quirk of Python.

    This doesn't apply to stable/rocky since the issue was inadvertently
    resolved by changes I8982ab25338969cd98621f79b7fbec8af43d12c5 and
    I021ce59048d6292055af49457ba642022f648c81. However, those changes are
    significantly larger and backports have been previously rejected [1][2].

    [1] https://review.openstack.org/#/c/588570/
    [2] https://review.openstack.org/#/c/588571/

    Change-Id: I6afc3af9f13e3c1cc312112eb28eb6e10d2a9e07
    Signed-off-by: Stephen Finucane <email address hidden>
    Closes-Bug: #1821733
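
The shape of the fix described in the commit message above is an early-exit guard for the empty case plus building the candidate core list explicitly rather than relying on the leaked loop variable. A rough, hypothetical sketch of that approach (not the exact merged patch):

    import itertools

    def _pack_instance_onto_cores(sibling_sets):
        # Hypothetical sketch of the approach the commit message describes,
        # not the actual merged change.
        if not sibling_sets:
            # "No CPUs at all" case: fail fast so the losing instance is
            # rescheduled instead of the filter blowing up.
            return None

        # Accumulate candidate cores explicitly instead of referencing the
        # loop variable after the loop has finished.
        candidate_cores = set()
        for sibling_set in sibling_sets:
            candidate_cores.update(itertools.chain(*sibling_set))
        return candidate_cores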

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.11

This issue was fixed in the openstack/nova 17.0.11 release.
