Reschedule after the late affinity check fails with "'NoneType' object is not iterable"

Bug #1719730 reported by melanie witt
This bug affects 3 people
Affects                     Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)    Fix Released   High        melanie witt
  Newton                    Fix Committed  High        Matt Riedemann
  Ocata                     Fix Committed  High        Matt Riedemann
  Pike                      Fix Committed  High        Matt Riedemann

Bug Description

Ran into this while hacking on something locally and running the server groups functional tests:

==============================
Failed 1 tests - output below:
==============================

nova.tests.functional.test_server_group.ServerGroupTestV215.test_rebuild_with_anti_affinity
-------------------------------------------------------------------------------------------

Captured pythonlogging:
~~~~~~~~~~~~~~~~~~~~~~~
19:45:29,525 ERROR [nova.scheduler.utils] Error from last host: host2 (node host2): ['Traceback (most recent call last):\n', ' File "nova/compute/manager.py", line 1831, in _do_build_and_run_instance\n filter_properties)\n', ' File "nova/compute/manager.py", line 2061, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', 'RescheduledException: Build of instance c249e39f-0d38-40ce-860d-6c72cdeba436 was re-scheduled: Build of instance c249e39f-0d38-40ce-860d-6c72cdeba436 was re-scheduled: Anti-affinity instance group policy was violated.\n']
19:45:29,526 WARNING [nova.scheduler.utils] Failed to compute_task_build_instances: 'NoneType' object is not iterable
19:45:29,527 WARNING [nova.scheduler.utils] Setting instance to ERROR state.

Two instances are booted simultaneously and both land on the same host, so the second one fails the late affinity check and raises a RescheduledException so that it can be rescheduled to another host. However, the conductor fails to reschedule it because the 'group_members' key doesn't exist in filter_properties, and the attempt to make a list out of it fails [1].
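The late check that triggers the reschedule can be sketched as follows, assuming the compute service knows which hosts the group's other members were placed on. `check_anti_affinity` and this `RescheduledException` class are hypothetical stand-ins, not nova's actual code; only the error text mirrors the log above.

```python
class RescheduledException(Exception):
    """Hypothetical stand-in for the exception that asks for a reschedule."""


def check_anti_affinity(instance_uuid, this_host, member_hosts):
    """Raise if another member of the group is already on this_host.

    member_hosts maps each group member's instance UUID to the host it
    was placed on; the instance being built is excluded from the check.
    """
    other_hosts = {host for uuid, host in member_hosts.items()
                   if uuid != instance_uuid}
    if this_host in other_hosts:
        raise RescheduledException(
            "Build of instance %s was re-scheduled: Anti-affinity instance "
            "group policy was violated." % instance_uuid)


# Both concurrently booted instances were placed on host2.
placements = {"inst-a": "host2", "inst-b": "host2"}

# The first build is checked before inst-b is recorded, so it passes.
check_anti_affinity("inst-a", "host2", {"inst-a": "host2"})

# The second build sees inst-a on host2 and asks for a reschedule.
try:
    check_anti_affinity("inst-b", "host2", placements)
except RescheduledException as exc:
    print(exc)
```

The check runs only after a host has already been chosen, which is why two concurrent boots can both get past scheduling and only the second is caught here.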

In the past, code [2] added 'group_members' to filter_properties to handle affinity, and a more recent change [3] removed most of that code but missed 'group_members'. So nothing ever sets filter_properties['group_members'] anymore, but RequestSpec.from_primitives() expects it to be there and blows up trying to make a list from None.

[1] https://github.com/openstack/nova/blob/ad6d339/nova/objects/request_spec.py#L205
[2] https://review.openstack.org/#/c/148277
[3] https://review.openstack.org/#/c/469037
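The failure can be reproduced outside nova with a simplified model of the legacy filter_properties round trip. This is a sketch under assumptions: `to_legacy_group_info` and `from_legacy_group_info` are hypothetical stand-ins for `RequestSpec._to_legacy_group_info` and the group handling in `RequestSpec.from_primitives`, and the 'group_updated', 'group_hosts' and 'group_policies' keys are illustrative. Only the 'group_members' key and the `list(None)` failure come from the report.

```python
def to_legacy_group_info(hosts, policies, members, set_members=True):
    """Writer: flatten server group data into legacy dict keys (hypothetical)."""
    info = {
        "group_updated": True,
        "group_hosts": set(hosts),
        "group_policies": set(policies),
    }
    if set_members:
        # The fix: emit the key that the reader unconditionally consumes.
        info["group_members"] = set(members)
    return info


def from_legacy_group_info(filter_properties):
    """Reader: rebuild group data; list(None) fails if a key is missing."""
    if filter_properties.get("group_updated") is not True:
        return None
    return {
        "hosts": list(filter_properties.get("group_hosts")),
        "policies": list(filter_properties.get("group_policies")),
        "members": list(filter_properties.get("group_members")),
    }


# Before the fix: the writer omits 'group_members', so the reader blows up.
broken = to_legacy_group_info(["host2"], ["anti-affinity"], ["inst-a"],
                              set_members=False)
try:
    from_legacy_group_info(broken)
except TypeError as exc:
    print(exc)  # 'NoneType' object is not iterable

# After the fix: the key is present and the round trip succeeds.
fixed = to_legacy_group_info(["host2"], ["anti-affinity"], ["inst-a"])
print(from_legacy_group_info(fixed)["members"])  # ['inst-a']
```

The underlying requirement is that both directions of the conversion agree on the same set of keys; the writer must produce every key the reader calls `list()` on.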

Matt Riedemann (mriedem) wrote :

Since https://review.openstack.org/#/c/469037 landed in Pike, this is a regression in the Pike release.

Changed in nova:
status: New → Confirmed
importance: Undecided → High
tags: added: affinity requestspec reschedule server-groups
melanie witt (melwitt)
Changed in nova:
assignee: nobody → melanie witt (melwitt)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/507938

Changed in nova:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/507938
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9d6632a67d91fb3c5145c14ac38011e919d6d8c0
Submitter: Jenkins
Branch: master

commit 9d6632a67d91fb3c5145c14ac38011e919d6d8c0
Author: melanie witt <email address hidden>
Date: Wed Sep 27 17:27:56 2017 +0000

    Set group_members when converting to legacy request spec

    In Pike we converted the affinity filter code to use the RequestSpec
    object instead of legacy dicts. The filter used to populate server
    group info in the filter_properties and the conversion removed that.
    However, in the conductor, we are still converting RequestSpec back
    and forth between object and primitive, and there is a mismatch
    between the keys being set/get in filter_properties. So during a
    reschedule with a server group, we hit an exception
    "'NoneType' object is not iterable" in the RequestSpec.from_primitives
    method and the reschedule fails.

    This adds 'group_members' to the _to_legacy_group_info method to set
    the key.

    Closes-Bug: #1719730

    Change-Id: Icb418f2be575bb2ba82756fdeb67b24a28950746

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/509766

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/509766
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d288132dca7cc76dfc6679eda17bb8fcc62577de
Submitter: Jenkins
Branch: stable/pike

commit d288132dca7cc76dfc6679eda17bb8fcc62577de
Author: melanie witt <email address hidden>
Date: Wed Sep 27 17:27:56 2017 +0000

    Set group_members when converting to legacy request spec

    In Pike we converted the affinity filter code to use the RequestSpec
    object instead of legacy dicts. The filter used to populate server
    group info in the filter_properties and the conversion removed that.
    However, in the conductor, we are still converting RequestSpec back
    and forth between object and primitive, and there is a mismatch
    between the keys being set/get in filter_properties. So during a
    reschedule with a server group, we hit an exception
    "'NoneType' object is not iterable" in the RequestSpec.from_primitives
    method and the reschedule fails.

    This adds 'group_members' to the _to_legacy_group_info method to set
    the key.

    Closes-Bug: #1719730

    Change-Id: Icb418f2be575bb2ba82756fdeb67b24a28950746
    (cherry picked from commit 9d6632a67d91fb3c5145c14ac38011e919d6d8c0)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.0.0b1

This issue was fixed in the openstack/nova 17.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.0.2

This issue was fixed in the openstack/nova 16.0.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/517860

Matt Riedemann (mriedem) wrote :

Bug 1675676 has the same failure but from a different scenario, and goes back to at least Newton.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/517868

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/517860
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=96ad6043bbacf47b29d46288ca0d755541eb1d0e
Submitter: Zuul
Branch: stable/ocata

commit 96ad6043bbacf47b29d46288ca0d755541eb1d0e
Author: melanie witt <email address hidden>
Date: Wed Sep 27 17:27:56 2017 +0000

    Set group_members when converting to legacy request spec

    In Pike we converted the affinity filter code to use the RequestSpec
    object instead of legacy dicts. The filter used to populate server
    group info in the filter_properties and the conversion removed that.
    However, in the conductor, we are still converting RequestSpec back
    and forth between object and primitive, and there is a mismatch
    between the keys being set/get in filter_properties. So during a
    reschedule with a server group, we hit an exception
    "'NoneType' object is not iterable" in the RequestSpec.from_primitives
    method and the reschedule fails.

    This adds 'group_members' to the _to_legacy_group_info method to set
    the key.

    Closes-Bug: #1719730

    NOTE(mriedem): In Ocata, the DiskFilter is still enabled by default
    even though the FilterScheduler is using Placement and filtering
    resources by DISK_GB inventory, which makes the functional test fail.
    So in this backport, the enabled_filters are specifically set without
    the RamFilter and DiskFilter since Placement handles those.

    Change-Id: Icb418f2be575bb2ba82756fdeb67b24a28950746
    (cherry picked from commit 9d6632a67d91fb3c5145c14ac38011e919d6d8c0)
    (cherry picked from commit d288132dca7cc76dfc6679eda17bb8fcc62577de)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.0.8

This issue was fixed in the openstack/nova 15.0.8 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/newton)

Reviewed: https://review.openstack.org/517868
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6b7b4aa4a5d19a648556a9076f61f4e87f4710e0
Submitter: Zuul
Branch: stable/newton

commit 6b7b4aa4a5d19a648556a9076f61f4e87f4710e0
Author: melanie witt <email address hidden>
Date: Wed Sep 27 17:27:56 2017 +0000

    Set group_members when converting to legacy request spec

    In Pike we converted the affinity filter code to use the RequestSpec
    object instead of legacy dicts. The filter used to populate server
    group info in the filter_properties and the conversion removed that.
    However, in the conductor, we are still converting RequestSpec back
    and forth between object and primitive, and there is a mismatch
    between the keys being set/get in filter_properties. So during a
    reschedule with a server group, we hit an exception
    "'NoneType' object is not iterable" in the RequestSpec.from_primitives
    method and the reschedule fails.

    This adds 'group_members' to the _to_legacy_group_info method to set
    the key.

    Closes-Bug: #1719730

    NOTE(mriedem): There are a few changes for Newton because of config
    option renames in Ocata and the PlacementFixture didn't exist in
    Newton, and we have to be explicit about running Neutron.

    Depends-On: I344d8fdded9b7d5385fcb41b699f1352acb4cda7

    Change-Id: Icb418f2be575bb2ba82756fdeb67b24a28950746
    (cherry picked from commit 9d6632a67d91fb3c5145c14ac38011e919d6d8c0)
    (cherry picked from commit d288132dca7cc76dfc6679eda17bb8fcc62577de)
    (cherry picked from commit 2e25f689a910e1adedc1687792930aab5cc14ca9)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 14.0.10

This issue was fixed in the openstack/nova 14.0.10 release.
