ServersOnMultiNodesTest.test_create_server_with_scheduler_hint_group_anti_affinity failing with "Servers are on the same host"

Bug #1781710 reported by Matt Riedemann
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Matt Riedemann
Milestone: (none)

Bug Description

Started seeing this recently; it looks like a regression:

http://logs.openstack.org/44/564444/14/check/neutron-tempest-multinode-full/dba40b9/job-output.txt.gz#_2018-07-13_19_53_15_275866

2018-07-13 19:53:15.275866 | primary | {1} tempest.api.compute.admin.test_servers_on_multinodes.ServersOnMultiNodesTest.test_create_server_with_scheduler_hint_group_anti_affinity [7.164074s] ... FAILED
2018-07-13 19:53:15.275944 | primary |
2018-07-13 19:53:15.276012 | primary | Captured traceback:
2018-07-13 19:53:15.276075 | primary | ~~~~~~~~~~~~~~~~~~~
2018-07-13 19:53:15.276171 | primary | Traceback (most recent call last):
2018-07-13 19:53:15.276452 | primary | File "tempest/api/compute/admin/test_servers_on_multinodes.py", line 115, in test_create_server_with_scheduler_hint_group_anti_affinity
2018-07-13 19:53:15.276598 | primary | 'Servers are on the same host: %s' % hosts)
2018-07-13 19:53:15.276857 | primary | File "/opt/stack/new/tempest/.tox/tempest/local/lib/python2.7/site-packages/unittest2/case.py", line 845, in assertNotEqual
2018-07-13 19:53:15.276965 | primary | raise self.failureException(msg)
2018-07-13 19:53:15.277830 | primary | AssertionError: u'ubuntu-xenial-rax-dfw-0000714118' == u'ubuntu-xenial-rax-dfw-0000714118' : Servers are on the same host: {u'c166e283-477c-4ecf-9c1c-2dcd731a6d6a': u'ubuntu-xenial-rax-dfw-0000714118', u'6eb63e79-122e-45f9-931f-0750047116d1': u'ubuntu-xenial-rax-dfw-0000714118'}

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22AssertionError%5C%22%20AND%20message%3A%5C%22Servers%20are%20on%20the%20same%20host%3A%5C%22%20AND%20tags%3A%5C%22console%5C%22&from=7d

According to logstash, it looks like this started around July 9.

These changes merged to nova on July 9 but shouldn't impact this scheduling behavior yet:

https://github.com/openstack/nova/commit/57b0bb374963bdbf0aef910feaccb8f536641c41

https://github.com/openstack/nova/commit/afc7650e64753ab7687ae2c4f2714d4bb78a4e5a

Revision history for this message
Matt Riedemann (mriedem) wrote :

This is where we first hit the server group anti-affinity filter during scheduling for this request to create 2 servers in the same anti-affinity group:

http://logs.openstack.org/44/564444/14/check/neutron-tempest-multinode-full/dba40b9/logs/screen-n-sch.txt.gz#_Jul_13_19_53_09_805696

Jul 13 19:53:09.805696 ubuntu-xenial-rax-dfw-0000714118 nova-scheduler[3485]: DEBUG nova.scheduler.filters.affinity_filter [None req-8b191d5b-4c76-47c4-a322-d9c9a5ee6d0b tempest-ServersOnMultiNodesTest-1141629396 tempest-ServersOnMultiNodesTest-1141629396] Group anti-affinity: check if the number of servers from group 243c3452-1fde-41ef-bf5a-1cddf1236a7f on host ubuntu-xenial-rax-dfw-0000714124 is less than 1. {{(pid=5417) host_passes /opt/stack/new/nova/nova/scheduler/filters/affinity_filter.py:122}}
Jul 13 19:53:09.806055 ubuntu-xenial-rax-dfw-0000714118 nova-scheduler[3485]: DEBUG nova.scheduler.filters.affinity_filter [None req-8b191d5b-4c76-47c4-a322-d9c9a5ee6d0b tempest-ServersOnMultiNodesTest-1141629396 tempest-ServersOnMultiNodesTest-1141629396] Group anti-affinity: check if the number of servers from group 243c3452-1fde-41ef-bf5a-1cddf1236a7f on host ubuntu-xenial-rax-dfw-0000714118 is less than 1. {{(pid=5417) host_passes /opt/stack/new/nova/nova/scheduler/filters/affinity_filter.py:122}}
Jul 13 19:53:09.808871 ubuntu-xenial-rax-dfw-0000714118 nova-scheduler[3485]: DEBUG nova.filters [None req-8b191d5b-4c76-47c4-a322-d9c9a5ee6d0b tempest-ServersOnMultiNodesTest-1141629396 tempest-ServersOnMultiNodesTest-1141629396] Filter ServerGroupAntiAffinityFilter returned 2 host(s) {{(pid=5417) get_filtered_objects /opt/stack/new/nova/nova/filters.py:104}}

This is where we hit the server group anti-affinity filter for the second instance:

http://logs.openstack.org/44/564444/14/check/neutron-tempest-multinode-full/dba40b9/logs/screen-n-sch.txt.gz#_Jul_13_19_53_09_917947

Jul 13 19:53:09.917947 ubuntu-xenial-rax-dfw-0000714118 nova-scheduler[3485]: DEBUG nova.scheduler.filters.affinity_filter [None req-8b191d5b-4c76-47c4-a322-d9c9a5ee6d0b tempest-ServersOnMultiNodesTest-1141629396 tempest-ServersOnMultiNodesTest-1141629396] Group anti-affinity: check if the number of servers from group 243c3452-1fde-41ef-bf5a-1cddf1236a7f on host ubuntu-xenial-rax-dfw-0000714118 is less than 1. {{(pid=5417) host_passes /opt/stack/new/nova/nova/scheduler/filters/affinity_filter.py:122}}
Jul 13 19:53:09.918313 ubuntu-xenial-rax-dfw-0000714118 nova-scheduler[3485]: DEBUG nova.scheduler.filters.affinity_filter [None req-8b191d5b-4c76-47c4-a322-d9c9a5ee6d0b tempest-ServersOnMultiNodesTest-1141629396 tempest-ServersOnMultiNodesTest-1141629396] Group anti-affinity: check if the number of servers from group 243c3452-1fde-41ef-bf5a-1cddf1236a7f on host ubuntu-xenial-rax-dfw-0000714124 is less than 1. {{(pid=5417) host_passes /opt/stack/new/nova/nova/scheduler/filters/affinity_filter.py:122}}
Jul 13 19:53:09.918709 ubuntu-xenial-rax-dfw-0000714118 nova-scheduler[3485]: DEBUG nova.filters [None req-8b191d5b-4c76-47c4-a322-d9c9a5ee6d0b tempest-ServersOnMultiNodesTest-1141629396 tempest-ServersOnMultiNodesTest-1141629396] Filter ServerGroupAntiAffinityFilter returned 2 host(s) {{(pid=5417) g...


Revision history for this message
Yikun Jiang (yikunkero) wrote :

> I don't really see what could be an issue there, but we should pick a host for the first instance, "claim" resources on that host and update the group hosts, and then run the filters on the second instance and filter out the first host selected for the first instance.

Yes, it seems we need to refresh host_state.instances [1] and spec_obj.instance_group.members [2] in the claim for the next server.
[1] https://review.openstack.org/#/c/571166/27/nova/scheduler/filters/affinity_filter.py@101
[2] https://review.openstack.org/#/c/571166/27/nova/scheduler/filters/affinity_filter.py@103

But my confusion is: why didn't compute catch the 2nd server as an excess (policy-violating) member in _validate_instance_group_policy of the compute manager [3]?

[3] https://review.openstack.org/#/c/571465/26/nova/compute/manager.py@1293

Revision history for this message
Matt Riedemann (mriedem) wrote :

> Yes, it seems we need to refresh host_state.instances [1] and spec_obj.instance_group.members [2]
> in the claim for the next server.
> [1] https://review.openstack.org/#/c/571166/27/nova/scheduler/filters/affinity_filter.py@101
> [2] https://review.openstack.org/#/c/571166/27/nova/scheduler/filters/affinity_filter.py@103

HostState.instances should be updated when the FilterScheduler calls HostState.consume_from_request:

https://github.com/openstack/nova/blob/21a368e1a6f22aa576719ec463d13280b9178f10/nova/scheduler/filter_scheduler.py#L324

However, looking at consume_from_request, it doesn't appear to do so:

https://github.com/openstack/nova/blob/21a368e1a6f22aa576719ec463d13280b9178f10/nova/scheduler/host_manager.py#L276

And the InstanceGroup.members should already have both instances in the members list by the time we get to the scheduler because they are added to the group in the API:

https://github.com/openstack/nova/blob/21a368e1a6f22aa576719ec463d13280b9178f10/nova/compute/api.py#L948

> But my confusion is: why didn't compute catch the 2nd server as an excess
> (policy-violating) member in _validate_instance_group_policy of the compute manager [3]?
> [3] https://review.openstack.org/#/c/571465/26/nova/compute/manager.py@1293

That's because the check requires an up-call to the API DB to get the group information, and devstack's default configuration disables that via this config option:

https://docs.openstack.org/nova/latest/configuration/config.html#workarounds.disable_group_policy_check_upcall
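
For reference, here is roughly what that looks like in nova.conf; devstack sets the option to true, which is why the compute-side re-check is skipped (the snippet is illustrative, based on the linked option docs):

    [workarounds]
    # Skip the late _validate_instance_group_policy re-check on the compute
    # node, since it requires an up-call from the cell to the API database.
    disable_group_policy_check_upcall = true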

Revision history for this message
Matt Riedemann (mriedem) wrote :

I think it's pretty clear at this point that the issue is that we stopped checking instance_group.hosts in the ServerGroupAntiAffinityFilter:

        group_hosts = (spec_obj.instance_group.hosts
                       if spec_obj.instance_group else [])
        LOG.debug("Group anti affinity: check if %(host)s not "
                  "in %(configured)s", {'host': host_state.host,
                                        'configured': group_hosts})
        if group_hosts:
            return host_state.host not in group_hosts

And we are now relying on this:

        # The list of instances UUIDs on the given host
        instances = set(host_state.instances.keys())

host_state.instances doesn't get updated within the same request when we call HostState.consume_from_request, and that introduces the race.
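
To make the race concrete, here is a minimal Python sketch (simplified names, not verbatim nova code; the counting logic paraphrases the filter's "is less than 1" debug message above):

    class HostState(object):
        def __init__(self, host):
            self.host = host
            self.instances = {}  # uuid -> instance; only refreshed from the DB

    def host_passes(host_state, group_members, max_server_per_host=1):
        # Count the anti-affinity group members already on this host.
        members_on_host = set(host_state.instances) & set(group_members)
        return len(members_on_host) < max_server_per_host

    host_a = HostState('host-a')
    members = {'instance-x', 'instance-y'}
    host_passes(host_a, members)  # True -> host-a selected for instance-x
    # Bug: host_a.instances is not updated with instance-x in this request...
    host_passes(host_a, members)  # ...so host-a also passes for instance-y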

Matt Riedemann (mriedem)
Changed in nova:
status: Confirmed → Triaged
assignee: nobody → Matt Riedemann (mriedem)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/582976

Changed in nova:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/582976
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ef2949900227e16e0a9494df3adec25d7dae6a14
Submitter: Zuul
Branch: master

commit ef2949900227e16e0a9494df3adec25d7dae6a14
Author: Matt Riedemann <email address hidden>
Date: Mon Jul 16 11:29:57 2018 -0400

    Update HostState.instances during _consume_selected_host

    Change I4b67ec9dd4ce846a704d0f75ad64c41e693de0fb changed the
    behavior of the ServerGroupAntiAffinityFilter to rely on
    HostState.instances being accurate within the same request
    so that it can compare the number of group members on the same
    host for filtering. Previously the filter was just checking
    InstanceGroup.hosts which was updated in the
    FilterScheduler._consume_selected_host method for each instance
    processed in a multi-create request.

    This change updates _consume_selected_host such that if we have
    an instance group and an instance_uuid (which should always be
    the case unless conductor is running code older than Pike), then
    we update HostState.instances with the instance_uuid so the
    ServerGroupAntiAffinityFilter can properly compare the number of
    group members against the instances on a given host in a multi-create
    request.

    Change-Id: Iacc636fa8a59a9e8670a8d683c10bdbb0dc8237b
    Closes-Bug: #1781710
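
In rough terms the fix looks like the following standalone sketch (simplified names; in nova the tracked value is an Instance object rather than a bare uuid):

    def _consume_selected_host(selected_host, spec_obj, instance_uuid=None):
        selected_host.consume_from_request(spec_obj)
        group = spec_obj.instance_group
        if group is not None:
            group.hosts.append(selected_host.host)
            if instance_uuid:
                # The fix: record the instance on the host state so the
                # ServerGroupAntiAffinityFilter sees it on the next filter
                # run within the same multi-create request.
                selected_host.instances[instance_uuid] = instance_uuid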

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
Matt Riedemann (mriedem) wrote :

It looks like my fix didn't resolve the issue since we're still seeing it:

http://status.openstack.org/elastic-recheck/#1781710

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/583339

Revision history for this message
Matt Riedemann (mriedem) wrote :

(2:14:30 PM) mriedem: i think this is the killer https://github.com/openstack/nova/blob/master/nova/scheduler/filters/affinity_filter.py#L98
(2:15:54 PM) mriedem: when we're doing multi-create, we have 1 request spec per instance, so in this failing test we have 2
(2:15:55 PM) sean-k-mooney: checking if the instance uuid is already in the current host.
(2:16:10 PM) mriedem: but the conductor just randomly picks the first request spec from the list when passing it to the scheduler,
(2:16:15 PM) mriedem: and that request_spec has a random instance_uuid on it
(2:16:41 PM) mriedem: so if i've got hosts A and B and instances X and Y, and using reqspec with instance X,
(2:16:52 PM) mriedem: then go through host filtering and select host A for instance X,
(2:17:11 PM) mriedem: when i go through host filtering for instance Y and host A, it will say: if spec_obj.instance_uuid in host_state.instances.keys():
(2:17:13 PM) sean-k-mooney: oh really, should we not be looping over the request specs in the multi-create request
(2:17:23 PM) mriedem: if X in A.instances(): return True
(2:17:43 PM) mriedem: the problem is when this code was added https://github.com/openstack/nova/blob/master/nova/scheduler/filters/affinity_filter.py#L98 it assumes resize, which is a single instance,
(2:18:05 PM) mriedem: and it worked because before https://review.openstack.org/#/c/571166/ the filter didn't look at host_state.instances within a multi-create create
(2:18:16 PM) mriedem: s/look at/depend on/
(2:19:10 PM) mriedem: i'm going to see if i can make this a predictable order through the scheduler to tickle the failure
(2:23:54 PM) sean-k-mooney: ya so that looks wrong https://github.com/openstack/nova/blob/039f7e055ee2b47e96be4e86d6f48d9ce469c123/nova/conductor/manager.py#L1191-L1194
(2:24:14 PM) sean-k-mooney: its using the request spec from the first instance only
(2:24:30 PM) mriedem: well, it's not wrong, it's just very brittle assumptions in the anti-affinity filter that didn't used to be a problem
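
Putting the transcript together, a minimal sketch of the failure mode (simplified, not verbatim nova code; the short-circuit paraphrases the "if X in A.instances(): return True" line above):

    def host_passes(spec_instance_uuid, host_instances, group_members,
                    max_server_per_host=1):
        if spec_instance_uuid in host_instances:
            # Intended for resize-to-same-host: "this" instance is already
            # on the host, so let it pass.
            return True
        members_on_host = set(host_instances) & set(group_members)
        return len(members_on_host) < max_server_per_host

    # Multi-create reuses the first instance's request spec, so while
    # scheduling instance Y the spec still carries instance X's uuid:
    host_a_instances = {'instance-x'}  # X was already placed on host A
    members = {'instance-x', 'instance-y'}
    host_passes('instance-x', host_a_instances, members)
    # -> True: the resize check fires for the wrong instance, and host A
    # wrongly passes for instance Y.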

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/583351

Revision history for this message
Matt Riedemann (mriedem) wrote :

Hmm, I see that before scheduling, conductor pulls the instance group from the DB again to refresh its members list, and it sets the policies on it again; but the filter is acting on the 'policy' field, so is it possible that we're just not setting the policy field properly during scheduling?

osboxes@osboxes:~/git/nova$ grep -R spec.instance_group.policies
nova/scheduler/utils.py: request_spec.instance_group.policies = group_info.policies

Revision history for this message
Matt Riedemann (mriedem) wrote :

Ignore my comment 11; it shouldn't be an issue, because when we pull the InstanceGroup out of the database we set both the policy and policies fields.

Matt Riedemann (mriedem)
Changed in nova:
status: Fix Released → In Progress
tags: added: rocky-rc-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/583339
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5e262d673b9ec20c33060d2d2ebf9f191f185bcd
Submitter: Zuul
Branch: master

commit 5e262d673b9ec20c33060d2d2ebf9f191f185bcd
Author: Matt Riedemann <email address hidden>
Date: Tue Jul 17 14:15:40 2018 -0400

    Add regression test for bug 1781710

    Co-Authored-By: Sean Mooney <email address hidden>

    Related-Bug: #1781710
    Change-Id: I411eb17524a44da4f765fd1463e404dc15075e2c

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/583351

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/583347
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c22b53c2481bac518a6b32cdee7b7df23d91251e
Submitter: Zuul
Branch: master

commit c22b53c2481bac518a6b32cdee7b7df23d91251e
Author: Matt Riedemann <email address hidden>
Date: Tue Jul 17 17:43:37 2018 -0400

    Update RequestSpec.instance_uuid during scheduling

    Before change I4b67ec9dd4ce846a704d0f75ad64c41e693de0fb
    the ServerGroupAntiAffinityFilter did not rely on the
    HostState.instances dict to determine **within the same
    request** how many members of the same anti-affinity
    group were on a given host. In fact, at that time, the
    HostState.instances dict wasn't updated between filter
    runs in the same multi-create request. That was fixed with
    change Iacc636fa8a59a9e8670a8d683c10bdbb0dc8237b so that
    as we select a host for each group member being created
    within a single request, we also update the HostState.instances
    dict so the ServerGroupAntiAffinityFilter would have
    accurate tracking of which group members were on which
    hosts.

    However, that did not account for a wrinkle in the filter
    added in change Ie016f59f5b98bb9c70b3e33556bd747f79fc77bd
    which is needed to allow resizing a server to the same host
    when that server is in an anti-affinity group. That wrinkle,
    combined with the fact the RequestSpec the filter is acting
    upon for a given instance in a multi-create request might
    not actually have the same instance_uuid can cause the filter
    to think it's in a resize situation and accept a host which
    already has another member from the same anti-affinity group
    on it, which breaks the anti-affinity policy.

    For background, during a multi-create request, we create a
    RequestSpec per instance being created, but conductor only
    sends the first RequestSpec for the first instance to the
    scheduler. This means RequestSpec.num_instances can be >1
    and we can be processing the Nth instance in the list during
    scheduling but working on a RequestSpec for the first instance.
    That is what breaks the resize check in ServerGroupAntiAffinityFilter
    with regard to multi-create.

    To resolve this, we update the RequestSpec.instance_uuid when
    filtering hosts for a given instance but we don't persist that
    change to the RequestSpec.

    This is a bit clunky, but it kind of comes with the territory of
    how we hack scheduling requests together using a single RequestSpec
    for multi-create requests. An alternative to this is found in change
    I0dd1fa5a70ac169efd509a50b5d69ee5deb8deb7 where we rely on the
    RequestSpec.num_instances field to determine if we're in a multi-create
    situation, but that has its own flaws because num_instances is
    persisted with the RequestSpec which might cause us to re-introduce
    bug 1558532 if we're not careful. In that case we'd have to either
    (1) stop persisting RequestSpec.num_instances or (2) temporarily
    unset it like we do using RequestSpec.reset_forced_destinations()
    during move operations.

    Co-Authored-By: Sean Mooney <work@seanmoon...
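
The approach reads roughly like this standalone sketch (all names illustrative, not verbatim nova code; get_sorted_hosts stands in for the scheduler's filter/weigh step):

    def schedule_sketch(spec_obj, instance_uuids, get_sorted_hosts):
        selections = []
        for num, instance_uuid in enumerate(instance_uuids):
            # Point the shared RequestSpec at the instance actually being
            # placed so the anti-affinity filter's resize check compares
            # the right uuid.
            spec_obj.instance_uuid = instance_uuid
            hosts = get_sorted_hosts(spec_obj, num)
            selections.append(hosts[0])
        # Transient only: the changed instance_uuid is not persisted with
        # the RequestSpec.
        return selections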


Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.0.0b3

This issue was fixed in the openstack/nova 18.0.0.0b3 development milestone.
