Server group anti-affinity no longer works

Bug #1863190 reported by Michael Johnson
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

Server group anti-affinity is no longer working, at least in the simple case. I am able to boot two VMs in an anti-affinity server group on a devstack that has only one compute host. Previously this would fail, or would require soft-anti-affinity to be enabled.

$ openstack host list
+-----------+-----------+----------+
| Host Name | Service   | Zone     |
+-----------+-----------+----------+
| devstack2 | scheduler | internal |
| devstack2 | conductor | internal |
| devstack2 | conductor | internal |
| devstack2 | compute   | nova     |
+-----------+-----------+----------+

$ openstack compute service list
+----+----------------+-----------+----------+---------+-------+----------------------------+
| ID | Binary         | Host      | Zone     | Status  | State | Updated At                 |
+----+----------------+-----------+----------+---------+-------+----------------------------+
| 3  | nova-scheduler | devstack2 | internal | enabled | up    | 2020-02-14T00:59:15.000000 |
| 6  | nova-conductor | devstack2 | internal | enabled | up    | 2020-02-14T00:59:16.000000 |
| 1  | nova-conductor | devstack2 | internal | enabled | up    | 2020-02-14T00:59:19.000000 |
| 3  | nova-compute   | devstack2 | nova     | enabled | up    | 2020-02-14T00:59:17.000000 |
+----+----------------+-----------+----------+---------+-------+----------------------------+

$ openstack server list
+--------------------------------------+----------------------------------------------+--------+-----------------------------------------------+---------------------+------------+
| ID                                   | Name                                         | Status | Networks                                      | Image               | Flavor     |
+--------------------------------------+----------------------------------------------+--------+-----------------------------------------------+---------------------+------------+
| a44febef-330c-4db5-b220-959cbbff8f8c | amphora-1bc97ddb-80da-446a-bce3-0c867c1fc258 | ACTIVE | lb-mgmt-net=192.168.0.58; public=172.24.4.200 | amphora-x64-haproxy | m1.amphora |
| de776347-0cf4-47d5-bb37-17fb37d79f2e | amphora-433abe98-fd8e-4e4f-ac11-4f76bbfc7aba | ACTIVE | lb-mgmt-net=192.168.0.199; public=172.24.4.11 | amphora-x64-haproxy | m1.amphora |
+--------------------------------------+----------------------------------------------+--------+-----------------------------------------------+---------------------+------------+

$ openstack server group show ddbc8544-c664-4da4-8fd8-32f6bd01e960
+----------+----------------------------------------------------------------------------+
| Field    | Value                                                                      |
+----------+----------------------------------------------------------------------------+
| id       | ddbc8544-c664-4da4-8fd8-32f6bd01e960                                       |
| members  | a44febef-330c-4db5-b220-959cbbff8f8c, de776347-0cf4-47d5-bb37-17fb37d79f2e |
| name     | octavia-lb-cc40d031-6ce9-475f-81b4-0a6096178834                            |
| policies | anti-affinity                                                              |
+----------+----------------------------------------------------------------------------+

Steps to reproduce:
1. Boot a devstack.
2. Create an anti-affinity server group.
3. Boot two VMs in that server group before the first reaches ACTIVE (a sketch follows this list).
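
A minimal sketch of the parallel boot, assuming a cirros image, the m1.nano flavor, and placeholder group/network IDs (all illustrative; substitute your own):

$ openstack server group create --policy anti-affinity repro-group
$ # Background both creates so the second request is scheduled before
$ # the first instance reaches ACTIVE:
$ openstack server create --image cirros-0.4.0-x86_64-disk --flavor m1.nano \
    --hint group=<GROUP_ID> --nic net-id=<NET_ID> vm-one &
$ openstack server create --image cirros-0.4.0-x86_64-disk --flavor m1.nano \
    --hint group=<GROUP_ID> --nic net-id=<NET_ID> vm-two &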

Expected Behavior:

The second VM boot should fail with an error similar to "not enough hosts".

Actual Behavior:

The second VM boots with no error; the two instances in the server group are on the same host.

Environment:
Nova version (current Ussuri): 0d3aeb0287a0619695c9b9e17c2dec49099876a5
commit 0d3aeb0287a0619695c9b9e17c2dec49099876a5 (HEAD -> master, origin/master, origin/HEAD)
Merge: 1fcd74730d 65825ebfbd
Author: Zuul <email address hidden>
Date: Thu Feb 13 14:25:10 2020 +0000

    Merge "Make RBD imagebackend flatten method idempotent"

Fresh devstack install, however I have another devstack from August that is also showing this behavior.

Revision history for this message
Michael Johnson (johnsom) wrote :

devstack@n-* log files.

Revision history for this message
Michael Johnson (johnsom) wrote :

As was requested in IRC: if I wait until the first instance goes to ACTIVE, the second build will go to ERROR as expected (a sketch of that serialized flow follows).
This isn't a good workaround for us, as waiting for ACTIVE can take up to five minutes in some clouds, and we want to get the second instance started much faster than that.
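
A sketch of the serialized flow being described, with the same placeholder image/flavor/network values as the reproduction above:

$ openstack server create --image cirros-0.4.0-x86_64-disk --flavor m1.nano \
    --hint group=<GROUP_ID> --nic net-id=<NET_ID> one
$ # Wait for "one" to reach ACTIVE; by then it has a host assigned, so the
$ # scheduler's anti-affinity filter can exclude that host for "two":
$ while [ "$(openstack server show one -f value -c status)" != "ACTIVE" ]; do sleep 5; done
$ openstack server create --image cirros-0.4.0-x86_64-disk --flavor m1.nano \
    --hint group=<GROUP_ID> --nic net-id=<NET_ID> two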

This previously worked as expected, but I don't know exactly when it started allowing two instances on the same host when using "hard" anti-affinity.

Revision history for this message
Adam Harwell (adam-harwell) wrote :

What I see in my cloud is that one of the two will schedule and build, and the other will schedule, but fail to build with a rescheduling error:

```
{'message': 'Build of instance 417e19c2-e2a5-48e0-8ce5-0f087c5f6091 was re-scheduled: Anti-affinity instance group policy was violated.', 'code': 500, 'details': 'Traceback (most recent call last):\n File "/opt/openstack/venv/nova/lib/python2.7/site-packages/nova/compute/manager.py", line 1941, in _do_build_and_run_instance\n filter_properties, request_spec)\n File "/opt/openstack/venv/nova/lib/python2.7/site-packages/nova/compute/manager.py", line 2230, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\nRescheduledException: Build of instance 417e19c2-e2a5-48e0-8ce5-0f087c5f6091 was re-scheduled: Build of instance 417e19c2-e2a5-48e0-8ce5-0f087c5f6091 was re-scheduled: Anti-affinity instance group policy was violated.\n', 'created': '2020-02-21T03:43:18Z'}
```

This is with hard anti-affinity.
With soft anti-affinity, no reschedule would be forced, so the policy would simply never take effect.

Revision history for this message
melanie witt (melwitt) wrote :

Apologies for just now coming back to this -- it completely slipped my mind :(

The behavior Adam described is correct and expected in the "parallel requests for hard anti-affinity" scenario. The two requests race and initially land on the same compute host. One of them "wins"; the other fails what we call the "late affinity check" in nova-compute on the compute host, is rescheduled, and then fails if no other host is available.

Adam, do you recall what release version of nova you used when you did your test? Was it master/Ussuri or an older release?

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

The late affinity check that fixes the race is an upcall that is simply not possible with the default cellsv2 setup [1].

In a non-resource-constrained cloud there is another way to limit the possible race: you can set [filter_scheduler]/host_subset_size [2] to a value greater than 1 so that parallel scheduling requests are less likely to select the same host as the target (see the sketch after the references).

[1] https://docs.openstack.org/nova/latest/user/cellsv2-layout.html#operations-requiring-upcalls
[2] https://docs.openstack.org/nova/train/configuration/config.html#filter_scheduler.host_subset_size
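
A minimal nova.conf sketch of this mitigation (the value 5 is illustrative; size it to your host count):

[filter_scheduler]
# With a subset size N > 1 the scheduler picks randomly among the N
# best-weighed hosts instead of always taking the single top host, so
# parallel requests are less likely to pick the same target. Default: 1.
host_subset_size = 5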

Revision history for this message
melanie witt (melwitt) wrote :

The late affinity check does work in a single-cell cellsv2 setup: [workarounds]/disable_group_policy_check_upcall defaults to False, and the check works as long as nova-scheduler and nova-compute are on the same message queue. The single-cell cellsv2 setup is the most common deployment and is what devstack uses.

In a multi-cell cellsv2 setup though, it is true that the late affinity check is not possible regardless of the [workarounds]/disable_group_policy_check_upcall config option setting because nova-scheduler and nova-compute would not be connected to the same message queue.

So, I think we still need to investigate what is going on and verify whether/how a regression has occurred.

Revision history for this message
melanie witt (melwitt) wrote :

I finally got a chance to try and reproduce this on a devstack and can now see what you have reported.

$ git log -1
commit e20e731630c1b337daf4446286bb6c8e761025e3 (HEAD -> master, origin/master, origin/HEAD)
Merge: fc159ac91b 998475f5bd
Author: Zuul <email address hidden>
Date: Wed Mar 11 19:03:28 2020 +0000

    Merge "nova-net: Remove unused nova-network objects"

I created a server group with anti-affinity policy and booted two servers at the same time in separate terminal windows. (Note that this depends on your timing -- if you are "too slow" you will see one go to ERROR with "No valid host" and the other to ACTIVE because the affinity check at the scheduler will catch it).

$ openstack server group create --policy anti-affinity anti-affinity
$ openstack server group list
+--------------------------------------+---------------+---------------+
| ID                                   | Name          | Policies      |
+--------------------------------------+---------------+---------------+
| 85f266b7-3fe9-492a-b80b-7c74b7ea1a73 | anti-affinity | anti-affinity |
+--------------------------------------+---------------+---------------+
$ openstack server create --image 50668455-013d-4daf-80b3-dc2ae225663f --flavor 42 --hint group=85f266b7-3fe9-492a-b80b-7c74b7ea1a73 --nic net-id=01524af1-e35d-4ed9-a411-1fee224fb07c one
$ openstack server create --image 50668455-013d-4daf-80b3-dc2ae225663f --flavor 42 --hint group=85f266b7-3fe9-492a-b80b-7c74b7ea1a73 --nic net-id=01524af1-e35d-4ed9-a411-1fee224fb07c two
$ openstack server list
+--------------------------------------+------+--------+------------------------+--------------------------+---------+
| ID                                   | Name | Status | Networks               | Image                    | Flavor  |
+--------------------------------------+------+--------+------------------------+--------------------------+---------+
| 8dd746d3-6201-4a3b-a3b1-71c854ff8721 | one  | ACTIVE | shared=192.168.233.168 | cirros-0.4.0-x86_64-disk | m1.nano |
| 6fe23607-0875-498f-b52b-50c910bc1b61 | two  | ACTIVE | shared=192.168.233.240 | cirros-0.4.0-x86_64-disk | m1.nano |
+--------------------------------------+------+--------+------------------------+--------------------------+---------+

BUT then I noticed in the /etc/nova/nova-cpu.conf:

[workarounds]
disable_group_policy_check_upcall = True

This will disable the late affinity check (which preserves affinity policy enforcement in the case of racing parallel requests) in nova-compute.

The default value is False [1] but it is set to True in devstack in the gate [2] because the gate is configured to exercise the multiple cell service topology [3] and run with a "superconductor". With multiple cells with each cell using their own separate message queue, the late affinity check can't work.

But in devstack there is only a single message queue, so it is possible to use [workarounds]disable_group_policy_check_upcall = False in /etc/nova/nova-cpu.conf. You will want to set this if you want affinity races to be handled, as sketched below.
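
A sketch of that setting in /etc/nova/nova-cpu.conf (restart nova-compute afterwards for it to take effect):

[workarounds]
# False (the default) lets nova-compute perform the late affinity check
# and reschedule an instance that violates the server group policy.
disable_group_policy_check_upcall = False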

When using [workarounds]disable_group_policy_check_upcall = False with a multi-tier conductor setup, you'll also need to set t...


Revision history for this message
melanie witt (melwitt) wrote :

I must correct parts of my earlier comment 7:

> With multiple cells with each cell using their own separate message queue, the late affinity check can't work.

> When running with multiple cells, it presently is not possible to enforce affinity policy properly in a race situation. To support this, affinity support needs to be implemented in the placement service.

This is incorrect. It is possible to enforce affinity policy in a race situation with multiple cells if cell conductors are configured to set [api_database]connection and computes do not set [workarounds]disable_group_policy_check_upcall (a configuration sketch follows).
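
A sketch of that configuration; the connection URL is purely illustrative (host, credentials, and database name are assumptions):

# In each cell conductor's nova.conf:
[api_database]
# Giving the cell conductor access to the API database enables the
# upcall that the late affinity check relies on.
connection = mysql+pymysql://nova:CELL_DB_PASS@controller/nova_api

# And in each nova-compute's config, leave the workaround at its default:
[workarounds]
disable_group_policy_check_upcall = False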

This is not an ideal configuration with multiple cells, however, as cells are meant to be isolated from the upper layers of the deployment. That is why we will need affinity support to be added to the placement service: to be able to fully enforce affinity policies without needing to configure cell conductors to access the API database.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/715092

Revision history for this message
melanie witt (melwitt) wrote :

I've proposed a doc update ^ related to this bug report.

Closing this as Invalid because server group affinity has not regressed, as explained in comment 7 and comment 8.

Changed in nova:
status: New → Invalid
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/715092
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=df216de6d9b195782be3cfc2d51296f3c4442b54
Submitter: Zuul
Branch: master

commit df216de6d9b195782be3cfc2d51296f3c4442b54
Author: melanie witt <email address hidden>
Date: Wed Mar 25 23:02:42 2020 +0000

    Add info about affinity requests to the troubleshooting doc

    We had recent bug report about a possible regression related to
    affinity policy enforcement with parallel server create requests.

    It turned out not to be a regression but because of the complexity
    around affinity enforcement, it might help to add a section to the
    compute troubleshooting doc about it which we could refer to in the
    future.

    Related-Bug: #1863190

    Change-Id: I508c48183a7205d46e13154d4e92d31dfa7f7d78
