3 instances launched in soft anti-affinity server group but unexpectedly ignored the 3rd host

Bug #1834255 reported by Wendy Mitchell
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Gerry Kopec
Milestone: (none)

Bug Description

Brief Description
-----------------
A soft anti-affinity server group was selected at instantiation, but two of the instances landed on the same host, unexpectedly ignoring the 3rd host that was available.

Severity
--------
standard

Steps to Reproduce
------------------
1. created a flavor
Flavor ID c5bf85da-dfd1-4be8-9cbb-94904ffdb948
RAM 1GB
VCPUs 1 VCPU
Disk 2GB
hw:mem_page_size 2048

nova flavor-list
c5bf85da-dfd1-4be8-9cbb-94904ffdb948 | srv_grp | 1024 | 2 | 0 | | 1 | 1.0 | True | -

2. as tenant1 user, created a server group with soft anti-affinity setting
grp_soft_anti_affinity-1 042740e8-364c-40e0-94ad-5bfb07dd61bd Soft Anti Affinity

3. ensured that all hypervisors are enabled and there is room on each of the 3 compute hosts
4. as tenant1 user, launched an instance in the server group from step 2 with the flavor from step 1
5. as tenant1 user, launched another instance in the same server group with the same flavor
6. as tenant1 user, launched a 3rd instance in the same server group with the same flavor

Expected Behaviour
------------------
Each of the 3 instances should have landed on its own host, since all 3 hosts were available.

Actual Behaviour
----------------
2 of the 3 instances were scheduled on compute-1 and 1 was scheduled on compute-0; the 3rd compute host received none.

----------------
$ openstack server list --all
+--------------------------------------+-------+--------+-----------------------------------------------------------+-------+---------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+-------+--------+-----------------------------------------------------------+-------+---------+
| 2cd165c7-cdba-4d44-9853-4040ca113b0c | three | ACTIVE | tenant1-mgmt-net=192.168.85.57; tenant1-net1=172.16.1.161 | | srv_grp |
| a444bd09-6bb7-4e21-a4c3-71c5085b9d10 | two | ACTIVE | tenant1-mgmt-net=192.168.85.46; tenant1-net0=172.16.0.153 | | srv_grp |
| f5da1493-b136-48cc-80ce-746141648e36 | one | ACTIVE | tenant1-mgmt-net=192.168.85.51; tenant1-net1=172.16.1.201 | | srv_grp |
+--------------------------------------+-------+--------+-----------------------------------------------------------+-------+---------+

$ openstack server show 2cd165c7-cdba-4d44-9853-4040ca113b0c
+-------------------------------------+-----------------------------------------------------------+
| Field | Value |
+-------------------------------------+-----------------------------------------------------------+
| OS-DCF:diskConfig | AUTO |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | compute-1 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-1 |
| OS-EXT-SRV-ATTR:instance_name | instance-00000387 |
| OS-EXT-STS:power_state | Running |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2019-06-25T20:56:56.000000 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | tenant1-mgmt-net=192.168.85.57; tenant1-net1=172.16.1.161 |
| config_drive | |
| created | 2019-06-25T20:56:34Z |
| flavor | srv_grp (c5bf85da-dfd1-4be8-9cbb-94904ffdb948) |
| hostId | 9a30d3bab229e5b0ec63017cbcf3690b110db0f8d73b9a30206ffb9f |
| id | 2cd165c7-cdba-4d44-9853-4040ca113b0c |
| image | |
| key_name | None |
| name | three |
| progress | 0 |
| project_id | db4395f7baaa4de28a5417c659b28acd |
| properties | |
| security_groups | name='default' |
| | name='default' |
| status | ACTIVE |
| updated | 2019-06-25T20:56:57Z |
| user_id | af0923a7f8a641d1902e3552be9f74f2 |
| volumes_attached | id='b99ab998-019c-46f1-9930-39470d357227' |
+-------------------------------------+-----------------------------------------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$ openstack server show a444bd09-6bb7-4e21-a4c3-71c5085b9d10
+-------------------------------------+-----------------------------------------------------------+
| Field | Value |
+-------------------------------------+-----------------------------------------------------------+
| OS-DCF:diskConfig | AUTO |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | compute-0 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-0 |
| OS-EXT-SRV-ATTR:instance_name | instance-00000384 |
| OS-EXT-STS:power_state | Running |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2019-06-25T20:55:34.000000 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | tenant1-mgmt-net=192.168.85.46; tenant1-net0=172.16.0.153 |
| config_drive | |
| created | 2019-06-25T20:55:24Z |
| flavor | srv_grp (c5bf85da-dfd1-4be8-9cbb-94904ffdb948) |
| hostId | 4010538971e4a045426712945a6a179d7ab2f7d52e1546abf0d8d308 |
| id | a444bd09-6bb7-4e21-a4c3-71c5085b9d10 |
| image | |
| key_name | None |
| name | two |
| progress | 0 |
| project_id | db4395f7baaa4de28a5417c659b28acd |
| properties | |
| security_groups | name='default' |
| | name='default' |
| status | ACTIVE |
| updated | 2019-06-25T20:55:34Z |
| user_id | af0923a7f8a641d1902e3552be9f74f2 |
| volumes_attached | id='21adce61-36b2-4f3e-9d35-d2b7fbddd6f5' |
+-------------------------------------+-----------------------------------------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$ openstack server show f5da1493-b136-48cc-80ce-746141648e36
+-------------------------------------+-----------------------------------------------------------+
| Field | Value |
+-------------------------------------+-----------------------------------------------------------+
| OS-DCF:diskConfig | AUTO |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | compute-1 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-1 |
| OS-EXT-SRV-ATTR:instance_name | instance-00000381 |
| OS-EXT-STS:power_state | Running |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2019-06-25T20:54:33.000000 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | tenant1-mgmt-net=192.168.85.51; tenant1-net1=172.16.1.201 |
| config_drive | |
| created | 2019-06-25T20:54:19Z |
| flavor | srv_grp (c5bf85da-dfd1-4be8-9cbb-94904ffdb948) |
| hostId | 9a30d3bab229e5b0ec63017cbcf3690b110db0f8d73b9a30206ffb9f |
| id | f5da1493-b136-48cc-80ce-746141648e36 |
| image | |
| key_name | None |
| name | one |
| progress | 0 |
| project_id | db4395f7baaa4de28a5417c659b28acd |
| properties | |
| security_groups | name='default' |
| | name='default' |
| status | ACTIVE |
| updated | 2019-06-25T20:54:33Z |
| user_id | af0923a7f8a641d1902e3552be9f74f2 |
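| volumes_attached | id='1b501b68-6016-4f8d-8846-23b6698cfab3' |
+-------------------------------------+-----------------------------------------------------------+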
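
The placement reported in the three `openstack server show` outputs above can be tallied with a short script. This is only a minimal sketch: the `placements` mapping is transcribed by hand from the `OS-EXT-SRV-ATTR:host` fields, and the helper names `hosts_with_multiple_members` and `spread_is_ideal` are hypothetical, not part of any OpenStack tool. The name of the 3rd compute host does not appear in the report, so only the host count (3) is used.

```python
from collections import Counter

# Instance name -> host, transcribed from the OS-EXT-SRV-ATTR:host fields above.
placements = {
    "one": "compute-1",
    "two": "compute-0",
    "three": "compute-1",
}
AVAILABLE_HOSTS = 3  # the lab has 3 compute hosts, per the report


def hosts_with_multiple_members(placements):
    """Return {host: count} for hosts running more than one group member."""
    counts = Counter(placements.values())
    return {host: n for host, n in counts.items() if n > 1}


def spread_is_ideal(placements, available_hosts):
    """True when the members occupy as many distinct hosts as possible."""
    used = len(set(placements.values()))
    return used == min(len(placements), available_hosts)


print(hosts_with_multiple_members(placements))       # {'compute-1': 2}
print(spread_is_ideal(placements, AVAILABLE_HOSTS))  # False
```

With 3 members and 3 available hosts, an ideal soft anti-affinity spread would use 3 distinct hosts; the tally shows only 2 in use.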

Reproducibility
---------------
100%

System Configuration
--------------------
standard system

Branch/Pull Time/Commit
-----------------------
BUILD_ID="20190612T013000Z"

Timestamp/Logs
--------------
see inline

Revision history for this message
Wendy Mitchell (wmitchellwr) wrote :

Lab: WP_3-7 (2 controller, 3 computes)

Numan Waheed (nwaheed)
tags: added: stx.regression stx.retestneeded

Wendy Mitchell (wmitchellwr) wrote :

Marking as regression-affecting. The following test cases fail due to this issue:
FAIL 20190624 02:48:22 test_server_group_boot_vms[soft_anti_affinity-3]
FAIL 20190624 02:51:53 test_server_group_boot_vms[anti_affinity-2]
FAIL 20190624 02:55:43 test_server_group_boot_vms[soft_affinity-3]

Ghada Khalil (gkhalil) wrote :

Marking as stx.2.0 gating; it looks like a nova helm override is missing for this

tags: added: stx.2.0 stx.containers
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Gerry Kopec (gerry-kopec)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/674177

Changed in starlingx:
status: Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/674177
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=314890983ddc00c1ade39822a20eae935bab2e7b
Submitter: Zuul
Branch: master

commit 314890983ddc00c1ade39822a20eae935bab2e7b
Author: Gerry Kopec <email address hidden>
Date: Thu Aug 1 00:13:04 2019 -0400

    Update nova overrides to enable affinity weigher

    Affinity weigher is required to support soft-anti-affinity and
    soft-affinity server group policies in nova. Set to a relatively high
    multiplier of 20 to ensure that this criterion dominates the host
    selection.

    Adjust other weigher multipliers accordingly:
    io_ops: remove override to let it use default value of -1. Old -5
            setting was related to discontinued stx-nova patch in previous
            stx release.
    cpu & build_failure: disable similar to ram, disk & pci.

    Also enable shuffle_best_same_weighed_hosts to randomize host selection
    where weights are equal across multiple hosts.

    Change-Id: I28f92a7c703d1b78d5cab93418359ce164e61066
    Closes-Bug: 1834255
    Signed-off-by: Gerry Kopec <email address hidden>
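
The effect described in the commit message can be illustrated with a toy model of weighted host selection. This is a simplified sketch, not nova's implementation: it assumes each weigher's raw values are min-max scaled across hosts, multiplied by that weigher's multiplier, and summed. The multipliers 20 (affinity) and -1 (io_ops) come from the commit message; the host name `compute-2`, the `io_ops` figures, and all function names are hypothetical.

```python
import random


def normalize(values):
    """Min-max scale values to [0, 1]; all zeros when every value is equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def best_hosts(hosts, weighers):
    """Return (hosts tied for the highest total weight, all totals).

    hosts: {name: {metric: value}}
    weighers: list of (multiplier, raw_weight_function) pairs.
    """
    names = list(hosts)
    totals = dict.fromkeys(names, 0.0)
    for multiplier, raw in weighers:
        scaled = normalize([raw(hosts[n]) for n in names])
        for name, s in zip(names, scaled):
            totals[name] += multiplier * s
    top = max(totals.values())
    return [n for n in names if totals[n] == top], totals


# State before the 3rd boot: one group member on each of two hosts.
# "compute-2" and the io_ops numbers are assumptions for illustration.
hosts = {
    "compute-0": {"members": 1, "io_ops": 0},
    "compute-1": {"members": 1, "io_ops": 0},
    "compute-2": {"members": 0, "io_ops": 1},
}

anti_affinity = (20, lambda h: -h["members"])  # fewer members scores higher
io_ops = (-1, lambda h: h["io_ops"])           # negative multiplier penalizes load

without_affinity, _ = best_hosts(hosts, [io_ops])
with_affinity, totals = best_hosts(hosts, [anti_affinity, io_ops])

print(without_affinity)  # ['compute-0', 'compute-1']
print(with_affinity)     # ['compute-2']

# Random tie-break among equally weighted hosts, in the spirit of
# shuffle_best_same_weighed_hosts.
print(random.choice(without_affinity))
```

In this model, with the affinity term absent, only hosts that already run a group member are tied for best, so the 3rd instance must share a host. With a multiplier of 20, the empty host wins despite its small io_ops penalty.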

Changed in starlingx:
status: In Progress → Fix Released

Wendy Mitchell (wmitchellwr) wrote :

2019-08-07_20-59-00
Verified: when instances are launched in the server group, the soft anti-affinity policy appears to be respected, i.e. the instances landed on each of the 8 worker nodes.

tags: removed: stx.retestneeded