Instance with anti-affinity server group booted failed in concurrent scenario

Bug #1647584 reported by Tao Li
Affects: OpenStack Compute (nova)
Status: Opinion
Importance: Wishlist
Assigned to: Tao Li

Bug Description

Description
===========
Assume the following scenario:
1. The compute resources are sufficient.
2. Instances are booted concurrently.
3. The number of instances is less than the number of compute nodes.
4. All instances are booted with the same anti-affinity group.
5. There is more than one controller node.

In this scenario, more instances fail to boot than expected. Under concurrency, two or more instances can be scheduled to the same compute node even though anti-affinity is specified. After 'instance_claim', the compute node checks the anti-affinity policy without holding a lock, so two or more instances may be checked at the same time and each fails because of the other. These instances are then rescheduled, and the next scheduling attempt ignores the previous compute node.
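The mutual failure described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not nova's actual code: the `Host` class, `boot_unlocked`, `boot_locked`, and `run` are hypothetical names, and a `threading.Barrier` is used to force the unlucky interleaving in which both instances claim the host before either one checks the policy.

```python
import threading


class Host:
    """Toy compute host (hypothetical, not nova's data model)."""

    def __init__(self, name):
        self.name = name
        self.members = set()          # group members claimed on this host
        self.lock = threading.Lock()  # only used by the locked variant


def boot_unlocked(host, instance, barrier):
    """Claim first, then check anti-affinity without any lock."""
    host.members.add(instance)        # stands in for the claim step
    barrier.wait()                    # every instance has claimed by now
    ok = host.members == {instance}   # late anti-affinity check
    barrier.wait()                    # every instance has checked by now
    if not ok:
        host.members.discard(instance)  # give up; would be rescheduled
    return ok


def boot_locked(host, instance):
    """Same steps, but claim and check are serialized per host."""
    with host.lock:
        host.members.add(instance)
        ok = host.members == {instance}
        if not ok:
            host.members.discard(instance)
        return ok


def run(locked, n=2):
    """Boot n group members onto one host; return how many succeed."""
    host = Host("compute-1")
    barrier = threading.Barrier(n)
    results = {}

    def worker(name):
        if locked:
            results[name] = boot_locked(host, name)
        else:
            results[name] = boot_unlocked(host, name, barrier)

    threads = [threading.Thread(target=worker, args=(f"vm-{i}",))
               for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results.values())
```

In this model, the unlocked variant lets both instances see each other and both give up, while the locked variant lets exactly one of them keep the host. Whether a per-group or per-host lock is the right fix for nova itself is a separate design question that this sketch does not settle.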

Steps to reproduce
==================
1. Assume 3 compute nodes and 2 or more controller nodes.
2. Create an anti-affinity server group.
3. Write a bash script that boots instances concurrently with the anti-affinity group:
   nova boot --flavor 1 --image cirros --nic net-id=b0406792-26a8-4f26-843e-3b2231dbd4da --hint group=eaa8694e-8c83-47f2-8c02-93657c08d2bd lt_test01 &
   nova boot --flavor 1 --image cirros --nic net-id=b0406792-26a8-4f26-843e-3b2231dbd4da --hint group=eaa8694e-8c83-47f2-8c02-93657c08d2bd lt_test02 &
   nova boot --flavor 1 --image cirros --nic net-id=b0406792-26a8-4f26-843e-3b2231dbd4da --hint group=eaa8694e-8c83-47f2-8c02-93657c08d2bd lt_test03 &
   nova boot --flavor 1 --image cirros --nic net-id=b0406792-26a8-4f26-843e-3b2231dbd4da --hint group=eaa8694e-8c83-47f2-8c02-93657c08d2bd lt_test04
4. Execute the bash script.
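As an alternative to the bash script, the same concurrent boot requests could be issued from a small Python launcher that also waits for every client process and collects its exit code. This is a rough sketch: `build_boot_command` and `boot_concurrently` are hypothetical helper names, and the only `nova boot` options used are the ones that already appear in the steps above.

```python
import subprocess


def build_boot_command(name, net_id, group_id, flavor="1", image="cirros"):
    """Build the argument list for one 'nova boot' call, as in step 3."""
    return [
        "nova", "boot",
        "--flavor", flavor,
        "--image", image,
        "--nic", f"net-id={net_id}",
        "--hint", f"group={group_id}",
        name,
    ]


def boot_concurrently(names, net_id, group_id):
    """Start all boot requests at once, then wait for each client to exit.

    Returns the list of client exit codes. An exit code of 0 only means
    the request was accepted, not that the instance became ACTIVE.
    """
    procs = [
        subprocess.Popen(build_boot_command(name, net_id, group_id))
        for name in names
    ]
    return [proc.wait() for proc in procs]
```

For the reproduction above, this would be called as `boot_concurrently(["lt_test01", "lt_test02", "lt_test03", "lt_test04"], "b0406792-26a8-4f26-843e-3b2231dbd4da", "eaa8694e-8c83-47f2-8c02-93657c08d2bd")`, on a machine where the `nova` client is installed and credentials are loaded.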

Expected result
===============
3 instances boot successfully.

Actual result
=============
2 instances boot successfully.

Tao Li (eric-litao)
Changed in nova:
assignee: nobody → Tao Li (eric-litao)
Sean Dague (sdague) wrote :

There are no strong guarantees about scheduling, especially when you are hitting failure domains. So while this is not ideal, I think this is more of a wishlist item to have this function when you are in a fully packed scenario (more guests than computes trying policy-based placement).

Marking Opinion / Wishlist. It is fine to push fixes for this bug if you have one.

Changed in nova:
status: New → Opinion
importance: Undecided → Wishlist
Tao Li (eric-litao)
description: updated
description: updated
Tao Li (eric-litao) wrote :

I have a fix for this bug and will commit it later.

norman shen (jshen28)
description: updated