Anti-affinity instance creation failed and the scheduling node was incorrect.

Bug #1886160 reported by wang
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Description
===========
Creating anti-affinity instances failed. My environment has five compute nodes. I created a server group with the anti-affinity policy and then created instances in it; the fifth instance failed to build, and the log says "Exceeded max scheduling attempts 3 for instan...". The first four instances were placed on compute1 through compute4. The fifth instance was scheduled to compute1, compute2 and compute3 in turn; the anti-affinity policy prevented it from being created on each of them, and after three attempts the build failed (the scheduler max_attempts default is 3, max_server_per_host default is 1).

compute1, compute2 and compute3 all have error logs, while compute4 and compute5 have no error logs.
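
For reference, the retry budget mentioned above is, if I remember the Queens option name correctly, the max_attempts option in the [scheduler] section of nova.conf:

[scheduler]
# default is 3; each re-schedule consumes one attempt
max_attempts = 3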

Environment
===========

1. One control node, five compute nodes
2. Nova version: Queens

Error Logs
==========
'message': 'Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance ...: Build of instance 019e9891-72c9-4a15-a8f4-3b6ccc3c4535 was re-scheduled: Anti-affinity instance group policy was violated', 'code': 500
  File "/var/lib/openstack/lib/python2.7/site-packages/nova/conductor/manager.py", line 610, in build_instances
    filter_properties, ...
  File "/var/lib/openstack/lib/python2.7/site-packages/nova/scheduler/utils.py", line 680, in populate_retry
    raise exception.MaxRetriesExceeded(reason=msg)
MaxRetriesExceeded: Exceeded max scheduling attempts 3 for instance 019e9891-72c9-4a15-a8f4-... Anti-affinity instance group policy was violated.
'created': '2020-07-02T06:21:20Z'

Tags: scheduler
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

@Wang: do you have scheduler debug logs for the instance creation attempts?
Do you have the ServerGroupAntiAffinityFilter added to [filter_scheduler]enabled_filters config option?

I set this bug to Incomplete. Please set it back to New when you have provided the above information.

Changed in nova:
status: New → Incomplete
tags: added: scheduler
Revision history for this message
wang (jiajing) wrote :

Hello, Balazs. ServerGroupAntiAffinityFilter is configured under [filter_scheduler] (see below). Looking at the code, I found that the problem is that on re-scheduling the hosts chosen at random were compute1, compute2 and compute3, which already host instances from the group, so after three scheduling attempts it errored out.

[filter_scheduler]
enabled_filters = DifferentHostFilter,SameHostFilter,RetryFilter,AvailabilityZoneFilter,RamFilter,CoreFilter,DiskFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,PciPassthroughFilter,NUMATopologyFilter,AggregateInstanceExtraSpecsFilter,JsonFilter
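
The MaxRetriesExceeded in the log above is raised by the retry bookkeeping in nova/scheduler/utils.py (populate_retry). As a rough, paraphrased sketch of that logic (not the actual Queens source, names and defaults are simplified):

    # Paraphrased sketch of nova's scheduling-retry bookkeeping; the real
    # code lives in nova/scheduler/utils.py (populate_retry).
    def populate_retry(filter_properties, instance_uuid, max_attempts=3):
        retry = filter_properties.setdefault('retry',
                                             {'num_attempts': 0, 'hosts': []})
        retry['num_attempts'] += 1
        if retry['num_attempts'] > max_attempts:
            # retry['hosts'] records every host already attempted; once the
            # budget is spent the build fails permanently.
            raise Exception('Exceeded max scheduling attempts %d for instance %s'
                            % (max_attempts, instance_uuid))

As I understand it, retry['hosts'] is what the RetryFilter in the enabled_filters list above uses to skip hosts that were already attempted, which is why each of the three attempts in the log landed on a different compute node (compute1, compute2, compute3).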

Changed in nova:
status: Incomplete → New
Revision history for this message
Artom Lifshitz (notartom) wrote :

Are you creating your instances individually or as a group (i.e. with min_count and/or max_count)?

Nova does what we call a "late anti-affinity check" on the compute host itself before booting the instance. This check is known to be super racy and problematic, and unfortunately there's no easy solution. I also notice that you have the NUMATopologyFilter enabled - this problematic check is made worse by using instances with NUMA characteristics.
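
To make the race concrete, here is a simplified conceptual sketch of what that late check amounts to on the compute host (this is not the actual nova code, just the shape of it):

    # Conceptual sketch of a "late" anti-affinity check on the compute host.
    # group_hosts: hosts already used by members of the server group
    # (in nova this is looked up from the database at spawn time).
    class RescheduledException(Exception):
        pass

    def validate_anti_affinity(this_host, group_hosts):
        # If another group member already landed here (e.g. because two
        # members were scheduled concurrently), fail the build so the
        # conductor re-schedules the instance elsewhere.
        if this_host in group_hosts:
            raise RescheduledException(
                'Anti-affinity instance group policy was violated on %s' % this_host)

Because the database lookup and the spawn are not atomic, two instances scheduled in parallel can both pass the scheduler filter and then trip this check, which is the race described above.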

The workaround in cases like that is to create instances one at a time.
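
For example, a minimal sketch of that workaround with openstacksdk (the cloud name, image/flavor/network IDs and the group UUID are placeholders, and passing the group through scheduler_hints is an assumption about the SDK call, so adjust to your environment):

    # Sketch of the "one at a time" workaround using openstacksdk.
    import openstack

    conn = openstack.connect(cloud='mycloud')        # hypothetical clouds.yaml entry
    group_id = 'REPLACE-WITH-SERVER-GROUP-UUID'      # existing anti-affinity group

    for i in range(5):
        server = conn.compute.create_server(
            name='anti-aff-%d' % i,
            image_id='REPLACE-IMAGE-UUID',
            flavor_id='REPLACE-FLAVOR-ID',
            networks=[{'uuid': 'REPLACE-NETWORK-UUID'}],
            scheduler_hints={'group': group_id},     # assumed kwarg for the group hint
        )
        # Wait for ACTIVE before creating the next member, so the late
        # anti-affinity check never races with a sibling build.
        conn.compute.wait_for_server(server, status='ACTIVE', wait=600)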

So unfortunately I don't have much good news here, as this will most likely get closed as WONTFIX, but in the meantime I'll set it to Incomplete and wait for your answer to my initial question, and for you to report back on whether the proposed workaround was helpful. Please set the bug back to NEW when you've replied.

Thanks!

Changed in nova:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired