Anti-Affinity causes instance provisioning to take a lot of time

Bug #1993781 reported by DeadRabbit
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Hey there,

we are facing massive issues with nova scheduling and octavia provisioning when anti-affinity is involved.

The problem seems to be that nova sometimes schedules instances from the same anti-affinity group onto the same host and then reschedules them after reporting the host to the queue.

This is also a problem for normal instance scheduling, as it takes much longer when a VM has to be scheduled two or three times.

Is there any solution for optimizing scheduling? Can nova be configured to check before the second, third, or any further instance is created?

I would appreciate an optimization here, as it is causing some trouble.

Greetings

Christian

Balazs Gibizer (balazs-gibizer) wrote :

Could you provide a few more details?

> It seems, that it is problematic that nova is scheduling instances in antiaffinity
> groups sometimes on the same host and reschedules them after reporting the host to
> the queue.

Do you have compute logs for the case when the re-schedule happens?
How do you provision the instances that are expected to be in the same anti-affinity group?

If you deploy instances in parallel, then the scheduling of two VMs can happen in parallel, and the scheduler might choose the same host for both instances because neither of them is on that host at that point in time. Nova has a late affinity check in the compute service that detects this kind of parallel scheduling and rejects the boot if it would go against the anti-affinity policy. If this is the cause in your case, then one thing you can do is make sure you don't start the instances in parallel but put some delay between the instance create requests.
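To illustrate the race described above, here is a minimal toy simulation in plain Python. It is not nova code: the names `pick_host`, `late_check_and_boot` and `boot_all` are hypothetical, and the "scheduler" simply picks the first host with no group member. It only sketches why requests scheduled from the same stale view collide and get rescheduled, while staggered requests do not.

```python
import copy


def pick_host(view):
    """Toy scheduler: return the first host that, according to the given
    view, holds no member of the anti-affinity group."""
    for name, members in view.items():
        if not members:
            return name
    raise RuntimeError("no host satisfies the anti-affinity policy")


def late_check_and_boot(hosts, name, instance):
    """Toy compute-side late check: reject the boot if a group member
    already landed on this host after the scheduling decision."""
    if hosts[name]:
        return False
    hosts[name].add(instance)
    return True


def boot_all(host_names, instances, parallel):
    """Boot all instances and count how many reschedules were needed.

    parallel=True: every request is scheduled from the same stale view,
    taken before any instance has landed.
    parallel=False: each request sees the result of the previous boot.
    """
    hosts = {name: set() for name in host_names}
    stale_view = copy.deepcopy(hosts)
    reschedules = 0
    for instance in instances:
        target = pick_host(stale_view if parallel else hosts)
        while not late_check_and_boot(hosts, target, instance):
            # Rejected by the late check, so schedule again with fresh data.
            reschedules += 1
            target = pick_host(hosts)
    return hosts, reschedules


if __name__ == "__main__":
    for mode in (True, False):
        placement, retries = boot_all(["h1", "h2", "h3"], ["a", "b"], mode)
        print("parallel" if mode else "staggered", placement, retries)
```

In this model the instances end up on different hosts in both modes. The difference is only the number of reschedules, which is where the extra provisioning time comes from.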

Changed in nova:
status: New → Incomplete
DeadRabbit (msstinkt) wrote :

Hey Balazs,

I need to reconfigure octavia to provide logs, as we are facing this issue mainly with amphora scheduling.

I am quite sure that they are provisioned at the same time, and so far we have no way to change that behaviour.

Nevertheless, anti-affinity not working under parallel scheduling shouldn't be treated as a "You are using it wrong" case.

I will try to get the logs today and update this report with the information.

Greetings,
Christian

Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired