soft-affinity weight not normalized based on server group's maximum

Bug #1870096 reported by Johannes Kulik
This bug affects 1 person
Affects                   Status        Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released  Medium      Johannes Kulik
Pike                      New           Undecided   Unassigned
Queens                    New           Undecided   Unassigned
Rocky                     New           Undecided   Unassigned
Stein                     New           Undecided   Unassigned
Train                     New           Undecided   Unassigned

Bug Description

Description
===========

When using soft-affinity to schedule instances on the same host, the weight is unexpectedly low if the scheduler has previously scheduled a server for any server-group that has more members on a single host. This low weight can then easily be outweighed by differences in resources (e.g. RAM/CPU).

Steps to reproduce
==================

Do not restart nova-scheduler in the process or the bug doesn't appear. You need to change the ServerGroupSoftAffinityWeigher to actually log the weights it computes to see the problem.

* Create a server-group with soft-affinity (let's call it A)
* Create 6 servers in server-group A, one after the other so they end up on the same host.
* Create another server-group with soft-affinity (B)
* Create 1 server in server-group B
* Create 1 server in server-group B and look at the scheduler's weights assigned to the hosts by the ServerGroupSoftAffinityWeigher.

Expected result
===============

The weight assigned to the host by the ServerGroupSoftAffinityWeigher should be 1, as the maximum number of instances for server-group B is on that host (the one we created there before).

Actual result
=============
The weight assigned to the host by the ServerGroupSoftAffinityWeigher is 0.2, as the maximum number of instances ever encountered on a host is 5.
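
The arithmetic behind the 0.2 can be reproduced with a small toy model. This is only a sketch, not nova's code: the `StatefulWeigher` class, the two-host setup and the raw weights (number of group members already on each host) are assumptions made for illustration. The one property it borrows from the report is that the weigher keeps the largest value it has ever seen and normalizes later runs against it.

```python
class StatefulWeigher:
    """Hypothetical stand-in for a weigher that remembers minval/maxval."""

    def __init__(self):
        self.minval = None
        self.maxval = None

    def weigh(self, raw_weights):
        # Fold this run's extremes into the stored ones. Keeping them
        # across runs is the behaviour the bug report describes.
        lo, hi = min(raw_weights), max(raw_weights)
        self.minval = lo if self.minval is None else min(self.minval, lo)
        self.maxval = hi if self.maxval is None else max(self.maxval, hi)
        span = self.maxval - self.minval
        if span == 0:
            return [0.0 for _ in raw_weights]
        return [(w - self.minval) / span for w in raw_weights]


weigher = StatefulWeigher()

# Group A: six scheduling runs. Before the n-th server is placed,
# host 1 already holds n-1 members of A and host 2 holds none.
for members_on_host1 in range(6):
    weigher.weigh([members_on_host1, 0])

# Group B: the first server sees no members anywhere, the second
# sees one member on host 1.
weigher.weigh([0, 0])
weights_b = weigher.weigh([1, 0])
print(weights_b)  # host 1 gets 1/5 instead of 1
```

In this model the sixth run for group A sees five members on host 1, so the stored maximum is 5 and the second server of group B is weighed as 1/5 = 0.2.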

Environment
===========

We noticed this on a Queens version of nova a year ago. I can't give the exact commit anymore, but the code still looks broken in current master.

I've opened a review-request for fixing this bug here: https://review.opendev.org/#/c/713863/

Tags: scheduler
Changed in nova:
assignee: nobody → Johannes Kulik (jkulik)
status: New → In Progress
Johannes Kulik (jkulik)
description: updated
Changed in nova:
importance: Undecided → Medium
tags: added: scheduler
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/713863
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5ab9ef11e27014ce8b43e1bac76903fed70d0fbf
Submitter: Zuul
Branch: master

commit 5ab9ef11e27014ce8b43e1bac76903fed70d0fbf
Author: Johannes Kulik <email address hidden>
Date: Thu Mar 19 12:51:25 2020 +0100

    Don't recompute weighers' minval/maxval attributes

    Changing the minval/maxval attribute to the minimum/maximum of every
    weigher run changes the outcome of future runs. We noticed it in the
    SoftAffinityWeigher, where a previous run with a host hosting a lot of
    instances for a server-group would make a later run use that maximum.
    This resulted in the weight being lower than 1 for a host hosting all
    instances of another server-group, if the number of instances of that
    server-group on that host is less than a previous server-group's
    instances on any host.

    Previously, there were two places that computed the maxval/minval - once
    in normalize() and once in weigh_objects() - but only the one in
    weigh_objects() saved the values to the weigher.

    The code now uses the maxval/minval as defined by the weigher and keeps
    the weights inside the maxval-minval range. There's also only one place
    to compute the minval/maxval now, if the weigher did not set a value:
    normalize().

    Closes-Bug: 1870096

    Change-Id: I60a90dabcd21b4e049e218c7c55fa075bb7ff933
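
The behaviour described in the commit message can be sketched as a stateless normalization helper. This is a simplified illustration under assumptions, not the merged patch: the function name `normalize` comes from the commit message, but the signature, the list return type and the clamping expression are choices made for this example.

```python
def normalize(weights, minval=None, maxval=None):
    """Scale weights into [0, 1] without storing anything between calls.

    minval/maxval are only passed in when a weigher defines fixed bounds;
    otherwise they are derived from the current run alone.
    """
    if not weights:
        return []
    if minval is None:
        minval = min(weights)
    if maxval is None:
        maxval = max(weights)
    minval, maxval = float(minval), float(maxval)
    if minval == maxval:
        return [0.0] * len(weights)
    span = maxval - minval
    # Clamp each weight into the [minval, maxval] range before scaling,
    # so weigher-defined bounds are never exceeded.
    return [(min(max(w, minval), maxval) - minval) / span for w in weights]


# Each run is normalized on its own: an earlier run with five members
# on a host no longer affects a later run with one member.
print(normalize([5, 0]))
print(normalize([1, 0]))
# With bounds fixed by the weigher, values outside them are clamped.
print(normalize([7, 0], minval=0, maxval=5))
```

Because nothing is written back to the weigher, the order in which server-groups are scheduled no longer influences the result in this sketch.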

Changed in nova:
status: In Progress → Fix Released