Nodepool favouring precise nodes over f20

Bug #1308407 reported by Derek Higgins
Affects    Status          Importance  Assigned to     Milestone
Nodepool   Fix Committed   Undecided   Unassigned
tripleo    Fix Released    Critical    Derek Higgins

Bug Description

During low-demand times of day nodepool creates nodes in the configured ratio, but for over half of the day, while demand on CI is high, it seems to favour precise instances over f20 instances. This causes problems: jobs remain in the zuul queue until all of the precise jobs have finished, and only then are f20 instances created, delaying results (I've observed 10-hour delays in getting an f20 node).

This also causes jobs to be reported as "NOT_REGISTERED" when no f20 nodes are ACTIVE.

Revision history for this message
Derek Higgins (derekh) wrote :

I've reproduced this locally; it only happens when we are bumping up against max-servers (or presumably the quota).

There is code in nodepool to allocate nodes based on the ratio they were configured for, but this doesn't happen correctly when there is demand for more than one server type and only one allocation is available.

In this scenario, as nodes become available (one at a time) and there is demand for more than one type, each new allocation is given to the first node type in the list (which in our case is the precise nodes). The only way we get a new f20 node created is if more than one allocation is freed at the same time; only then does nodepool move to the second node type in the list.
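To make the starvation concrete, here is a minimal Python sketch (hypothetical, not nodepool's actual allocator) of a first-fit grant loop: when capacity frees one slot at a time, the first label in the list absorbs every grant and the second label never gets a node.

```python
# Hypothetical simulation of the behaviour described above: a first-fit
# grant loop serves requests in list order, so when only one slot frees
# per cycle, the first label takes it every time.

def first_fit_grant(requests, available):
    """Grant available slots to requests in list order (first-fit)."""
    grants = {name: 0 for name, _ in requests}
    for name, demand in requests:
        take = min(demand - grants[name], available)
        grants[name] += take
        available -= take
        if available == 0:
            break
    return grants

# Demand exists for both labels, but only one slot frees per cycle.
requests = [("precise", 10), ("f20", 10)]
totals = {"precise": 0, "f20": 0}
for _ in range(10):  # ten delete/allocation cycles, one free slot each
    grants = first_fit_grant(requests, available=1)
    for name, count in grants.items():
        totals[name] += count

print(totals)  # every slot goes to "precise"; "f20" is starved
```

This matches the reproduction below: f20 only gets a node when two or more slots free within a single allocation cycle.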

I'm thinking a better algorithm would be a weighted randomiser: shuffle the request list, but weight the choice in favour of requests with higher demand.

This would replace the sort that is currently in the code:
class AllocationProvider(object):
    def makeGrants(self):
        reqs.sort(lambda a, b: cmp(a.getPriority(), b.getPriority()))
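As a sketch of the weighted-randomiser idea (hypothetical; this is not the fix that eventually merged), the next grant could be chosen at random with probability proportional to each label's outstanding demand, so a label that sorts later in the list is no longer starved:

```python
import random

def pick_next_request(requests, rng=random):
    """Choose the next request to grant, weighted by outstanding demand.

    `requests` is a list of (name, demand) pairs; a label wanting twice
    as many nodes is twice as likely to receive the next free slot, so
    every label with demand eventually gets grants.
    """
    names = [name for name, _ in requests]
    weights = [demand for _, demand in requests]
    return rng.choices(names, weights=weights, k=1)[0]

# With a fixed seed, both labels receive grants even when slots free
# one at a time, unlike the first-fit behaviour described above.
rng = random.Random(42)
requests = [("precise", 6), ("f20", 2)]
picks = [pick_next_request(requests, rng) for _ in range(100)]
print(picks.count("precise"), picks.count("f20"))
```

The function and parameter names here are illustrative only; the point is that a demand-weighted random choice avoids the deterministic bias of always granting to the head of a sorted list.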

Revision history for this message
Derek Higgins (derekh) wrote :

To reproduce
 o configure nodepool with two labels
  - name: nodepool-fake2
    image: nodepool-fake2
    min-ready: 4
    providers:
      - name: fake-provider
  - name: nodepool-fake
    image: nodepool-fake
    min-ready: 1
    providers:
      - name: fake-provider

 o start max-servers servers
 o place them all in a hold state
   nodepool -c tools/fake.yaml list | grep ready | awk '{print $2}' | xargs -n 1 nodepool -c tools/fake.yaml hold

 o remove them all one at a time, waiting long enough between each delete for a delete/allocation cycle to take place
   for x in $(nodepool -c tools/fake.yaml list | grep hold | awk '{print $2}') ; do nodepool -c tools/fake.yaml delete $x ; sleep 120 ; nodepool -c tools/fake.yaml list | grep ready | awk '{print $2}' | xargs -n 1 nodepool -c tools/fake.yaml hold ; sleep 5 ; done

You should end up with only a single type of server being allocated.

Revision history for this message
Derek Higgins (derekh) wrote :

There is also an alternate solution now proposed:
     https://review.openstack.org/#/c/101110/

Revision history for this message
Derek Higgins (derekh) wrote :

The alternate solution has merged: https://review.openstack.org/#/c/101110/

Changed in tripleo:
status: Triaged → Fix Released
Changed in nodepool:
status: New → Fix Committed