Nodepool favouring precise nodes over f20

Bug #1308407 reported by Derek Higgins
Affects    Status          Importance  Assigned to     Milestone
Nodepool   Fix Committed   Undecided   Unassigned
tripleo    Fix Released    Critical    Derek Higgins

Bug Description

During low-demand times of day nodepool creates nodes in the configured ratio, but for over half of the day, while demand on CI is high, it seems to favour precise instances over f20 instances. This causes problems: jobs remain in the zuul queue until all of the precise jobs have finished, and only then are f20 instances created, delaying results (I've observed 10-hour delays in getting an f20 node).

This also causes jobs to be reported as "NOT_REGISTERED" when no f20 nodes are ACTIVE.

Revision history for this message
Derek Higgins (derekh) wrote :

I've reproduced this locally; it only happens when we are bumping up against max-servers (or presumably the quota).

There is code in nodepool to allocate nodes based on the ratio they were configured for, but this doesn't happen correctly when there is demand for more than one server type and only one allocation is available.

In this scenario, as nodes become available (one at a time) and there is demand for more than one type, each new allocation is given to the first node type in the list (which in our case is the precise nodes). The only way we get a new f20 node created is if more than one allocation is freed at the same time; only then does nodepool move to the second node type in the list.
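To make the starvation concrete, here is a minimal Python sketch (hypothetical, not nodepool's actual allocator) of a first-fit grant loop: when capacity frees one slot at a time, the first label in the list absorbs every grant and the second label never gets a node.

```python
# Hypothetical simulation of the behaviour described above: a first-fit
# grant loop serves requests in list order, so when only one slot frees
# per cycle, the first label takes it every time.

def first_fit_grant(requests, available):
    """Grant available slots to requests in list order (first-fit)."""
    grants = {name: 0 for name, _ in requests}
    for name, demand in requests:
        take = min(demand - grants[name], available)
        grants[name] += take
        available -= take
        if available == 0:
            break
    return grants

# Demand exists for both labels, but only one slot frees per cycle.
requests = [("precise", 10), ("f20", 10)]
totals = {"precise": 0, "f20": 0}
for _ in range(10):  # ten delete/allocation cycles, one free slot each
    grants = first_fit_grant(requests, available=1)
    for name, count in grants.items():
        totals[name] += count

print(totals)  # every slot goes to "precise"; "f20" is starved
```

This matches the reproduction below: f20 only gets a node when two or more slots free within a single allocation cycle.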

I'm thinking a better algorithm would be a weighted randomiser: shuffle the request list, but weight the choice in favour of requests with higher demand.

This would replace the sort that is currently in the code:
class AllocationProvider(object):
    def makeGrants(self):
        reqs.sort(lambda a, b: cmp(a.getPriority(), b.getPriority()))
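As a sketch of the weighted-randomiser idea (hypothetical; this is not the fix that eventually merged), the next grant could be chosen at random with probability proportional to each label's outstanding demand, so a label that sorts later in the list is no longer starved:

```python
import random

def pick_next_request(requests, rng=random):
    """Choose the next request to grant, weighted by outstanding demand.

    `requests` is a list of (name, demand) pairs; a label wanting twice
    as many nodes is twice as likely to receive the next free slot, so
    every label with demand eventually gets grants.
    """
    names = [name for name, _ in requests]
    weights = [demand for _, demand in requests]
    return rng.choices(names, weights=weights, k=1)[0]

# With a fixed seed, both labels receive grants even when slots free
# one at a time, unlike the first-fit behaviour described above.
rng = random.Random(42)
requests = [("precise", 6), ("f20", 2)]
picks = [pick_next_request(requests, rng) for _ in range(100)]
print(picks.count("precise"), picks.count("f20"))
```

The function and parameter names here are illustrative only; the point is that a demand-weighted random choice avoids the deterministic bias of always granting to the head of a sorted list.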

Revision history for this message
Derek Higgins (derekh) wrote :

To reproduce
 o configure nodepool with two labels
  - name: nodepool-fake2
    image: nodepool-fake2
    min-ready: 4
    providers:
      - name: fake-provider
  - name: nodepool-fake
    image: nodepool-fake
    min-ready: 1
    providers:
      - name: fake-provider

 o start max-servers servers
 o place them all in a hold state
   nodepool -c tools/fake.yaml list | grep ready | awk '{print $2}' | xargs -n 1 nodepool -c tools/fake.yaml hold

 o remove them all one at a time, waiting long enough between each delete for a delete/allocation cycle to take place
   for x in $(nodepool -c tools/fake.yaml list | grep hold | awk '{print $2}') ; do nodepool -c tools/fake.yaml delete $x ; sleep 120 ; nodepool -c tools/fake.yaml list | grep ready | awk '{print $2}' | xargs -n 1 nodepool -c tools/fake.yaml hold ; sleep 5 ; done

You should end up with only a single type of server being allocated.

Revision history for this message
Derek Higgins (derekh) wrote :

There is also an alternate solution now proposed:
     https://review.openstack.org/#/c/101110/

Revision history for this message
Derek Higgins (derekh) wrote :

The alternate solution has merged: https://review.openstack.org/#/c/101110/

Changed in tripleo:
status: Triaged → Fix Released
Changed in nodepool:
status: New → Fix Committed