Devstack gate launch jobs timeout confusing Jenkins and devstack gate

Bug #1204625 reported by Clark Boylan
Affects: OpenStack Core Infrastructure
Status: Fix Released
Importance: High
Assigned to: Clark Boylan
Milestone: (none)

Bug Description

If the devstack gate VM launch jobs time out, the devstack gate DB and Jenkins can get out of sync, confusing both. This has led to more than one job being run on the same host and to hosts being deleted out from under running jobs.

Relevant logs can be found at http://paste.openstack.org/show/41610/

It has been suggested that the devstack gate pool manager should be a daemon so that it can properly track state without needing to handle it across distinct processes (Jenkins jobs). I have also lowered the number of ready nodes per d-g AZ to 15 to reduce the average number of slaves that d-g must spin up.

James E. Blair (corvus)
Changed in openstack-ci:
assignee: nobody → James E. Blair (corvus)
assignee: James E. Blair (corvus) → nobody
Clark Boylan (cboylan) wrote :

The issue was that after a timeout, some nodes could already have been added to Jenkins while still marked BUILDING in the d-g database. When d-g then attempted to add the BUILDING nodes to Jenkins, any node already present in Jenkins would return an error and be deleted. By that time, jobs might already have started running on the node, resulting in all kinds of bad test failures.

This was corrected in https://review.openstack.org/#/c/38674/. The fix checks the error returned when attempting to add a node to Jenkins; if the error is that the node already exists, d-g ignores it and continues processing that host. Eventually this marks the node as READY, which is the correct state.
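
For illustration only, here is a minimal sketch of the behaviour described above. It is not the code from the review: `FakeJenkins`, `NodeExistsError`, `register_node` and the dict used as the d-g database are all hypothetical stand-ins, and the real change may detect the "already exists" error differently.

```python
BUILDING, READY = "BUILDING", "READY"


class NodeExistsError(Exception):
    """Hypothetical error: Jenkins already has a node with this name."""


class FakeJenkins:
    """Stand-in for a Jenkins client (assumption, not the real API)."""

    def __init__(self):
        self.nodes = set()

    def create_node(self, name):
        if name in self.nodes:
            raise NodeExistsError(name)
        self.nodes.add(name)


def register_node(jenkins, db, name):
    """Add a BUILDING node to Jenkins, then mark it READY in the database."""
    try:
        jenkins.create_node(name)
    except NodeExistsError:
        # The node was added before an earlier timeout. Ignore the error
        # instead of deleting a host that may already be running a job.
        pass
    db[name] = READY


# Simulate the bug scenario: the node reached Jenkins just before the
# timeout, but the database still records it as BUILDING.
jenkins = FakeJenkins()
db = {"node-1": BUILDING}
jenkins.create_node("node-1")

register_node(jenkins, db, "node-1")
```

After the call, the node is still present in Jenkins and the database entry has moved from BUILDING to READY, rather than the node being deleted.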

Changed in openstack-ci:
assignee: nobody → Clark Boylan (cboylan)
status: Triaged → Fix Released