Intermittent failure in ProvisionerSuite.TestProvisioningMachinesFailMachine
Bug #1893848 reported by Ben Hoyt
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Canonical Juju | Triaged | Low | Unassigned | |
Bug Description
Log excerpt below (full failure log attached):
FAIL: provisioner_
[LOG] 0:00.786 ERROR test cannot start instance for machine "2": fail provisioning for TestAvailabilit
[LOG] 0:00.800 DEBUG juju.apiserver -> [2] machine-0 12.895843ms {"request-
provisioner_
assertAvail
provisioner_
c.Assert(
... obtained int = 2
... expected int = 2
OOPS: 59 passed, 1 skipped, 1 FAILED
Changed in juju: milestone: 2.9-beta1 → 2.9-rc1
Changed in juju: milestone: 2.9-rc1 → none
I'm giving up on this for now (time-boxed the investigation to a day). Just leaving a few notes here for when this is picked back up:
* The error is in the assertAvailabilityZoneMachinesDistribution() function, because the three "good" machines (1, 3, and 4) are started on zones 1, 1, and 3, respectively ... and that assertion function ensures there's not a delta of 2 between the "heaviest" zone (zone 1 with two machines) and the "lightest" (zones 2 and 4 with no machines). In provisioner/provisioner_task.go there's a function machineAvailabilityZoneDistribution() that distributes the machines across the zones. I suspect what's happening is that in the case of the machine 4 failure, something is grabbing zone 2 in a racy way so that machine 3 is starting up on zone 1 (and doubling up with machine 1).
* I noticed in the log that there's a message "got not provisioned error while waiting: machine 4 not provisioned" ... so for whatever reason there was a failure starting up machine 4. So that's probably what's causing it.
* In worker/