Intermittent failure in ProvisionerSuite.TestProvisioningMachinesFailMachine

Bug #1893848 reported by Ben Hoyt
This bug affects 1 person
Affects: Canonical Juju
Status: Triaged
Importance: Low
Assigned to: Unassigned
Milestone: none

Bug Description

Log excerpt below (full failure log attached):

FAIL: provisioner_test.go:1731: ProvisionerSuite.TestProvisioningMachinesFailMachine

[LOG] 0:00.786 ERROR test cannot start instance for machine "2": fail provisioning for TestAvailabilityZoneMachinesFailMachine

[LOG] 0:00.800 DEBUG juju.apiserver -> [2] machine-0 12.895843ms {"request-id":47,"response":"'body redacted'"} Provisioner[""].SetInstanceStatus
provisioner_test.go:1753:
    assertAvailabilityZoneMachinesDistribution(c, availabilityZoneMachines)
provisioner_test.go:1555:
    c.Assert(max-min, jc.LessThan, 2)
... obtained int = 2
... expected int = 2

OOPS: 59 passed, 1 skipped, 1 FAILED

Ben Hoyt (benhoyt) wrote :
Changed in juju:
importance: Undecided → Medium
status: New → Triaged
Ben Hoyt (benhoyt) wrote :

I'm giving up on this for now (I time-boxed the investigation to a day). Leaving a few notes here for when this is picked back up:

* The failure is in the assertAvailabilityZoneMachinesDistribution() function: the three "good" machines (1, 3, and 4) are started on zones 1, 1, and 3, respectively. The assertion requires the difference between the "heaviest" zone (zone 1, with two machines) and the "lightest" (zones 2 and 4, with no machines) to be less than 2, and here it is exactly 2.
* The log contains the message "got not provisioned error while waiting: machine 4 not provisioned", so for whatever reason machine 4 failed to start. That's probably what's causing it.
* In worker/provisioner/provisioner_task.go there's a function machineAvailabilityZoneDistribution() that distributes the machines across the zones. I suspect that when machine 4 fails, something grabs zone 2 in a racy way, so machine 3 starts on zone 1 (doubling up with machine 1).
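
To make the failing check concrete, here is a minimal sketch of the kind of spread calculation the assertion performs. It is not the code from provisioner_test.go: zoneSpread and the zone names "zone1" to "zone4" are hypothetical, and the only thing taken from the log is the condition max-min < 2.

```go
package main

import "fmt"

// zoneSpread returns the difference between the largest and smallest
// number of machines placed in any zone. Hypothetical helper for
// illustration only; it is not Juju's implementation.
func zoneSpread(zones map[string][]string) int {
	if len(zones) == 0 {
		return 0
	}
	first := true
	lo, hi := 0, 0
	for _, machines := range zones {
		n := len(machines)
		if first {
			lo, hi = n, n
			first = false
			continue
		}
		if n < lo {
			lo = n
		}
		if n > hi {
			hi = n
		}
	}
	return hi - lo
}

func main() {
	// Placement described in the notes above: machines 1 and 3 on
	// zone 1, machine 4 on zone 3, nothing on zones 2 and 4.
	observed := map[string][]string{
		"zone1": {"1", "3"},
		"zone2": {},
		"zone3": {"4"},
		"zone4": {},
	}
	// An assumed balanced placement, with machine 3 on zone 2 instead.
	balanced := map[string][]string{
		"zone1": {"1"},
		"zone2": {"3"},
		"zone3": {"4"},
		"zone4": {},
	}
	// The test requires the spread to be strictly less than 2.
	fmt.Println("observed spread:", zoneSpread(observed), "passes:", zoneSpread(observed) < 2)
	fmt.Println("balanced spread:", zoneSpread(balanced), "passes:", zoneSpread(balanced) < 2)
}
```

With the observed placement the spread is 2, which matches the "obtained int = 2" line in the log; if machine 3 had landed on zone 2 the spread would be 1 and the check would pass.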

Changed in juju:
assignee: Ben Hoyt (benhoyt) → nobody
Changed in juju:
milestone: 2.9-beta1 → 2.9-rc1
Pen Gale (pengale)
Changed in juju:
milestone: 2.9-rc1 → none
Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance: Medium → Low
tags: added: expirebugs-bot