providers obscure provisioning errors to conform to AZ distribution

Bug #1732564 reported by Andrew Wilkins
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Andrew Wilkins

Bug Description

The provisioner relies on a special error, environs.ErrAvailabilityZoneFailed, being returned from StartInstance implementations to decide whether or not to attempt another zone. That is, anything other than ErrAvailabilityZoneFailed is expected to be a zone-independent failure, and is treated as terminal.

This poses a couple of problems:
 - it's too easy to accidentally terminate provisioning by passing up some other type of error
 - the cause of errors is obscured (we log them, but that puts the onus on the implementations)
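
To illustrate the first problem, here is a minimal, self-contained sketch. It is not the actual provisioner code: the sentinel below only stands in for environs.ErrAvailabilityZoneFailed, and startInstance and provision are hypothetical names made up for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAvailabilityZoneFailed stands in for environs.ErrAvailabilityZoneFailed.
var ErrAvailabilityZoneFailed = errors.New("availability zone failed")

// startInstance is a hypothetical provider call. Zone "a" has a
// zone-specific problem, but the provider passes up the raw error
// instead of the sentinel.
func startInstance(zone string) error {
	if zone == "a" {
		return errors.New("insufficient capacity in zone a")
	}
	return nil
}

// provision tries each zone in turn and returns the zone that worked.
// Any error other than the sentinel is assumed to be zone-independent,
// so provisioning stops immediately.
func provision(zones []string) (string, error) {
	for _, zone := range zones {
		err := startInstance(zone)
		if err == nil {
			return zone, nil
		}
		if err != ErrAvailabilityZoneFailed {
			return "", err // treated as terminal; remaining zones are skipped
		}
	}
	return "", errors.New("no availability zones left to try")
}

func main() {
	_, err := provision([]string{"a", "b"})
	fmt.Println(err) // zone "b" would have worked, but is never tried
}
```
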

At the same time, we have no way to indicate to the provisioner that an error is truly terminal; that it will never work in the current zone, or perhaps any other.

We should invert things, so that errors are assumed not to be terminal by default, and assumed to be zone-specific by default. We should then introduce error *interfaces* that the provisioner (and provider/common bootstrap) code can use to determine whether the error is terminal and/or zone-independent:

  type PermanentError interface {
      error

      // Permanent reports whether or not the error is permanent.
      // Permanent errors will not be retried by the provisioner.
      Permanent() bool
  }

  type AvailabilityZoneError interface {
      error

      // AvailabilityZoneIndependent reports whether or not the
      // error is independent of any specific availability zone.
      AvailabilityZoneIndependent() bool
  }

If an error returned by StartInstance implements PermanentError and its Permanent method returns true, then it will not be retried for the current availability zone; otherwise it will be retried.

If an error returned by StartInstance implements AvailabilityZoneError and its AvailabilityZoneIndependent method returns true, then it will not be retried in another availability zone; otherwise it will be.

Andrew Wilkins (axwalk)
summary: - providers obscure provisioning errors to confirm to AZ distribution
+ providers obscure provisioning errors to conform to AZ distribution
Andrew Wilkins (axwalk) wrote :

For a concrete example of why this is a problem, see https://bugs.launchpad.net/juju/+bug/1732764.

Andrew Wilkins (axwalk)
Changed in juju:
milestone: 2.4-beta1 → 2.3-rc2
assignee: nobody → Andrew Wilkins (axwalk)
Andrew Wilkins (axwalk)
Changed in juju:
status: Triaged → In Progress
Andrew Wilkins (axwalk)
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released