OpenStack provider: retry-provisioning doesn't work for `Quota exceeded for ...`

Bug #1938736 reported by Haw Loeung
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Ian Booth
Milestone: 2.9.11

Bug Description

Hi,

My model is running Juju 2.8.9 and my client is 2.9.9. While deploying a Juju bundle, I ran out of quota. I quickly corrected that, but there seems to be no way to retry provisioning of the failed machine:

| 10 down pending focal cannot run instance: Unauthorised URL https://nova.ps5...:8774/v2.1/servers
caused by: request (https://nova.ps5...:8774/v2.1/servers) returned unexpected status: 403; error info: {"forbidden": {"code": 403, "message": "Quota exceeded for cores, instances: Requested 2, 1, but already used 20, 10 of 20, 10 cores, instances"}}

Running `juju retry-provisioning 10` doesn't work. As discussed with Ian, this is because the error code is a 403, which indicates a permissions or credentials problem, rather than the expected 409 or 413.

Any chance we could allow retrying provisioning of machines in this state? Maybe allow retry-provisioning for all 4XX error codes or with a `--force` option to `retry-provisioning`?

Ian Booth (wallyworld) wrote:

Juju will automatically retry provisioning machines that it considers to have transient provisioning errors. I looked into this in more detail: Juju decides whether to retry automatically based on its interpretation of the error code. The retry-provisioning command, by contrast, signals that provisioning should be retried at the user's request, but only if the machine status is "error" or "provisioning error".
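To make the two paths concrete, here is a minimal Go sketch, not Juju's actual code. The names `retryClass`, `classifyStatus`, and `canManualRetry` are hypothetical, and the mapping assumes that 409 and 413 are treated as retryable quota errors while 403 is read as a credential problem, as described in this report.

```go
package main

import (
	"fmt"
	"net/http"
)

// retryClass is a hypothetical label for how a provisioning failure is treated.
type retryClass string

const (
	classTransient  retryClass = "transient"  // assumed to be retried automatically
	classCredential retryClass = "credential" // read as a permissions/credentials problem
	classOther      retryClass = "other"      // anything this sketch does not cover
)

// classifyStatus maps an HTTP status code to a retry class.
// Assumption: 409 and 413 are the codes expected for quota errors.
func classifyStatus(code int) retryClass {
	switch code {
	case http.StatusConflict, http.StatusRequestEntityTooLarge:
		return classTransient
	case http.StatusForbidden:
		return classCredential
	default:
		return classOther
	}
}

// canManualRetry mirrors the rule stated above: a user-requested retry
// only applies when the machine status is "error" or "provisioning error".
func canManualRetry(machineStatus string) bool {
	return machineStatus == "error" || machineStatus == "provisioning error"
}

func main() {
	for _, code := range []int{403, 409, 413} {
		fmt.Printf("HTTP %d -> %s\n", code, classifyStatus(code))
	}
	fmt.Println("manual retry allowed:", canManualRetry("provisioning error"))
}
```

Under these assumptions the quota failure in this bug lands in the credential class, so it is never retried automatically, even though the machine status itself would qualify for a manual retry.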

juju show-machine does seem to indicate the machine is in error

$ juju show-machine 16
machines:
  "10":
    juju-status:
      current: down
      message: agent is not communicating with the server
      since: 03 Aug 2021 22:37:33Z
    instance-id: pending
    machine-status:
      current: provisioning error
      message: |-
        cannot run instance: Unauthorised URL https://xxxx:8774/v2.1/servers
        caused by: request (https://xxxx:8774/v2.1/servers) returned unexpected status: 403; error info: {"forbidden": {"code": 403, "message": "Quota exceeded for cores: Requested 2, but already used 30 of 31 cores"}}
      since: 03 Aug 2021 22:37:33Z
    modification-status:
      current: idle
      since: 03 Aug 2021 22:34:33Z
    series: bionic
    constraints: root-disk-source=volume

I am surprised, though, that the 403 is not putting the model into a suspended state, since a 403 should be interpreted as an invalid credential. That's not what we want here, but it's what I would have expected to see. Ideally a different HTTP code would be used for quota exceeded.
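For illustration only, the Go sketch below shows how the error body quoted above could be inspected to tell a quota failure apart from a genuine credential failure when both arrive as 403. This is not how Juju handles the error and not the fix for this bug; `novaFault` and `isQuotaExceeded` are hypothetical names, and matching on the message prefix is an assumption based solely on the payload shown in this report.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// novaFault is a hypothetical struct for the inner object of the error body.
type novaFault struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

// isQuotaExceeded reports whether body is a "forbidden" fault with code 403
// whose message starts with "Quota exceeded". Any parse failure yields false.
func isQuotaExceeded(body []byte) bool {
	var envelope map[string]novaFault
	if err := json.Unmarshal(body, &envelope); err != nil {
		return false
	}
	fault, ok := envelope["forbidden"]
	if !ok {
		return false
	}
	return fault.Code == 403 && strings.HasPrefix(fault.Message, "Quota exceeded")
}

func main() {
	// Error body copied from the show-machine output in this report.
	body := []byte(`{"forbidden": {"code": 403, "message": "Quota exceeded for cores: Requested 2, but already used 30 of 31 cores"}}`)
	fmt.Println("quota exceeded:", isQuotaExceeded(body))
}
```

String matching on an error message is fragile, which is one reason a distinct status code would be the cleaner signal.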

Changed in juju:
milestone: none → 2.9.11
importance: Undecided → High
status: New → Triaged
Ian Booth (wallyworld) wrote :

The issue appears to be that the retry-provisioning command records "transient=true" in the status record for the machine, but the transient machine query checks the status record for the machine instance. As a result, the provisioner never sees any transient machine errors and so never retries.
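The mismatch described above can be modelled with a small Go sketch. It is a simplified illustration, not Juju's state code: `statusDoc`, `machineKey`, `instanceKey`, `markForRetry`, and `transientMachines` are all hypothetical names, and the key formats are invented for the example.

```go
package main

import "fmt"

// statusDoc is a hypothetical, simplified status record.
type statusDoc struct {
	Status    string
	Transient bool
}

// Hypothetical key helpers: one record for the machine, one for its instance.
func machineKey(id string) string  { return "machine/" + id }
func instanceKey(id string) string { return "machine/" + id + "/instance" }

// markForRetry mimics the write side: the flag is set on the machine record.
func markForRetry(store map[string]*statusDoc, id string) {
	if doc, ok := store[machineKey(id)]; ok {
		doc.Transient = true
	}
}

// transientMachines mimics the read side: only instance records are checked.
func transientMachines(store map[string]*statusDoc, ids []string) []string {
	var found []string
	for _, id := range ids {
		if doc, ok := store[instanceKey(id)]; ok && doc.Transient {
			found = append(found, id)
		}
	}
	return found
}

func main() {
	store := map[string]*statusDoc{
		machineKey("10"):  {Status: "down"},
		instanceKey("10"): {Status: "provisioning error"},
	}
	markForRetry(store, "10")
	// The flag was written to one record and is read from another,
	// so the query finds nothing to retry.
	fmt.Println("machines to retry:", len(transientMachines(store, []string{"10"})))
}
```

In this model the writer and the reader have to agree on which record carries the flag. The sketch does not say which side the actual fix changed.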

Changed in juju:
assignee: nobody → Ian Booth (wallyworld)
status: Triaged → In Progress
Ian Booth (wallyworld) wrote :
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released