Juju does not handle 429 "too many requests" from Azure

Bug #1540394 reported by Aaron Bentley
Affects: Canonical Juju
Status: Fix Released
Importance: Critical
Assigned to: Andrew Wilkins

Bug Description

As seen here:
http://reports.vapour.ws/releases/issue/571a75d7749a5618d867765e
and
http://reports.vapour.ws/releases/issue/56af6c8a749a561ea206843b

Juju should not fail when Azure tells it "too many requests" -- it should slow down.

This issue is critical because it consistently fails revision tests. It also contributes to resource exhaustion, because Juju cannot clean up, which in turn contributes to other failures.

tags: added: 2.0-count
Aaron Bentley (abentley) wrote :

We are now seeing this frequently, so I have upped the priority.

Changed in juju-core:
importance: Medium → High
milestone: none → 2.0-beta7
description: updated
Andrew Wilkins (axwalk) wrote :

I think we should be able to modify the response handlers to back off when we receive 429. This will involve modifying the clients created in "azureEnviron.SetConfig".

Andrew Wilkins (axwalk)
Changed in juju-core:
status: Triaged → In Progress
assignee: nobody → Andrew Wilkins (axwalk)
Andrew Wilkins (axwalk) wrote :

I have to pause on this for now, so unassigning myself in case someone else wants to pick it up.

I think we want to decorate the autorest.Client's Sender such that it retries (for a specified duration) with backoff when a 429 occurs. We may want or need to update azure-sdk-for-go first, as it appears that the API for autorest has changed a bit.

Changed in juju-core:
status: In Progress → Triaged
assignee: Andrew Wilkins (axwalk) → nobody
Curtis Hovey (sinzui)
tags: added: blocker
Changed in juju-core:
importance: High → Critical
description: updated
Andrew Wilkins (axwalk)
Changed in juju-core:
status: Triaged → In Progress
assignee: nobody → Andrew Wilkins (axwalk)
Andrew Wilkins (axwalk) wrote :

The Azure SDK for Go has changed quite a bit, so I'll need to rejigger some things on our end. There may also be a problem caused by their use of vendoring.

Andrew Wilkins (axwalk) wrote :

Vendor issue was my fault, disregard.

Andrew Wilkins (axwalk) wrote :

I'm going to bump as far as 3b480eaaf6b4236d43a3c06cba969da6f53c8b66, which I've already done the work for (https://github.com/juju/juju/pull/3807). Moving to master is too much of a pain right now. I've filed an Azure SDK for Go issue to ask what's up: https://github.com/Azure/azure-sdk-for-go/issues/326.

Andrew Wilkins (axwalk) wrote :

So I was previously thinking we'd inject into the autorest senders, but I don't think that's going to work in general. PUT/POST requests send a body, and the autorest stuff doesn't use rewindable buffers, so a retried request would go out with an already-consumed body.
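
The rewind problem can be shown with the standard library alone. In this sketch, `newRequestFunc` and `drain` are hypothetical helpers: `drain` stands in for a transport reading the body while sending, and the factory shows one way around the problem, which is to keep the payload as bytes and build a fresh request for every attempt.

```go
package main

import (
	"bytes"
	"io"
	"net/http"
)

// newRequestFunc returns a factory that builds a new request, with a fresh
// reader over the same payload bytes, each time it is called.
func newRequestFunc(method, url string, payload []byte) func() (*http.Request, error) {
	return func() (*http.Request, error) {
		return http.NewRequest(method, url, bytes.NewReader(payload))
	}
}

// drain reads a request body to the end, as a transport would when sending.
// Calling it a second time on the same request yields an empty string,
// because the underlying reader has not been rewound.
func drain(req *http.Request) (string, error) {
	b, err := io.ReadAll(req.Body)
	return string(b), err
}
```

Draining the same request twice returns the payload and then nothing, which is what a sender-level retry would resend; a request built by a second call to the factory carries the full payload again.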

So I think I'll just wrap all API calls with juju/retry.Call.

Ian Booth (wallyworld)
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
affects: juju-core → juju
Changed in juju:
milestone: 2.0-beta7 → none
milestone: none → 2.0-beta7