Hi Adam -- just an FYI that we've made some progress on this: I've been able to repro locally using your bash script with "parallel" (though I have 8 cores so it bootstrapped 8 in parallel). I saw a bunch of RequestLimitExceeded logs, which seems to be the same issue you are seeing.
We suspect this is happening because there are two AWS libraries being used in Juju: the legacy one for most EC2 calls, but the new AWS SDK for bootstrapping. A lot of AWS API calls are made during bootstrapping (and multiplied by the # of parallel bootstraps) to determine instance types / costs -- on the order of 300 requests in a few seconds when bootstrapping 8 controllers.
Ian added request retries to the legacy library. The AWS SDK has retries, but I believe they're not turned on by default, so all the "instance type" requests that happen on startup don't have retrying / exponential backoff enabled.
I'm going to spend some more time on this tomorrow to confirm the above and (hopefully) get in a fix that enables retrying for calls made from the AWS SDK library too.
Hi Adam -- just an FYI that we've made some progress on this: I've been able to repro locally using your bash script with "parallel" (though I have 8 cores so it bootstrapped 8 in parallel). I saw a bunch of RequestLimitExc eeded logs, which seems to be the same issue you are seeing.
We suspect this is happening because there are two AWS libraries being used in Juju: the legacy one for most EC2 calls, but the new AWS SDK for bootstrapping. A lot of AWS API calls are made during bootstrapping (and multiplied by the # of parallel bootstraps) to determine instance types / costs -- on the order of 300 requests in a few seconds when bootstrapping 8 controllers.
Ian added request retries to the legacy library. The AWS SDK has retries, but I believe they're not turned on by default, so all the "instance type" requests that happen on startup don't have retrying / exponential backoff enabled.
I'm going to spend some more time on this tomorrow to confirm the above and (hopefully) get in a fix that enables retrying for calls made from the AWS SDK library too.