GCE credential 403 error spams logs

Bug #1829388 reported by Ian Booth on 2019-05-16
This bug affects 1 person
Affects  Status  Importance  Assigned to   Milestone
juju             High        Tim McNamara
2.6              High        Tim McNamara

Bug Description

A GCE controller has logs with many of these messages:

2019-05-15 14:33:04 ERROR juju.worker.dependency engine.go:636 "compute-provisioner" manifold worker returned unexpected error: googleapi: Error 403: Access Not Configured. Compute Engine API has not been used in project 535381582111 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/compute.googleapis.com/overview?project=535381582111 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry., accessNotConfigured
2019-05-15 14:33:04 DEBUG juju.worker.dependency engine.go:647 stack trace:
googleapi: Error 403: Access Not Configured. Compute Engine API has not been used in project 535381582111 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/compute.googleapis.com/overview?project=535381582111 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry., accessNotConfigured

We need to treat such errors as Invalid Credential and disable affected models.

We also need a way to trigger a retry: the credential itself will not change, but the GCE project will be re-configured, out of band to juju, to allow the credential. So one idea is to react to a client running juju status on the model and use this operation to trigger a ping of the cloud API with the credential to see if it works this time, and thus mark the credential as valid again. The ping could be something like listing all instances with a filter that returns 0 instances but which validates that the API call completes without error.

Tim McNamara (tim-clicks) wrote :

Thanks for raising this Ian. We'll look to address this as soon as possible.

Changed in juju:
assignee: nobody → Tim McNamara (tim-clicks)
tags: added: gce-provider
Anastasia (anastasia-macmood) wrote :

I am not convinced that the re-try should happen on 'juju status'... What if a user is running 'watch juju status'? If nothing changes on the cloud side, you'd still be spamming the logs.

I think a cleaner way is to have a command to re-try current credential...

Currently, in order to flip the credential back to 'valid' on the juju side, we need to trigger a re-upload of the local credential, which is done by running either 'update-credential' or 'add-model --credential'.

Anastasia (anastasia-macmood) wrote :
Changed in juju:
status: Triaged → In Progress
milestone: 2.6.3 → 2.7-beta1
Tim McNamara (tim-clicks) wrote :

I've attempted several things to force the GCE credentials into an error state. If an account is deleted, Google returns an error, but Juju continually retries:

machine-7: 15:47:23 INFO juju.worker.machiner "machine-7" started
machine-7: 15:47:23 ERROR juju.worker.dependency "machiner" manifold worker returned unexpected error: cannot update observed network config: cannot get network interfaces of "juju-eaa547-7": Get https://www.googleapis.com/compute/v1/projects/juju-qa-checking-broken-creds/aggregated/instances?alt=json&filter=name+eq+juju-eaa547-.%2A: oauth2: cannot fetch token: 400 Bad Request
Response: {
  "error": "invalid_grant",
  "error_description": "Not a valid email or user ID."
}
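
As an illustration of how a token failure like the one above might be recognised as an invalid credential rather than retried, the sketch below parses the JSON body shown in the log and checks for invalid_grant. The names tokenError and isInvalidGrant are hypothetical, and this is not known to be how Juju's GCE provider handles the case.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// tokenError mirrors the JSON error body returned by the token endpoint,
// as seen in the log excerpt above.
type tokenError struct {
	Code        string `json:"error"`
	Description string `json:"error_description"`
}

// isInvalidGrant reports whether a token response indicates that the
// credential itself was rejected: HTTP 400 with error "invalid_grant".
// Bodies that are not valid JSON are treated as "not a credential error".
func isInvalidGrant(status int, body []byte) bool {
	if status != http.StatusBadRequest {
		return false
	}
	var te tokenError
	if err := json.Unmarshal(body, &te); err != nil {
		return false
	}
	return te.Code == "invalid_grant"
}
```

A worker could use a check like this to stop retrying and mark the credential invalid once the token endpoint rejects the account.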

Anastasia (anastasia-macmood) wrote :

That's interesting... I am pretty sure that 400 Bad Request is catered for in the code to invalidate a credential.

Thank you for taking a look!

Tim McNamara (tim-clicks) wrote :

If a service account with insufficient permissions is used as a Juju credential, the credential is marked as invalid quickly.

ERROR closing port(s) [17070/tcp]: googleapi: Error 403: Required 'compute.firewalls.delete' permission for 'projects/juju-qa-checking-broken-creds/global/firewalls/juju-845bc02a-4fd0-4cd8-82c4-33262c385ced'
More details:
Reason: forbidden, Message: Required 'compute.firewalls.delete' permission for 'projects/juju-qa-checking-broken-creds/global/firewalls/juju-845bc02a-4fd0-4cd8-82c4-33262c385ced'
Reason: forbidden, Message: Required 'compute.networks.updatePolicy' permission for 'projects/juju-qa-checking-broken-creds/global/networks/default'

Tim McNamara (tim-clicks) wrote :

I've confirmed that the PR referenced above prevents the requests from being retried in a loop.

Attempting to bootstrap with a valid credential when the Google Compute API is disabled halts immediately.

Note that Google may have updated their internal systems to prevent a similar situation. It is now impossible to revoke access to the API if resources exist that were created via the API.

Ian Booth (wallyworld) on 2019-05-29
Changed in juju:
status: In Progress → Fix Committed