controller models with valid credentials becoming suspended

Bug #1841880 reported by Paul Collins
This bug affects 4 people
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Ian Booth

Bug Description

The other day I was rolling out some monitoring services to the JAAS controllers and I noticed that "juju expose" seemed to have no effect. This turned out to be because the controller had suspended the controller model itself: "suspended since cloud credential is not valid".

I ran "juju update-credential $cloud $credential" and the message went away and the newly exposed ports became accessible. I checked again today and two of the controller models are suspended again. Today's victims are an Azure controller running 2.6.5, and an AWS controller running 2.6.6. It's also happened with GCE controllers running 2.6.6.

I've checked the logs, and the only messages that mention credentials are a bunch of these:

WARNING juju.api.credentialvalidator backend.go:84 cloud credential reference is set for the model but the credential content is no longer on the controller
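When hunting for these warnings across a large log, a small filter can help. The following is a minimal sketch that assumes every log line has the same layout as the single line quoted above (level, module, file:line, message); real log files may carry extra prefixes such as timestamps, and the function name and the second sample line are made up for illustration.

```python
import re

# Assumed layout, inferred only from the one line quoted in this report:
#   LEVEL module file.go:LINE message
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+) (?P<module>\S+) (?P<file>\S+):(?P<line>\d+) (?P<message>.*)$"
)


def credential_warnings(lines):
    """Return parsed records for lines emitted by the credential validator module."""
    records = []
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if match and match.group("module") == "juju.api.credentialvalidator":
            records.append(match.groupdict())
    return records


sample = [
    "WARNING juju.api.credentialvalidator backend.go:84 cloud credential "
    "reference is set for the model but the credential content is no longer "
    "on the controller",
    # Hypothetical unrelated line, included only for contrast.
    "INFO juju.worker.example worker.go:10 unrelated message",
]

records = credential_warnings(sample)
for record in records:
    print(record["level"], record["file"], record["line"], record["message"])
```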

One of the controllers also reports "credential cloudcred-azure_[REDACTED]@external_hl-az will be deleted but it is used by model [REDACTED]", but this is clearly not the controller's credential, so I assume it is unrelated.

I've taken a look at the cloudCredentials collection. Azure isn't too helpful:

juju:PRIMARY> db.cloudCredentials.find({owner: "admin"}).pretty()
{
        "_id" : "azure#admin#jaas-prodstack-cdo-azure",
        "owner" : "admin",
        "cloud" : "azure",
        "name" : "jaas-prodstack-cdo-azure",
        "revoked" : false,
        "auth-type" : "service-principal-secret",
        "attributes" : {
                "application-id" : "[REDACTED#1]",
                "application-password" : "[REDACTED#2]",
                "subscription-id" : "[REDACTED#3]"
        },
        "txn-revno" : NumberLong(14),
        "txn-queue" : [ ],
        "invalid" : true,
        "invalid-reason" : "azure cloud denied access"
}
juju:PRIMARY> _

AWS is more verbose, but the credentials absolutely work, so I'm puzzled why it's in this state:

juju:PRIMARY> db.cloudCredentials.find({owner: "admin"}).pretty()
{
        "_id" : "aws#admin#jaas",
        "owner" : "admin",
        "cloud" : "aws",
        "name" : "jaas",
        "revoked" : false,
        "auth-type" : "access-key",
        "attributes" : {
                "access-key" : "[REDACTED#1]",
                "secret-key" : "[REDACTED#2]"
        },
        "txn-revno" : NumberLong(28),
        "txn-queue" : [ ],
        "invalid" : true,
        "invalid-reason" : "\nThe provided credentials could not be validated and \nmay not be authorized to carry out the request.\nEnsure that your account is authorized to use the Amazon EC2 service and \nthat you are using the correct access keys. \nThese keys are obtained via the \"Security Credentials\"\npage in the AWS console.\n: AWS was not able to validate the provided access credentials (AuthFailure)"
}
juju:PRIMARY> _
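To compare several controllers quickly, the documents above can be reduced to just the credential ID and the reason it was flagged. This is a rough sketch that works on plain dictionaries shaped like the two documents shown in this report; it does not talk to MongoDB, the function name is hypothetical, and the AWS reason string is truncated for brevity.

```python
def invalid_credentials(docs):
    """Return (credential id, one-line reason) for each credential flagged invalid."""
    flagged = []
    for doc in docs:
        if doc.get("invalid"):
            # Collapse embedded newlines and repeated spaces into single spaces.
            reason = " ".join(doc.get("invalid-reason", "").split())
            flagged.append((doc["_id"], reason))
    return flagged


docs = [
    {
        "_id": "azure#admin#jaas-prodstack-cdo-azure",
        "invalid": True,
        "invalid-reason": "azure cloud denied access",
    },
    {
        "_id": "aws#admin#jaas",
        "invalid": True,
        # Truncated copy of the multi-line AWS reason shown above.
        "invalid-reason": "\nThe provided credentials could not be validated and \n"
                          "may not be authorized to carry out the request.",
    },
    # Hypothetical healthy credential, included to show that it is skipped.
    {"_id": "gce#admin#example", "invalid": False},
]

flagged = invalid_credentials(docs)
for cred_id, reason in flagged:
    print(f"{cred_id}: {reason}")
```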

I dug through the AWS controller's logs and found some AuthFailure errors, although they were for instances that are not in the controller model, so I assume they're unrelated to this problem.

Paul Collins (pjdc)
description: updated
Anastasia (anastasia-macmood) wrote :

All credentials that are stored on the controller are called 'remote' or 'controller' credentials. Controllers do not actually use credentials, only models do. Controllers only store credentials.

If the models share a credential and that credential is deemed 'invalid' by a cloud provider, i.e. a model starts getting auth errors from cloud calls, then Juju will mark the credential as invalid and ALL models that are using it will be suspended.

There are several scenarios that need to be considered here. If your models are sharing a credential that is valid for some models but not for others, then you should be using different credentials in these models. You can use 'set-model-credential' to change a model's credential.

If the credential becomes invalid and you have run 'update-credential', Juju will temporarily mark it as valid and will try to use it. But if you have not actually done anything to ensure that the cloud provider considers the credential valid again, cloud calls will start failing with auth errors again, and Juju will mark the credential as invalid again and suspend the models.
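The cycle described above can be modelled in a few lines. This is only an illustrative sketch of the described behaviour, not Juju's implementation: the class and function names are hypothetical, and a model is treated as suspended exactly while its credential is flagged invalid.

```python
from dataclasses import dataclass


@dataclass
class Credential:
    name: str
    invalid: bool = False
    invalid_reason: str = ""


@dataclass
class Model:
    name: str
    credential: Credential

    @property
    def suspended(self):
        # Every model sharing the credential is suspended while it is invalid.
        return self.credential.invalid


def on_cloud_auth_error(credential, reason):
    """A cloud call failed with an auth error: flag the shared credential."""
    credential.invalid = True
    credential.invalid_reason = reason


def update_credential(credential):
    """Stand-in for 'update-credential': the credential is treated as valid again."""
    credential.invalid = False
    credential.invalid_reason = ""


shared = Credential("jaas")
models = [Model("controller", shared), Model("workload", shared)]

on_cloud_auth_error(shared, "AuthFailure")
after_error = [m.suspended for m in models]      # both models suspended

update_credential(shared)
after_update = [m.suspended for m in models]     # both models active again

# Nothing changed on the cloud side, so the next cloud call fails again.
on_cloud_auth_error(shared, "AuthFailure")
after_repeat = [m.suspended for m in models]     # both models suspended again
```

The point of the sketch is that the suspension follows the credential rather than the model, which is why one model's auth failure affects every model sharing that credential.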

I am happy to talk you through individual instances. Feel free to reach out on IRC.
I think that Juju is behaving as expected here.

Changed in juju:
status: New → Invalid
Anastasia (anastasia-macmood) wrote :

Also, the actual error message "cloud credential reference is set for the model but the credential content is no longer on the controller" means that a credential (or its content) was deleted from the credential collection. However, there are models that still reference it (referential integrity is funny in mongo, as you know).

Normally, we prevent users from removing credentials that are in use by models, but there is a way around it: a credential removal can be forced despite models referencing it.
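The dangling-reference situation described above amounts to a model pointing at a credential ID that no longer exists. A minimal sketch of such a check follows; it assumes models and credential IDs have already been exported as plain Python data, and the "cloud-credential" field name and the sample model names are assumptions made for illustration rather than the controller's actual schema.

```python
def dangling_credential_refs(models, credential_ids):
    """Return names of models whose credential reference has no stored credential."""
    known = set(credential_ids)
    return [
        model["name"]
        for model in models
        # "cloud-credential" is an assumed field name for the reference.
        if model.get("cloud-credential") and model["cloud-credential"] not in known
    ]


credential_ids = ["aws#admin#jaas"]
models = [
    {"name": "controller", "cloud-credential": "aws#admin#jaas"},
    # Hypothetical model whose credential was force-removed.
    {"name": "orphaned", "cloud-credential": "aws#admin#deleted"},
    # Hypothetical model with no credential reference at all.
    {"name": "no-credential"},
]

dangling = dangling_credential_refs(models, credential_ids)
print(dangling)
```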

Anastasia (anastasia-macmood) wrote :

Created a bug to clear model credential references when credential deletion is forced - https://bugs.launchpad.net/juju/+bug/1841885

Chris Sanders (chris.sanders) wrote :

I'm experiencing this on vsphere as well. Today I was running a test run, and libjuju was able to create a model and start a deploy, but then failed and was unable to even remove the model that it had just created a few minutes earlier.

https://pastebin.canonical.com/p/QF43dJXbvC/

I don't think this is a case of other models having the credential denied. Is there any logging or other information I can provide to help track this down? This seems to be very reproducible on our vsphere cluster. If there isn't a way for me to enable additional logging, is that something that can be added? We're not seeing any information as to why the credentials are expiring, and they renew without issue.

Chris Sanders (chris.sanders) wrote :

For a little more clarity: this isn't a shared controller, and periodically I get this error and have to run an update to continue using it. I'm not sure the above is very useful as logging; I'm really looking for what I can or should provide when I do see this to help track it down.

Ian Booth (wallyworld) wrote :

We added extra info in recent 2.8 releases:

https://github.com/juju/juju/pull/12629

There are several UX improvements around exposing why a credential got marked as invalid, including the actual underlying cloud error message.

Haw Loeung (hloeung)
Changed in juju:
status: Invalid → New
status: New → Confirmed
Haw Loeung (hloeung) wrote :

See also LP:1947535 for a feature request to add an option to retry/reuse the existing credentials for controller models marked suspended.

Ian Booth (wallyworld) wrote :

I added support in the azure provider to recognise an auth error and try to refresh the oauth token, or if that fails, get a new one with the existing juju credential. This should hopefully address any such transient auth issues.

https://github.com/juju/juju/pull/13487
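The general shape of that fix (retry after an auth error, refreshing the token first and falling back to a brand-new token) might look roughly like the sketch below. This is not the code from the pull request, which is in Juju's Go codebase; the exception class, function names, and the fake callables are all hypothetical stand-ins.

```python
class AuthError(Exception):
    """Hypothetical stand-in for an authentication failure from the cloud."""


def call_with_reauth(call, refresh_token, new_token):
    """Run call(); on an auth error, refresh the token (or get a new one) and retry once."""
    try:
        return call()
    except AuthError:
        pass
    try:
        refresh_token()
    except AuthError:
        # Refresh was rejected: fall back to a fresh token from the stored credential.
        new_token()
    # A second AuthError here propagates, so a genuinely bad credential still fails.
    return call()


# Fake cloud state used only to exercise the sketch.
state = {"token": "stale", "log": []}


def call():
    if state["token"] == "stale":
        raise AuthError("token expired")
    return "ok"


def refresh_token():
    state["log"].append("refresh")
    raise AuthError("refresh rejected")


def new_token():
    state["log"].append("new")
    state["token"] = "fresh"


result = call_with_reauth(call, refresh_token, new_token)
print(result, state["log"])
```

Retrying only once keeps a transient token problem from suspending models while still letting a credential that the cloud really rejects be reported as invalid.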

Changed in juju:
milestone: none → 2.9.19
assignee: nobody → Ian Booth (wallyworld)
importance: Undecided → High
status: Confirmed → In Progress
Ian Booth (wallyworld)
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released