cannot re-validate invalidated credentials

Bug #1852412 reported by Stamatis Katsaounis
This bug affects 1 person
Affects: Canonical Juju
Status: Fix Released
Importance: Medium
Assigned to: Anastasia

Bug Description

In our Juju OpenStack deployment backed by MaaS, the following happens:

1) The machines deployed by Juju are locked in MaaS.
2) When we remove all units from a machine that is locked in MaaS, strange things happen:
   - The machine is not released in MaaS, as it would be if it were unlocked.
   - If we want to remove the machine from Juju, the only option is --force --no-wait. If
     we try to unlock it after removing the last Juju unit from it, nothing happens.
3) After the actions described in 2), our Juju cloud credentials for MaaS become invalid and
   cannot be updated, because Juju complains that the machine with id XXXX is not present in MaaS.

In my opinion the whole process is buggy. Taken individually, the actions behave more or less as expected. For example, I expect not to be able to release a locked machine. On the other hand, I do not expect Juju to get confused and fail to understand that the machine is locked. Worse still, the invalidation of the credentials is very bad.

Our dirty workaround is to:
1) force-remove the machine from Juju
2) unlock the machine in MaaS
3) release the machine in MaaS
4) keep two MaaS credentials. When the above situation happens, we switch the model to the second credential, update the first one (which becomes valid again), and then switch the model back.

I am looking forward to your response.

Kind regards,
Stamatis

Revision history for this message
Richard Harding (rharding) wrote :

Thanks, as you've noted Juju doesn't understand locked machines in MAAS. It attempts to work against MAAS to remove the machine, gets strange feedback, and gets itself into a poor state.

Since it's in this fuzzy, unexpected state, you can tell Juju to remove the machine from its own database/thought space with remove --force, but it's not ideal.

We should be able to better detect and provide improved user feedback when MAAS notes a machine is locked.

Changed in juju:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Alexandros Soumplis (soumplis) wrote :

What is critical and very buggy is that when Juju gets into this fuzzy state, it marks the cloud credentials as invalid, so one has to change them before taking any further action with Juju.

Revision history for this message
Richard Harding (rharding) wrote : Re: [Bug 1852412] Re: Release of Locked Machines leads to Invalid Credentials

I see, Juju is seeing the errors coming back from MAAS as the same thing as
trying to contact MAAS with invalid credentials, probably "permission
denied", and so we make an assumption there. We'll have to investigate
whether we can tell the difference in that response or not.

On Wed, Nov 13, 2019 at 12:10 PM Alexandros Soumplis <
<email address hidden>> wrote:

> What is critical and very buggy is that when Juju comes into this fuzzy
> state, it marks the cloud credentials as invalid thus one has to change
> them before doing any further actions with juju.

Revision history for this message
Anastasia (anastasia-macmood) wrote : Re: Release of Locked Machines leads to Invalid Credentials

@Richard Harding (rharding) is right: when Juju gets MAAS permission errors (https://github.com/juju/juju/blob/develop/provider/maas/errors.go#L13), it invalidates the cloud credential that was used.

However, I do not understand why you need to swap credentials. If the machine is successfully removed from Juju, you should be able to re-validate that credential immediately with 'juju update-credential <MAAS cloud name> <credential name> --controller <controller name>', since the machine check will pass.
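To illustrate how this conflation can arise, here is a minimal sketch in Python. It is not Juju's code: the class and function names are hypothetical, and both the set of status codes and the idea that MAAS answers a release of a locked machine with HTTP 403 are assumptions made for the example. It only shows that a classifier looking at the status code alone cannot tell a bad API key from a locked machine.

```python
# Hypothetical sketch, not Juju's implementation.
# Assumption: authorisation failures are detected purely by HTTP status code.
AUTH_FAILURE_STATUS_CODES = {401, 403}


class ProviderError(Exception):
    """Error returned by the cloud provider (hypothetical type)."""

    def __init__(self, status_code, message):
        super().__init__(message)
        self.status_code = status_code
        self.message = message


def is_authorisation_failure(err):
    """Return True when the error looks like a credential problem."""
    return (
        isinstance(err, ProviderError)
        and err.status_code in AUTH_FAILURE_STATUS_CODES
    )


def handle_provider_error(err, credential):
    """Mark the credential invalid if the error looks like an auth failure."""
    if is_authorisation_failure(err):
        credential = dict(credential, invalid=True, invalid_reason=err.message)
    return credential


# Both errors end up invalidating the credential, although only the first
# one says anything about the credential itself.
bad_key = ProviderError(401, "invalid API key")
locked = ProviderError(403, "machine is locked")  # assumed MAAS response
```

Telling the two cases apart would need something more than the status code, for example the response body, which is what the investigation mentioned above would have to establish.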

What version of Juju are you using? Do you have logs from a failed credential re-validation? Could you please share your model's status at this stage?

Revision history for this message
Anastasia (anastasia-macmood) wrote :

I do believe that there are 2 issues here: (1) Juju does not recognise 'locked' machines in MAAS, or at least does not know how to handle them; (2) re-validating a credential after machine removal.

I'll keep this report for dealing with credential re-validation, since I am not convinced that Juju's current behavior for locked machines is unacceptable... With a plain remove-machine, Juju reached out to MAAS and MAAS returned an error. Running a forced remove is the only option here (if you really, really want the machine gone), since we can only remove the Juju reference and not touch the provider instance... Yes, we can also notify the user, e.g. with a message saying "cloud instance still exists, you need to deal with it manually", but besides this additional message, the flow of the removal process will not change: you'd still need to go to MAAS to remove the offending machine.

So, I'll rename this bug to reflect your difficulties with re-validating a credential since the fact that the machine was locked in MAAS is incidental. I'll add a smaller bug for user message on machine removal.

summary: - Release of Locked Machines leads to Invalid Credentials
+ cannot re-validate invalidated credentials
Changed in juju:
status: Triaged → Incomplete
Revision history for this message
Stamatis Katsaounis (skatsaounis) wrote :

When the opportunity arises I will try to replay the scenario, knowing the background you shed light on, and share any logs it produces. At the moment we cannot provide such logs.

One thing that is certain is that the update-credential command does not work unless the credential is removed from the model. To update it in that state, the credential must first be removed from the model (swapping is something that came out of trial and error); the update then succeeds on the unused credential.

In my opinion it cannot be incidental: by mistake (I say mistake because we have found it is indeed an issue) we have tried to remove all units from a locked machine several times (~5), and every time the credentials become invalid.

Kind regards,
Stamatis

Revision history for this message
Anastasia (anastasia-macmood) wrote :

Thank you - logs will really help :)

I am still unsure about what Juju version you are on...

I do know that there was a problem with a corner case scenario for update-credential on 2.6 but it has been fixed.

Revision history for this message
Stamatis Katsaounis (skatsaounis) wrote :

Hi Anastasia. I just reproduced the error.

Juju client version: 2.6.10-bionic-amd64
Juju controller version: 2.6.9

Steps:
1) MaaS machine w367ps is Locked and Deployed
2) The Juju machine id for w367ps is 123 and it contains only unit xxx/42
3) Run command: juju remove-unit xxx/42 -m openstack
4) The credentials become invalid after the unit is removed and the release of the MaaS machine fails
5) Trying to update the credentials as you mentioned gives the following output:

$ juju update-credential <cloud_name> <credentials_name> --controller <controller_name>

Credential valid for:
  kubernetes
Credential invalid for:
  openstack:
    no machine with instance "w367ps"
Controller credential "credentials_name" for user "admin" on cloud "cloud_name" not updated: some models are no longer visible.
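For anyone who wants to detect this state from a script, the output above can be split into valid and invalid models. The following is only a sketch: the function name is hypothetical, and the layout it expects (section headers at column 0, model names indented by two spaces, error messages indented further) is inferred from the single output pasted above, not from any documented format.

```python
def parse_update_credential_output(text):
    """Split update-credential output into valid and invalid models.

    Assumes the indentation seen in the pasted output; other Juju
    versions may format this differently.
    """
    result = {"valid": [], "invalid": {}}
    section = None
    current_model = None
    for raw in text.splitlines():
        line = raw.rstrip()
        stripped = line.strip()
        if not stripped:
            continue
        if stripped == "Credential valid for:":
            section = "valid"
            continue
        if stripped == "Credential invalid for:":
            section = "invalid"
            continue
        indent = len(line) - len(line.lstrip())
        if indent == 0:
            # Any other unindented line (e.g. the final summary) ends a section.
            section = None
            continue
        if section == "valid":
            result["valid"].append(stripped)
        elif section == "invalid":
            if indent <= 2 and stripped.endswith(":"):
                current_model = stripped[:-1]
                result["invalid"][current_model] = []
            elif current_model is not None:
                result["invalid"][current_model].append(stripped)
    return result
```

Applied to the output above, this reports `kubernetes` as valid and `openstack` as invalid with the single message `no machine with instance "w367ps"`.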

Where can I find logs for this failure?

Revision history for this message
Anastasia (anastasia-macmood) wrote :

I have a strong feeling that this has the same underlying cause as another bug that I have fixed on 2.7 where a credential was invalidated for a controller model even if it was hosted models that were affected.

I'll chase this up today and will try to reproduce both on 2.6 and 2.7 for confirmation. If that is the case, I think we can backport the fix to 2.6...

Thank you for additional info :) Reproducible scenario is great :D The logs would be controller logs, so on controller machine.

Changed in juju:
status: Incomplete → In Progress
assignee: nobody → Anastasia (anastasia-macmood)
Revision history for this message
Anastasia (anastasia-macmood) wrote :

@Stamatis Katsaounis (skatsaounis),

Could you please confirm that after (4), once you unlock the MAAS machine and release it, (5) still fails?

Revision history for this message
Anastasia (anastasia-macmood) wrote :

When Juju updates a credential there are some checks that are testing its validity (the same checks are used during model migration to ensure completeness):

1. Based on the list of machines in Juju, can Juju *see* corresponding instances in the cloud?
2. Based on the instances that Juju can get from the cloud for *this* model, does it have corresponding machines?

Both checks are strict. However, (2) is more useful to model migration (did we manage to migrate all the machines correctly) than to a credential update. So, as part of the fix for this scenario, I'll relax current credential validity check to only use (1). [FIX PART 1]

In addition, a credential update can be forced. In other words, when a user is 100% sure that the credential is valid and needs to be used, via Juju API, a credential update can be forced to ignore the validity errors. I will expose this functionality to Juju CLI as well so that you can 'juju update-credential <cloud_name> <credentials_name> --controller <controller_name> --force' [FIX PART 2].
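The two checks and the two parts of the fix can be sketched as follows. This is an illustration in Python, not the actual Juju code: the function names and data shapes are hypothetical, and the wording of the check (1) error is made up. Only the check (2) message, `no machine with instance "..."`, is taken from the output reported earlier in this bug.

```python
def validity_errors(juju_machines, cloud_instances, strict=True):
    """Return a list of validation errors for one model.

    juju_machines:   dict of Juju machine id -> cloud instance id
    cloud_instances: set of instance ids the credential can see for this model
    strict:          when False, only check (1) is applied (FIX PART 1)
    """
    errors = []
    # Check (1): every Juju machine must have a visible cloud instance.
    for machine_id, instance_id in sorted(juju_machines.items()):
        if instance_id not in cloud_instances:
            # Hypothetical wording for this error.
            errors.append(
                f'machine {machine_id}: instance "{instance_id}" not found in cloud'
            )
    # Check (2): every cloud instance must have a Juju machine.
    if strict:
        known = set(juju_machines.values())
        for instance_id in sorted(cloud_instances - known):
            errors.append(f'no machine with instance "{instance_id}"')
    return errors


def can_update_credential(juju_machines, cloud_instances, strict=True, force=False):
    """Decide whether the update goes ahead; force ignores errors (FIX PART 2)."""
    errors = validity_errors(juju_machines, cloud_instances, strict)
    return (force or not errors), errors


# Scenario from this bug: machine 123 was force-removed from Juju, but its
# locked instance w367ps still exists in MAAS.
machines = {"0": "abc123"}
instances = {"abc123", "w367ps"}
```

With the strict checks the leftover instance blocks the update; with check (2) dropped, or with force set, the update is allowed.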

However, since Juju 2.7.0 is imminent, I'll put these fixes into 2.7.1. If/when we plan another 2.6 release, we may revisit the necessity to backport it. Meanwhile, thank you for detailing the workaround :D

Changed in juju:
milestone: none → 2.7.1
Revision history for this message
Anastasia (anastasia-macmood) wrote :
Revision history for this message
Stamatis Katsaounis (skatsaounis) wrote :

Hi Anastasia,

I can confirm that after step (4), if I unlock and release the machine in MaaS, the update-credential command works as expected.

In addition, I found some related entries in audit.log. Based on your comments I suppose they will not be useful to you, but I am leaving the snippet here for completeness:

{"conversation":{"who":"admin","what":"/snap/juju/9484/bin/juju update-credential <cloud_name> <credentials_name> --controller <controller-name>","when":"2019-11-26T14:05:27Z","model-name":"controller","model-uuid":"<uuid>","conversation-id":"99436e789891a928","connection-id":"810"}}
audit.log:384820:{"request":{"conversation-id":"99436e789891a928","connection-id":"810","request-id":2,"when":"2019-11-26T14:05:27Z","facade":"Cloud","method":"UpdateCredentialsCheckModels","version":5}}
audit.log:384823:{"request":{"conversation-id":"99436e789891a928","connection-id":"810","request-id":3,"when":"2019-11-26T14:06:27Z","facade":"Pinger","method":"Ping","version":1}}
audit.log:384824:{"errors":{"conversation-id":"99436e789891a928","connection-id":"810","request-id":3,"when":"2019-11-26T14:06:27Z","errors":null}}
audit.log:384838:{"errors":{"conversation-id":"99436e789891a928","connection-id":"810","request-id":2,"when":"2019-11-26T14:06:51Z","errors":[{"message":"some models are no longer visible","code":""}]}}
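Entries like these can be correlated by conversation-id and request-id to see which API call failed. Below is a small sketch that does this for lines in the shape pasted above, including the `audit.log:<line>:` prefix left by grep. The function names are hypothetical, and the record layout is assumed from this snippet only.

```python
import json


def parse_audit_line(line):
    """Parse one audit log line, skipping any 'audit.log:<n>:' grep prefix."""
    start = line.find("{")
    if start == -1:
        return None
    try:
        return json.loads(line[start:])
    except json.JSONDecodeError:
        return None


def failed_requests(lines):
    """Pair each non-empty 'errors' record with the request that caused it."""
    requests = {}
    failures = []
    for line in lines:
        record = parse_audit_line(line)
        if not record:
            continue
        if "request" in record:
            req = record["request"]
            requests[(req["conversation-id"], req["request-id"])] = req
        elif "errors" in record:
            res = record["errors"]
            errs = res.get("errors") or []  # "errors": null means success
            if errs:
                key = (res["conversation-id"], res["request-id"])
                req = requests.get(key, {})
                failures.append({
                    "facade": req.get("facade"),
                    "method": req.get("method"),
                    "messages": [e["message"] for e in errs],
                })
    return failures
```

For the snippet above, this pairs the "some models are no longer visible" error with the Cloud facade's UpdateCredentialsCheckModels call, while the Pinger request, whose errors field is null, is ignored.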

Thank you for taking care of it. I am looking forward to the 2.7.1 release.

Revision history for this message
Anastasia (anastasia-macmood) wrote :

@Stamatis Katsaounis (skatsaounis),

Thank you for the update \o/

It definitely confirms that it is the second part of the check that fails: we can see the MAAS machine but cannot find a corresponding Juju machine.

So the PR above will help your case somewhat. The credential will still be invalidated, since MAAS throws a Permission Error because your MAAS machine is Locked, and Juju still treats that as a credential error. But at least when you go to update the credential, the update will succeed and you will not need to swap credentials underneath.

Revision history for this message
Anastasia (anastasia-macmood) wrote :
Revision history for this message
Anastasia (anastasia-macmood) wrote :

I have opened a separate bug to ensure that we do not overlook fixing Juju treatment of locked machines in MAAS, bug # 1854430.

Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released