invalid cmr macaroon when getting cmr secret
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Triaged | High | Yang Kelvin Liu | 3.6-next
Bug Description
This happened on a Sunbeam deployment after a few days:
Command '('/var/
ops.model.
juju: 3.4.2
cloud: maas provider
All 3 units are failing to read the secrets and are in error state. Rebooting the controller fixed it.

Ian Booth (wallyworld) wrote : | #1 |
Changed in juju:
status: New → Incomplete

Guillaume Boutry (gboutry) wrote : | #2 |

Guillaume Boutry (gboutry) wrote : | #3 |

Guillaume Boutry (gboutry) wrote : | #4 |
- openstack.log (14.9 MiB, text/plain)
Here's the model offering the CMR.
The secret is created by Keystone and sent over the CMR.

Ian Booth (wallyworld) wrote : | #5 |
Unfortunately there's not enough in the logs to pinpoint the problem. If the problem were ongoing, or reproducible, we could increase the logging and get some extra diagnostics to look at. But as it's now fixed after a reboot, it's hard to say exactly what happened. One guess is clock skew, but it's just a guess.

Christopher Bartz (bartz) wrote : | #6 |
I have this problem too. The consuming model is on juju 3.1 and the offering model is on juju 3.4. See https:/

Ian Booth (wallyworld) wrote : | #7 |
Can we get debug logs from both controllers (if cross controller cmr) with #cmr and #cmr-auth set to TRACE level logging? i.e. juju model-config -m controller logging-
Did you check for clock skew across containers / machines?
Does a reboot / controller pod restart fix it?
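For reference, a minimal sketch of the kind of command being suggested here, using the #cmr and #cmr-auth labels named above; the existing root log level (INFO in this sketch) is an assumption and should be kept as whatever the controller already uses:
```
# Sketch only: enable TRACE logging for the CMR labels on the controller model.
# Keep whatever root level is already configured (INFO is assumed here).
juju model-config -m controller logging-config="<root>=INFO;#cmr=TRACE;#cmr-auth=TRACE"

# Then watch the controller logs while reproducing the failure.
juju debug-log -m controller --level TRACE
```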

Laurent Sesquès (sajoupa) wrote : | #8 |
@Ian I implemented the suggested model-config, grabbed the controller logs and put them in your home dir on the private fileshare.

Ian Booth (wallyworld) wrote : | #9 |
The logs don't contain the info I would expect to see for validating a macaroon used for cross model secrets. Cross model secrets can be read if the relation the grant is scoped to is accessible by the supplied macaroon. Macaroon checks result in logs like this:
check 1 macaroons with required attrs: map[offer-
(that's from the logs).
There are only 2 such lines, and neither contains a relation tag, which would be expected when checking secret access. So it seems those lines are for other cmr operations.
Can we get logs which correspond to the timestamps of when the charm errors for secret-get happened? And an indication of when the operation was attempted, so we know where to look in the logs?
I should have asked the first time: can we please also turn on TRACE logging for #secrets?
Just to check - all other cmr aspects are working as expected? There's no error surfaced in status for any of the saas entries?
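A hedged sketch of one way to pull out the macaroon-check lines referred to above, assuming the TRACE logging from comment #7 is in place (the grep pattern is taken from the log excerpt quoted in this comment):
```
# Sketch: replay the controller log and filter for macaroon validation checks.
juju debug-log -m controller --replay --no-tail | grep 'macaroons with required attrs'
```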

Christopher Bartz (bartz) wrote : | #10 |
The SAAS entry in the consuming model shows an error
```
juju status
SAAS Status Store URL
mongodb error juju-34-controller admin/stg-
```
and
juju status mongodb --format yaml
gives
```
mongodb:
  url: juju-34-
  endpoints:
    database:
      interface: mongodb_client
      role: provider
  life: dying
  application
    current: error
    message: 'cannot get discharge from "https:/
      third party refused discharge: cannot discharge: permission denied'
    since: 15 Jul 2024 10:41:18Z
  relations:
    database:
    - github-
```
fwiw, I also get a permission error in the offer model for a juju grant command
stg-github-
ERROR permission denied
So maybe the root cause is related to permissions.

Ian Booth (wallyworld) wrote : | #11 |
juju list-offers --application foo --format yaml
etc
should show who has access to the offer so you can check permissions

Christopher Bartz (bartz) wrote : | #12 |
Thanks. The permissions seem to be fine.
```
stg-github-
mongodb:
  application: mongodb
  store: juju-34-controller
  charm: ch:amd64/
  offer-url: admin/stg-
  endpoints:
    database:
      interface: mongodb_client
      role: provider
  connections:
  - source-model-uuid: foo
    username: stg-github-
    relation-id: 2
    endpoint: database
    status:
      current: joined
      since: "2024-07-15"
  users:
    admin:
      display-name: admin
      access: admin
    everyone@
      access: read
    stg-
      access: admin
```

Ian Booth (wallyworld) wrote : | #13 |
Attempting to create a grant
stg-github-
ERROR permission denied
Is the stg-github-
Is it possible to get a json or yaml dump of the permissions, applicationOffers, users collections?
Can we get the logs with #secrets set to TRACE and corresponding to a time when the permissions errors are observed?

Laurent Sesquès (sajoupa) wrote : | #14 |
I've added TRACE for #secrets, waited for an issue to be reported (done in: https:/

Ian Booth (wallyworld) wrote : | #15 |
The only logs I can see in controller-
2024-07-16 13:18:27 DEBUG juju.apiserver.
Can we get logs for the 18th when it happened?
The pastebin has a time but no date
unit-github-
I assume that date is 18-7-2024?

Christopher Bartz (bartz) wrote : | #16 |
Yes, these logs are from 18-7-2024.
The problem is ongoing; it appears on every run of config-changed in the consumer (the hook is in error state and gets re-executed), so logs for a particular day should suffice. Here are logs from right now (with date/time stamp): https:/
What I also need to mention is that the following problem (https:/

Ian Booth (wallyworld) wrote : | #17 |
I'm confused. How can the controller-
2024-07-16 13:19:48 ERROR juju.kubernetes
The logs for the controller-beta-ps6 file have timestamps from the 18th.
We need to understand why the permission checks are failing when the macaroon discharge endpoint is called. It could be the TTL caveat caused by clock skew between controllers - I assume they are in sync. It could be that juju permission checks are failing. It could be a macaroon decoding issue. The fact that the error occurred early on, when watching the offer status, indicates the issue is not secrets related but a general problem with cross controller auth.
There's not a lot to go on. Can we get the mongo collection info asked for in comment 13?
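To rule the clock-skew guess in or out, comparing UTC time across the controller machines is usually enough; a sketch, assuming machine-based controllers with machine numbers 0-2 (adjust to the actual machines):
```
# Sketch: print UTC time on each controller machine and eyeball the difference.
for m in 0 1 2; do
  juju ssh -m controller "$m" date -u
done
```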

Christopher Bartz (bartz) wrote : | #18 |
Sorry, I do not have access to the controllers, perhaps @Laurent can look into this. My comment was about https:/
I also suspect a general problem with cross-controller auth.

Christopher Bartz (bartz) wrote : | #19 |
I was able to remove the current integration, offer and respective applications (github-

Ian Booth (wallyworld) wrote : | #20 |
We'll need the TRACE logs (#cmr, #cmr-auth, #secrets) and collections dumps to start digging into this.
https:/

Nikolaos Sakkos (nsakkos) wrote : | #21 |
Hi Ian, concerning comment #13, I have gathered YAML dumps of the permissions, applicationOffers, and users for the two models mentioned.
The commands I used to gather the info were:
- juju show-model stg-github-
- juju offers --format yaml
- juju show-controller --format yaml
- juju users
If there's missing information, could you please provide what commands I should use?
I have also added the missing TRACE logs from juju-controller
The above has been uploaded to your private-filesharing home directory as yaml_dumps.tar.gz .
Unfortunately logs for 2024-07-22 (regarding comment #19) are no longer available.

Ian Booth (wallyworld) wrote : | #22 |
Thanks for the logs etc., but they're not quite what was asked for.
We need:
- TRACE logs (#cmr, #cmr-auth, #secrets)
- mongo collection dumps for permissions, applicationOffers, users collections from offering controller
While we're there, let's add a few more collections from both the offering and consuming controllers (if the offer and consumer apps are in different controllers; otherwise just the one dump):
- remoteApplications, remoteEntities, bakeryStorageItems
You can get the collection dump from a juju backup as bson and convert those to json. Or you can use mongodump. Or you can use find().pretty() from a mongo shell.
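As a sketch of one way to get JSON out directly, mongoexport can be used instead; the collection names are the ones listed above, while the database name "juju" and the connection options are assumptions that depend on how the controller's MongoDB is reached:
```
# Sketch only: export the requested collections as JSON with mongoexport.
# Replace <connection options> with the host/credentials/TLS flags for the
# controller's MongoDB instance.
for c in permissions applicationOffers users \
         remoteApplications remoteEntities bakeryStorageItems; do
  mongoexport <connection options> --db juju --collection "$c" --out "$c.json"
done
```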
The logs and yaml as provided just don't contain enough information.
Ideally we'd also have a juju status --format yaml for both offering and consuming models at that time as well - this will show the observed status of the offer from the consuming side and help show that it's a generic cmr issue, not secrets per se. If the issue was observed in a given hook, the time the hook ran would be good so we know where to start looking in the logs.

Paul Collins (pjdc) wrote : | #23 |
Ian,
I've uploaded a couple of new files to your home directory with the requested mongodb collections and juju status --format yaml of the relevant models (bug-2065761-status.gz and bug-2065761-mongodb.gz)
The consuming model shows the following:
mongodb:
  url: juju-controller
  endpoints:
    database:
      interface: mongodb_client
      role: provider
  application
    current: error
    message: 'cannot get discharge from "https:/
      third party refused discharge: cannot discharge: permission denied'
    since: 01 Aug 2024 21:33:01Z
  relations:
    database:
    - github-
so based on the "since" field I fetched logs for that hour on all six controller units and uploaded them to bug2065761-controller-
I don't see any lines tagged "TRACE" in these files, but there seems to be plenty of activity related to CMR.
I double checked both controllers to confirm logging-config:
prod-is-
#cmr=TRACE; #cmr-auth=TRACE; #secrets=TRACE
prod-is-
juju-controller
#cmr=TRACE; #cmr-auth=TRACE; #secrets=TRACE
juju-controller

Ian Booth (wallyworld) wrote : | #24 |
Thank you very much for the logs etc. This is what I can see.
The lack of trace logging is strange; I'd need to check in detail, but it may be that the logging config needs to be set on the offering model, not just the controller model.
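(A sketch of what that would look like if so, with the offering model name as a placeholder:)
```
# Sketch: set the same TRACE labels on the offering model, not just the controller model.
juju model-config -m <offering-model> logging-config="<root>=INFO;#cmr=TRACE;#cmr-auth=TRACE;#secrets=TRACE"
```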
Anyway we can see some things. We have an incoming request to consume an offer
check macaroons with declared attrs: map[offer-
authorize cmr query ops check for bakery.
The user wanting to consume is stg-github-
The offer uuid is 3093e507-
At this point a permission check is needed so a discharge macaroon is generated
generating discharge macaroon because: invalid cmr macaroon
Looking at the permissions collection on the controller hosting the offer
{
"_id" : "ao#3093e507-
"access" : "admin",
"txn-revno" : 2
}
{
"_id" : "ao#3093e507-
"access" : "read",
"txn-revno" : 2
}
The stg-github-
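For reference, a sketch of how the corresponding permission documents could be pulled for a given offer UUID; the "permissions" collection name and the "ao#<offer-uuid>" key prefix come from the dump above, while the database name and connection options are assumptions:
```
# Sketch only: export permission documents whose _id starts with the offer's
# "ao#<uuid>" prefix. Replace the UUID prefix and <connection options> as needed.
mongoexport <connection options> --db juju --collection permissions \
  --query '{ "_id": { "$regex": "^ao#3093e507-" } }' \
  --out offer-permissions.json
```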

Paul Collins (pjdc) wrote : | #25 |
> The stg-github-
That seems to fit:
juju-controller
juju-controller
description: |
MongoDB is a general purpose distributed document database. This charm
deploys and operates MongoDB.
access: admin
endpoints:
database:
interface: mongodb_client
role: provider
users:
admin:
display-name: admin
access: admin
everyone@
access: read
juju-controller
So in this case the `stg-github-

Paul Collins (pjdc) wrote (last edit ): | #26 |
We have another similar-looking problem with another pair of models, stg-netbox and stg-netbox-k8s, although in this case the symptoms are a little different. `juju status` and offer-related outputs are here, and the controller logs I've already uploaded contain hits for the problem, but let me know if you need them refreshed.
https:/
Unlike the pair of models above, in this case it seems the offer was successfully joined and something went wrong afterwards. Also unlike the previous case, both models are owned by the same user. I tried to grant consume, in case admin doesn't cover that (although it would seem strange for a user to have to grant itself explicit access to consume its own offers), and got the following:
stg-netbox@
ERROR permission denied
stg-netbox@
(The role account has `admin` access to the model.)
But then I ran it as the admin and it worked:
juju-controller
juju-controller
juju-controller
description: |
Charm to operate the PostgreSQL database on machines.
access: admin
endpoints:
database:
interface: postgresql_client
role: provider
users:
admin:
display-name: admin
access: admin
everyone@
access: read
stg-netbox:
access: consume
juju-controller
And now:
stg-netbox@
Model Controller Cloud/Region Version SLA Timestamp
stg-netbox-k8s juju-controller
SAAS Status Store URL
postgresql active local admin/stg-
stg-netbox@
Is this expected behaviour?

Paul Collins (pjdc) wrote : | #27 |
Oddly, I'm also seeing different access levels depending on who views the offer:
stg-netbox@
juju-controller
description: |
Charm to operate the PostgreSQL database on machines.
access: admin
endpoints:
database:
interface: postgresql_client
role: provider
users:
admin:
display-name: admin
access: admin
everyone@
access: read
stg-netbox:
access: admin
stg-netbox@
juju-controller
juju-controller
description: |
Charm to operate the PostgreSQL database on machines.
access: admin
endpoints:
database:
interface: postgresql_client
role: provider
users:
admin:
display-name: admin
access: admin
everyone@
access: read
stg-netbox:
access: consume
juju-controller

Paul Collins (pjdc) wrote : | #28 |
I've redumped the collections from earlier into bug-2065761-20240806-

James Simpson (jsimpso) wrote : | #29 |
Just confirmed on a separate 3.4 model and controller: the offering user *thinks* it has admin access to the offer, but the controller superadmin doesn't show any access at all.
After explicitly granting the offering user "admin" access to the offer, they're able to successfully grant access as expected:
1) Confirming the user thinks it has admin access to the offer, and showing the "permission denied" error:
prod-synapse-
juju-controller
description: |
Charm to operate the PostgreSQL database on machines.
access: admin
endpoints:
database:
interface: postgresql_client
role: provider
users:
admin:
display-name: admin
access: admin
everyone@
access: read
prod-
access: admin
prod-synapse-
ERROR permission denied
2) Confirm controller superadmin thinks that the previous user has no explicit access over the offer:
juju-controller
juju-controller
description: |
Charm to operate the PostgreSQL database on machines.
access: admin
endpoints:
database:
interface: postgresql_client
role: provider
users:
admin:
display-name: admin
access: admin
everyone@
access: read
3) Grant the offering user "admin" access to the offer:
juju-controller
juju-controller
juju-controller
description: |
Charm to operate the PostgreSQL database on machines.
access: admin
endpoints:
database:
interface: postgresql_client
role: provider
users:
admin:
display-name: admin
access: admin
everyone@
access: read
prod-
access: admin
4) Successfully grant consume access as the original user
prod-synapse-
prod-synapse-
juju-controller
description: |
Charm to operate the PostgreSQL database on machines.
access: admin
endpoints:
database:
interface: postgresql_client
role: provider
users:
admin:
display-name: admin
access: admin
everyone@
access: read
prod-
acce...

Paul Collins (pjdc) wrote : | #30 |
Re James's comment above, I can confirm stg-github-
stg-github-
juju-controller
description: |
MongoDB is a general purpose distributed document database. This charm
deploys and operates MongoDB.
access: admin
endpoints:
database:
interface: mongodb_client
role: provider
users:
admin:
display-name: admin
access: admin
everyone@
access: read
stg-
access: admin
stg-github-
juju-controller
juju-controller
description: |
MongoDB is a general purpose distributed document database. This charm
deploys and operates MongoDB.
access: admin
endpoints:
database:
interface: mongodb_client
role: provider
users:
admin:
display-name: admin
access: admin
everyone@
access: read
juju-controller

Christopher Bartz (bartz) wrote : | #31 |
In fact, my problems have gone away since Tom Haddon ran
juju-controller
this morning.

Ian Booth (wallyworld) wrote : | #32 |
TL;DR: it seems there's a bug checking offer access for users who are not controller superusers but are model admins, so for now explicit consume access needs to be granted for those users.
--
There are 2 postgresql offers in different models, so it's not 100% clear to me which model is the stg-netbox one hosting the offer "admin/
The permissions for one of the postgresql offers show
{
"_id" : "ao#f207d2fb-
"access" : "admin",
"txn-revno" : 2
}
{
"_id" : "ao#f207d2fb-
"access" : "read",
"txn-revno" : 2
}
{
"_id" : "ao#f207d2fb-
"access" : "consume",
"txn-revno" : 2
}
Hence the explicit consume permission granted to user "stg-netbox" would allow access.
The model hosting the offer is abd1188b-
{
"_id" : "e#abd1188b-
"access" : "admin",
"txn-revno" : 2
}
This should have been enough to allow access to the offer without needing to grant consume access explicitly. But it seems there's a bug here: looking at the code, I think the check for model admin access during macaroon discharge is done on the controller model, not the model hosting the offer. That might explain why explicit consume access is required even for model admin users.
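In the meantime, the workaround from the TL;DR is an explicit grant on the offering controller; a sketch, with the user and offer URL as placeholders (the grant invocation is my assumption of the usual form, not something verified against this deployment):
```
# Sketch of the interim workaround: explicitly grant consume access on the offer
# to the affected user (a model admin who is not a controller superuser).
juju grant -c <offering-controller> <user> consume <owner>/<offering-model>.<offer-name>
```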
In terms of the show-offer output, the list of users and their permissions is influenced by the logged-in user who is running show-offer. The code looks at explicit offer grants and also includes "admin" access if the logged-in user is a model admin. Thus
stg-netbox@
will show the access of the logged-in stg-netbox user based on them being a model admin
stg-netbox:
access: admin
juju-controller
will show
stg-netbox:
access: consume
since this is showing explicit access grants for users other than the logged in user.
Changed in juju:
status: Incomplete → Triaged
importance: Undecided → High
tags: added: cross-model
Changed in juju:
milestone: none → 3.5.4
Changed in juju:
milestone: 3.5.4 → 3.5.5
Changed in juju:
assignee: nobody → Yang Kelvin Liu (kelvin.liu)
Changed in juju:
milestone: 3.5.5 → 3.5.6
Changed in juju:
milestone: 3.5.6 → 3.5.7
Changed in juju:
milestone: 3.5.7 → 3.6.5
Changed in juju:
milestone: 3.6.5 → 3.6-next
To help diagnose this, we really need a bit more information:
- logs from controller and affected models
- possibly a db dump of the application collection from the consuming model
Can we start by getting the logs, and we can take a look and see if anything relevant reveals itself?
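(A sketch of how that could be gathered, with model names as placeholders and the collection name assumed to be "applications"; database connection details are deployment-specific:)
```
# Sketch: capture full replayed logs for the controller and the affected models.
juju debug-log -m controller --replay --no-tail > controller.log
juju debug-log -m <offering-model> --replay --no-tail > offering-model.log
juju debug-log -m <consuming-model> --replay --no-tail > consuming-model.log

# Sketch: dump the applications collection from the consuming controller's database.
mongoexport <connection options> --db juju --collection applications --out applications.json
```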