deployed application loses trust after charm upgrade

Bug #1940526 reported by James Page
This bug affects 1 person
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Harry Pidcock

Bug Description

Charm: RabbitMQ operator
Juju: 2.9.9

I'm working on enabling clustering using the native K8S support in RMQ to support peer discovery, which is working nicely; I use juju trust to allow the units to access information directly in K8S supporting this method in RMQ.

This all worked fine until I upgraded the charm to a new version; post upgrade, all of the RMQ units lose access to the K8S API, with a 403 Forbidden being returned, and hence fail to re-join the cluster.

I'll try upgrading to 2.9.x latest.

John A Meinel (jameinel) wrote :

So, to confirm the issue that you are seeing: if you 'juju trust' a given deployed charm and then upgrade the charm, the upgraded application ends up losing the credentials?

Changed in juju:
importance: Undecided → High
milestone: none → 2.9-next
status: New → Triaged
James Page (james-page) wrote :

That does seem to be the case - after the charm upgrades and the containers restart I see:

2021-08-20T08:59:21.720Z [rabbitmq-server] 2021-08-20 08:59:21.720308+00:00 [info] <0.222.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
2021-08-20T08:59:21.720Z [rabbitmq-server] 2021-08-20 08:59:21.720529+00:00 [info] <0.222.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2021-08-20T08:59:21.723Z [rabbitmq-server] 2021-08-20 08:59:21.722677+00:00 [warn] <0.313.0> Description: "Authenticity is not established by certificate path validation"
2021-08-20T08:59:21.723Z [rabbitmq-server] 2021-08-20 08:59:21.722677+00:00 [warn] <0.313.0> Reason: "Option {verify, verify_peer} and cacertfile/cacerts is missing"
2021-08-20T08:59:21.723Z [rabbitmq-server] 2021-08-20 08:59:21.722677+00:00 [warn] <0.313.0>
2021-08-20T08:59:21.760Z [rabbitmq-server] 2021-08-20 08:59:21.760240+00:00 [erro] <0.222.0> Failed to fetch a list of nodes from Kubernetes API: 403
2021-08-20T08:59:21.771Z [rabbitmq-server] 2021-08-20 08:59:21.771237+00:00 [erro] <0.222.0> Failed to lock with peer discovery backend rabbit_peer_discovery_k8s: "403"
2021-08-20T08:59:51.772Z [rabbitmq-server] 2021-08-20 08:59:51.771729+00:00 [info] <0.222.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2021-08-20T08:59:51.788Z [rabbitmq-server] 2021-08-20 08:59:51.788045+00:00 [erro] <0.222.0> Failed to fetch a list of nodes from Kubernetes API: 403
2021-08-20T08:59:51.796Z [rabbitmq-server] 2021-08-20 08:59:51.795827+00:00 [erro] <0.222.0> Failed to lock with peer discovery backend rabbit_peer_discovery_k8s: "403"

in the RabbitMQ container log - 403 being Forbidden, i.e. the request is not authorised.

James Page (james-page) wrote :

$ kubectl --namespace o7k-testing describe role rabbitmq
Name: rabbitmq
Labels: app.kubernetes.io/managed-by=juju
              app.kubernetes.io/name=rabbitmq
Annotations: controller.juju.is/id: a08533cb-d188-4ca6-8e33-bf518062cadc
              juju.is/version: 2.9.11
              model.juju.is/id: 11b76af9-d284-4803-8e8f-7b874d47fc3c
PolicyRule:
  Resources Non-Resource URLs Resource Names Verbs
  --------- ----------------- -------------- -----
  pods/exec [] [] [create]
  pods [] [] [get list patch]
  services [] [] [get list patch]

James Page (james-page) wrote :

$ kubectl --namespace o7k-testing describe clusterrole o7k-testing-rabbitmq
Name: o7k-testing-rabbitmq
Labels: app.kubernetes.io/managed-by=juju
              app.kubernetes.io/name=rabbitmq
Annotations: controller.juju.is/id: a08533cb-d188-4ca6-8e33-bf518062cadc
              juju.is/version: 2.9.11
              model.juju.is/id: 11b76af9-d284-4803-8e8f-7b874d47fc3c
PolicyRule:
  Resources Non-Resource URLs Resource Names Verbs
  --------- ----------------- -------------- -----

James Page (james-page) wrote :

I compared these with a brand new app with trust granted:

$ kubectl --namespace o7k-testing describe role rabbitmq-new
Name: rabbitmq-new
Labels: app.kubernetes.io/managed-by=juju
              app.kubernetes.io/name=rabbitmq-new
Annotations: controller.juju.is/id: a08533cb-d188-4ca6-8e33-bf518062cadc
              juju.is/version: 2.9.11
              model.juju.is/id: 11b76af9-d284-4803-8e8f-7b874d47fc3c
PolicyRule:
  Resources Non-Resource URLs Resource Names Verbs
  --------- ----------------- -------------- -----
  *.* [] [] [*]

James Page (james-page) wrote :

Removing the trust and then re-adding it to the application mutates the role back to being correct:

$ juju trust rabbitmq --remove

$ kubectl --namespace o7k-testing describe role rabbitmq
Name: rabbitmq
Labels: app.kubernetes.io/managed-by=juju
              app.kubernetes.io/name=rabbitmq
Annotations: controller.juju.is/id: a08533cb-d188-4ca6-8e33-bf518062cadc
              juju.is/version: 2.9.11
              model.juju.is/id: 11b76af9-d284-4803-8e8f-7b874d47fc3c
PolicyRule:
  Resources Non-Resource URLs Resource Names Verbs
  --------- ----------------- -------------- -----
  pods/exec [] [] [create]
  pods [] [] [get list patch]
  services [] [] [get list patch]

$ juju trust rabbitmq --scope=cluster

$ kubectl --namespace o7k-testing describe role rabbitmq
Name: rabbitmq
Labels: app.kubernetes.io/managed-by=juju
              app.kubernetes.io/name=rabbitmq
Annotations: controller.juju.is/id: a08533cb-d188-4ca6-8e33-bf518062cadc
              juju.is/version: 2.9.11
              model.juju.is/id: 11b76af9-d284-4803-8e8f-7b874d47fc3c
PolicyRule:
  Resources Non-Resource URLs Resource Names Verbs
  --------- ----------------- -------------- -----
  *.* [] [] [*]
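
A quick way to spot the state described above is to look for the wildcard rule in the Role. The following is a minimal Python sketch, not part of Juju: it assumes the Role has been exported as JSON (for example with `kubectl get role rabbitmq -o json`) and that a trusted application's Role carries a rule with `*` for apiGroups, resources and verbs, which is how the `*.* [] [] [*]` line would appear in JSON. The function name and the sample dicts are illustrative only.

```python
def has_full_access(role: dict) -> bool:
    """Return True if any rule grants all verbs on all resources in all API groups."""
    for rule in role.get("rules") or []:
        if all("*" in rule.get(key, []) for key in ("apiGroups", "resources", "verbs")):
            return True
    return False


# Hand-written stand-ins for the JSON form of the Roles shown in this report.
# The core API group ("") for pods and services is an assumption.
role_after_upgrade = {
    "rules": [
        {"apiGroups": [""], "resources": ["pods/exec"], "verbs": ["create"]},
        {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "patch"]},
        {"apiGroups": [""], "resources": ["services"], "verbs": ["get", "list", "patch"]},
    ]
}
role_after_retrust = {
    "rules": [{"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]}]
}

print(has_full_access(role_after_upgrade))   # False: trust is not reflected in the Role
print(has_full_access(role_after_retrust))   # True: wildcard rule is present
```

Running a check like this after a charm upgrade would flag an application whose Role no longer reflects the trust that was granted.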

James Page (james-page) wrote :

(As a side note, it would actually be useful for a charm to be able to explicitly state what access it needs to the K8S API in order to operate. In my charm RabbitMQ just won't cluster until it has this access, so maybe it should be considered essential. The access can also be scoped to just the namespace and the service itself rather than anything wider.)

I can update the Role that Juju creates to match the upstream reference:

kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: rabbitmq
  namespace: test-rabbitmq
rules:
- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get"]
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create"]

and the native K8S peer discovery works fine.
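
To illustrate why the narrower Role is enough while the post-upgrade Role is not, here is a simplified Python sketch of how RBAC rules are matched against a request. It is an approximation, not the Kubernetes implementation: it ignores resourceNames, non-resource URLs and `*/subresource` patterns. The assumption that peer discovery needs `get` on `endpoints` is taken from the upstream reference Role above; the core API group ("") for the post-upgrade rules is also an assumption.

```python
def allows(rules, api_group, resource, verb):
    """Return True if any rule permits `verb` on `resource` in `api_group`."""
    def matches(allowed, wanted):
        # "*" is the RBAC wildcard; otherwise an exact match is required.
        return "*" in allowed or wanted in allowed

    return any(
        matches(rule.get("apiGroups", []), api_group)
        and matches(rule.get("resources", []), resource)
        and matches(rule.get("verbs", []), verb)
        for rule in rules
    )


# Rules from the upstream reference Role quoted in this comment.
upstream_rules = [
    {"apiGroups": [""], "resources": ["endpoints"], "verbs": ["get"]},
    {"apiGroups": [""], "resources": ["events"], "verbs": ["create"]},
]

# Rules of the Juju-created Role as seen after the charm upgrade.
post_upgrade_rules = [
    {"apiGroups": [""], "resources": ["pods/exec"], "verbs": ["create"]},
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "patch"]},
    {"apiGroups": [""], "resources": ["services"], "verbs": ["get", "list", "patch"]},
]

print(allows(upstream_rules, "", "endpoints", "get"))      # True
print(allows(post_upgrade_rules, "", "endpoints", "get"))  # False, consistent with the 403
```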

James Page (james-page) wrote :

To be clear: the credentials are not lost, just the role configuration that allows the charm units to access the required information in the K8S API.

Tom Haddon (mthaddon)
summary: - deployed application looses trust after charm upgrade
+ deployed application loses trust after charm upgrade
Harry Pidcock (hpidcock)
Changed in juju:
status: Triaged → In Progress
assignee: nobody → Harry Pidcock (hpidcock)
Harry Pidcock (hpidcock) wrote :

Just a question regarding this: is this a podspec charm (both before and after upgrade), or is it a sidecar charm after the upgrade (podspec -> sidecar or sidecar -> sidecar)?

Ian Booth (wallyworld)
Changed in juju:
milestone: 2.9-next → 2.9.17
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released