rbac failure

Bug #1908288 reported by Luke Marsden
This bug affects 3 people

Affects: charm-k8s-postgresql
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

When installing postgres from this charm with 'juju deploy cs:~postgresql-charmers/postgresql-k8s' on k8s 1.17.11 with minikube and juju from 2.9/edge, the controller goes into a crashloop with:

root@2566e8d973bbaec9:~# kubectl logs postgres-0 -n kf
2020-12-15 17:48:39,088 INFO: Updating permissions and ownership of /srv
tail: cannot open '/var/log/postgresql/repmgr.log' for reading: No such file or directory
2020-12-15 17:48:39,088 INFO: Updating permissions and ownership of /var/log/postgresql
2020-12-15 17:48:39,088 INFO: Updating permissions and ownership of /srv/pgdata
2020-12-15 17:48:39,088 INFO: Updating permissions and ownership of /srv/pgdata/12
2020-12-15 17:48:39,089 INFO: Updating permissions and ownership of /srv/pgconf
2020-12-15 17:48:39,089 INFO: Overwriting /root/.pgpass, updating secrets
2020-12-15 17:48:39,089 INFO: Overwriting /var/lib/postgresql/.pgpass, updating secrets
2020-12-15 17:48:39,089 INFO: Updating repmgr configuration in /srv/pgconf/repmgr.conf
2020-12-15 17:48:39,090 INFO: PostgreSQL database cluster exists at /srv/pgdata/12/main
2020-12-15 17:48:39,090 INFO: Updating PostgreSQL configuration in /srv/pgconf/12/main/conf.d/juju_charm.conf
Traceback (most recent call last):
  File "/usr/local/bin/docker_entrypoint.py", line 23, in <module>
    pgcharm.docker_entrypoint()
  File "/usr/local/lib/python3.8/dist-packages/pgcharm.py", line 503, in docker_entrypoint
    if is_master():
  File "/usr/local/lib/python3.8/dist-packages/pgcharm.py", line 412, in is_master
    return get_master() == JUJU_POD_NAME
  File "/usr/local/lib/python3.8/dist-packages/pgcharm.py", line 421, in get_master
    masters = [i.metadata.name for i in api.list_namespaced_pod(NAMESPACE, label_selector=master_selector).items]
  File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api/core_v1_api.py", line 15302, in list_namespaced_pod
    return self.list_namespaced_pod_with_http_info(namespace, **kwargs) # noqa: E501
  File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api/core_v1_api.py", line 15413, in list_namespaced_pod_with_http_info
    return self.api_client.call_api(
  File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api_client.py", line 373, in request
    return self.rest_client.GET(url,
  File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/rest.py", line 239, in GET
    return self.request("GET", url,
  File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/rest.py", line 233, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Tue, 15 Dec 2020 17:48:39 GMT', 'Content-Length': '272'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:kf:default\" cannot list resource \"pods\" in API group \"\" in the namespace \"kf\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}
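The JSON body above is a standard Kubernetes `Status` object, so a charm making direct API calls could surface this failure as a readable message rather than a raw traceback. A minimal sketch in plain Python (the helper name is mine, not from the charm):

```python
import json


def describe_rbac_failure(body: str) -> str:
    """Summarise a Kubernetes 403 Status body into a one-line message."""
    status = json.loads(body)
    if status.get("reason") != "Forbidden":
        return status.get("message", "unknown API error")
    # details.kind identifies the resource the RBAC rule is missing for
    kind = status.get("details", {}).get("kind", "resource")
    return f"RBAC: {status['message']} (kind={kind})"
```

In the failure above this would report the missing `list pods` permission for `system:serviceaccount:kf:default` directly in the charm's status or logs.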

description: updated
Revision history for this message
Tom Haddon (mthaddon) wrote :

This is possibly related to https://bugs.launchpad.net/juju/+bug/1907161 - will try to confirm.

Revision history for this message
Tom Haddon (mthaddon) wrote :

Have confirmed this is a juju bug with @wallyworld, retargeting.

affects: charm-k8s-postgresql → juju
Revision history for this message
Pen Gale (pengale) wrote :

Confirming bug and putting in 2.9.1 milestone. Might possibly get moved to 3.0.0 if the fix is super complex, or involves breaking changes.

Regardless, the work to fix this is in scope for this cycle.

Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.9.1
description: updated
Revision history for this message
Ian Booth (wallyworld) wrote :

I've now looked into this and the issue appears to be that the operator pod is not being configured with the correct service account.

$ kubectl -n test get sa
NAME                  SECRETS   AGE
default               1         5m36s
postgresql-operator   1         4m31s

But the postgresql-operator pod's service account is set to "default".

I think that's the root cause as the roles set up for the operator service account are sufficient to list pods.

Changed in juju:
milestone: 2.9.1 → 2.9-rc4
assignee: nobody → Ian Booth (wallyworld)
status: Triaged → In Progress
Revision history for this message
Ian Booth (wallyworld) wrote :

Ah scratch that, I was looking at the wrong pod. The operator service account is correctly set up.

Revision history for this message
Ian Booth (wallyworld) wrote :

So, the bug report talks about the "postgres-0" pod which is the workload pod.

By default, workload pods do not get any bespoke service account created for them - they just use the system one called "default". If you want to set things up so that your workload can perform operations on the resources in the namespace/cluster, you need to include a service account in the pod spec. This I think explains the issue reported. Certainly, looking at the charm source code, the pod spec does not specify a service account as would be needed. You would need to specify a service account at the top level, like so:

version: 3
...
serviceAccount:
  automountServiceAccountToken: true
  roles:
    - global: true
      rules:
        - apiGroups: [""]
          resources: ["pods"]
          verbs: ["get", "list"]

The above notwithstanding, I did a test deployment, and got a different error. The error I saw is due to the "patch" operation being disallowed, which is expected since just "list" and "get" are currently included in the service account role.

api.patch_namespaced_pod(JUJU_POD_NAME, NAMESPACE, {"metadata": {"labels": {"role": "master"}}})
...
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': 'ee827333-b1ac-4168-be24-ae4f122031c8', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'ca879002-6ce5-4011-9020-db334571981e', 'Date': 'Tue, 05 Jan 2021 04:28:09 GMT', 'Content-Length': '348'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"postgresql-0\" is forbidden: User \"system:serviceaccount:controller-ian:postgresql-operator\" cannot patch resource \"pods\" in API group \"\" in the namespace \"controller-ian\"","reason":"Forbidden","details":{"name":"postgresql-0","kind":"pods"},"code":403}

So there's a change that can be made on the juju side to allow "patch" operations for the operator.

But I don't yet see why I get the error above and not the one originally reported.

So to be clear, Juju will need an update, but so too does the charm it seems.

Revision history for this message
Tom Haddon (mthaddon) wrote :

So it sounds like we'd actually need the following once the changes to Juju have landed?:

version: 3
...
serviceAccount:
  automountServiceAccountToken: true
  roles:
    - global: true
      rules:
        - apiGroups: [""]
          resources: ["pods"]
          verbs: ["get", "list", "patch"]

Revision history for this message
Ian Booth (wallyworld) wrote :

I updated a local copy of the charm to add the required service account:

            "serviceAccount": {
                "automountServiceAccountToken": True,
                "roles": [
                    {
                        "global": True,
                        "rules": [
                            {
                                "apiGroups": [""],
                                "resources": ["pods"],
                                "verbs": ["get", "list", "patch"],
                            },
                        ],
                    },
                ],
            },

This allowed the charm to deploy on a copy of Juju which had a small fix to add "patch" to the allowed pod operator roles.

What is still unclear is if the Juju fix is actually needed.

Revision history for this message
John A Meinel (jameinel) wrote :

does this give it get, list, and patch on all pods in the cluster? in the namespace? only its associated pods?
It would be interesting if we could have pods that have the right to modify themselves but not give them open access to all other pods in the namespace, but that might not be something that K8s supports.

Revision history for this message
Ian Booth (wallyworld) wrote :

There is a resourceNames attribute on the Role struct, but its values are not fixed: Juju would need to gain support for patching this value according to which pods are running and their names (if the app is aliased). That work is not yet done (there are potential corner cases to consider as well), so for now it's best to leave resourceNames empty.
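For illustration only, a rule scoped with resourceNames would look roughly like the following (the pod names are hypothetical, and Juju does not generate or maintain such a rule today):

```yaml
rules:
  - apiGroups: [""]
    resources: ["pods"]
    # resourceNames restricts the rule to specific named objects;
    # note it does not usefully restrict "list", which is not a
    # per-object verb, so "list" would still need a broader rule.
    resourceNames: ["postgresql-0", "postgresql-1"]
    verbs: ["get", "patch"]
```

Keeping the names accurate as units come and go is exactly the maintenance burden described above.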

no longer affects: juju
Revision history for this message
Ian Booth (wallyworld) wrote :

Removing juju from the bug target since it's a charm issue.

Revision history for this message
Tom Haddon (mthaddon) wrote :

This has been fixed in revno 7 of cs:~postgresql-charmers/postgresql-k8s. Can you retry and let us know if you have any further issues?

Changed in charm-k8s-postgresql:
importance: Undecided → Medium
status: New → Fix Released
Revision history for this message
Luke Marsden (lukemarsden) wrote :

This seems to be working now, thanks!

At least, the postgres charm isn't failing to come up.

Now to try and use it :-)

You can close this issue, thanks!

Revision history for this message
Luke Marsden (lukemarsden) wrote :

Oops, I think I spoke too soon.

While the "app" (?) comes up, the "unit" fails:

App
postgres  pgcharm:edge  active  1  postgresql-k8s  charmstore  7  kubernetes  10.105.231.102

Unit
postgres/0*  error  idle  172.17.0.70  5432/TCP  hook failed: "db-relation-changed"

```
2021-01-07 17:05:13 ERROR juju.worker.caasoperator.uniter.operation runhook.go:136 hook "db-relation-changed" (via hook dispatching script: dispatch) failed: exit status 1
2021-01-07 17:05:13 INFO juju.worker.caasoperator.uniter resolver.go:143 awaiting error resolution for "relation-changed" hook
2021-01-07 17:07:42 INFO juju.worker.caasoperator.uniter resolver.go:143 awaiting error resolution for "relation-changed" hook
2021-01-07 17:07:47 INFO juju.worker.caasoperator.uniter resolver.go:143 awaiting error resolution for "relation-changed" hook
2021-01-07 17:07:48 ERROR juju-log db:29: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "./src/charm.py", line 237, in <module>
    ops.main.main(PostgreSQLCharm, use_juju_for_storage=True)
  File "/var/lib/juju/agents/unit-postgres-0/charm/venv/ops/main.py", line 402, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-postgres-0/charm/venv/ops/main.py", line 140, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-postgres-0/charm/venv/ops/framework.py", line 278, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-postgres-0/charm/venv/ops/framework.py", line 722, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-postgres-0/charm/venv/ops/framework.py", line 767, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-postgres-0/charm/src/clientrel.py", line 101, in on_db_relation_changed
    master_ip = self.master_service_ip
  File "/var/lib/juju/agents/unit-postgres-0/charm/src/clientrel.py", line 71, in master_service_ip
    svc = self.get_k8s_service(self.master_service_name)
  File "/var/lib/juju/agents/unit-postgres-0/charm/src/clientrel.py", line 87, in get_k8s_service
    return api.read_namespaced_service(name, self.model.name)
  File "/var/lib/juju/agents/unit-postgres-0/charm/venv/kubernetes/client/api/core_v1_api.py", line 24150, in read_namespaced_service
    return self.read_namespaced_service_with_http_info(name, namespace, **kwargs) # noqa: E501
  File "/var/lib/juju/agents/unit-postgres-0/charm/venv/kubernetes/client/api/core_v1_api.py", line 24245, in read_namespaced_service_with_http_info
    return self.api_client.call_api(
  File "/var/lib/juju/agents/unit-postgres-0/charm/venv/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/var/lib/juju/agents/unit-postgres-0/charm/venv/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/var/lib/juju/agents/unit-postgres-0/charm/venv/kubernetes/client/api_client.py", line 373, in request
    return self.rest...
```

Revision history for this message
Luke Marsden (lukemarsden) wrote :

Note this looks similar to Ian's reported issue but it's actually about "get services" permission, not "patch pods".

Does the new service account that I presume you added need increasing scope to services as well?

Revision history for this message
Tom Barber (spicule) wrote :

Just FYI, I redeployed the Postgres charm from master a few minutes ago against the same K8s cluster where it used to fail, and it's come up fine it seems: no API errors or restarting pods. So that looks like an improvement on my end.

Revision history for this message
Tom Haddon (mthaddon) wrote :

Re-opening this bug based on comment #14 and will ask for some more input/help from Ian.

Changed in charm-k8s-postgresql:
status: Fix Released → New
Revision history for this message
Tom Haddon (mthaddon) wrote :

Hi Luke,

Can you try again with cs:~mthaddon/postgresql-k8s ? It's built with the following changes https://code.launchpad.net/~mthaddon/charm-k8s-postgresql/+git/charm-k8s-postgresql/+merge/396379 - from the error message it looks to me like this _should_ address it, but it'd be good to confirm.

It deploys fine for me, but I wasn't experiencing the issues described.

Revision history for this message
Luke Marsden (lukemarsden) wrote :
Download full text (4.1 KiB)

Just tried with cs:~mthaddon/postgresql-k8s, and still getting the error below.

diff --git a/bundle.yaml b/bundle.yaml
index d172ab0..7cf227c 100644
--- a/bundle.yaml
+++ b/bundle.yaml
@@ -4,7 +4,8 @@ applications:
     charm: "./mlflow.charm"
     scale: 1
   postgres:
-    charm: "cs:~postgresql-charmers/postgresql-k8s-7"
+    charm: cs:~mthaddon/postgresql-k8s

```
2021-01-18 15:41:12 ERROR juju.worker.caasoperator.uniter.operation runhook.go:136 hook "db-relation-changed" (via hook dispatching script: dispatch) failed: exit status 1
2021-01-18 15:41:12 INFO juju.worker.caasoperator.uniter resolver.go:143 awaiting error resolution for "relation-changed" hook
2021-01-18 15:41:52 INFO juju.worker.caasoperator.uniter resolver.go:143 awaiting error resolution for "relation-changed" hook
2021-01-18 15:41:53 ERROR juju-log db:29: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "./src/charm.py", line 237, in <module>
    ops.main.main(PostgreSQLCharm, use_juju_for_storage=True)
  File "/var/lib/juju/agents/unit-postgres-0/charm/venv/ops/main.py", line 402, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-postgres-0/charm/venv/ops/main.py", line 140, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-postgres-0/charm/venv/ops/framework.py", line 278, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-postgres-0/charm/venv/ops/framework.py", line 722, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-postgres-0/charm/venv/ops/framework.py", line 767, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-postgres-0/charm/src/clientrel.py", line 101, in on_db_relation_changed
    master_ip = self.master_service_ip
  File "/var/lib/juju/agents/unit-postgres-0/charm/src/clientrel.py", line 71, in master_service_ip
    svc = self.get_k8s_service(self.master_service_name)
  File "/var/lib/juju/agents/unit-postgres-0/charm/src/clientrel.py", line 87, in get_k8s_service
    return api.read_namespaced_service(name, self.model.name)
  File "/var/lib/juju/agents/unit-postgres-0/charm/venv/kubernetes/client/api/core_v1_api.py", line 24150, in read_namespaced_service
    return self.read_namespaced_service_with_http_info(name, namespace, **kwargs) # noqa: E501
  File "/var/lib/juju/agents/unit-postgres-0/charm/venv/kubernetes/client/api/core_v1_api.py", line 24245, in read_namespaced_service_with_http_info
    return self.api_client.call_api(
  File "/var/lib/juju/agents/unit-postgres-0/charm/venv/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/var/lib/juju/agents/unit-postgres-0/charm/venv/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/var/lib/juju/agents/unit-postgres-0/charm/venv/kubernetes/client/api_client.py", line 373, in request
    return self.rest_client.GET(url,
  File "/var/lib/juju/agents/unit-postgres-0/charm/venv/kubernetes/client/rest.py", line 239, in GET
    return self.request("GET", url,
  File "/var/lib/juju/agents/unit-postgres-0/charm/venv/...
```

Revision history for this message
Ian Booth (wallyworld) wrote :

Hi Luke, can you share the bundle you are using?
(so we can try and reproduce the issue)

Revision history for this message
Luke Marsden (lukemarsden) wrote :
Revision history for this message
Ian Booth (wallyworld) wrote :

And is the mlflow charm uploaded anywhere?

Revision history for this message
Ian Booth (wallyworld) wrote :
Revision history for this message
Ian Booth (wallyworld) wrote :

The charm change correctly creates a service account for the workload pod which allows the postgresql workload itself to perform api requests on service resources.

However, the error indicates that the charm operator is running up against the same issue. Right now, Juju creates the charm operator pod service account with these roles:

Resources: []string{"pods"},
Verbs:     []string{"get", "list"},

Resources: []string{"pods/exec"},
Verbs:     []string{"create"},

As noted in comment #6, these are not currently configurable. Work is planned to do something along the lines of allowing a charm to declare the permissions it wants and enabling those when confirmed by the human operator.

Short term, we could add "services" to the list.
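Concretely, after such a change the operator's Role would contain rules roughly like the following (a sketch of the intended result, not the actual Juju-generated manifest):

```yaml
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "patch"]   # "patch" per the earlier fix
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["create"]
  - apiGroups: [""]                   # the proposed short-term addition
    resources: ["services"]
    verbs: ["get"]
```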

Revision history for this message
Ian Booth (wallyworld) wrote :

Having said that, the charm seems to be written such that the postgresql workload is responsible for making the cluster API calls needed to get things configured. I wonder whether the relation-changed workflow should also follow this pattern so things are done in a consistent way.

Revision history for this message
Luke Marsden (lukemarsden) wrote :

Thanks Ian, I'm not sure if you're referring to the mlflow charm, or the postgres one? I'm assuming the postgres one, and if so, who would be a good person to take a look at this please?

Revision history for this message
Ian Booth (wallyworld) wrote :

Yeah, it's the postgresql charm. The charm author is aware so we'll figure out how to proceed and make the necessary change to either juju or the charm.

Revision history for this message
Ben Hoyt (benhoyt) wrote :

For reference, I was having this issue too, and Kelvin helped me work around this (just for local testing) by adding the role manually using kubectl:

$ kubectl edit role postgresql-operator -n <namespace>

Then under the rules, add one more item:

- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - get

Then restart the Juju pod using:

$ kubectl delete pod postgresql-operator-0 -n <namespace>

Once the pod restarted the hook went through fine and the Juju app's status changed to "Pod configured".

Revision history for this message
Stuart Bishop (stub) wrote :

In the postgresql-k8s charm, the operator pod creates services (using pod-set-spec), and needs to look up their IP addresses so they can be published to relations. This uses the read_namepaced_service API call. The workload pods need to read pod details, to learn which pod is master using the list_namespaced_pod API call, and need to apply labels to themselves and their peer pods using the patch_namespaced_pod API call. The pods are labeled as master or standby so the selectors on the services route connections to the correct pods, and so the standby pods can determine which pod is master so they can connect to it.
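The parts of that flow that don't need a live cluster can be sketched as plain data: the label selector passed to list_namespaced_pod to find the master, and the patch body passed to patch_namespaced_pod to label a pod. The function names and the exact selector labels here are my assumptions for illustration; only the patch shape is taken from the traceback earlier in this bug:

```python
def master_selector(app_name: str) -> str:
    # Label selector used to find the master pod; the exact label
    # keys are an assumption, not copied from the charm.
    return f"app={app_name},role=master"


def role_patch(role: str) -> dict:
    # Strategic-merge patch body that sets/overwrites the pod's
    # "role" label, as in the api.patch_namespaced_pod call above.
    return {"metadata": {"labels": {"role": role}}}
```

With a live client these would be used as `api.list_namespaced_pod(namespace, label_selector=master_selector(app))` and `api.patch_namespaced_pod(pod_name, namespace, role_patch("master"))` — both requiring the RBAC verbs discussed in this bug.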

Revision history for this message
Tom Haddon (mthaddon) wrote :

After some discussion with wallyworld have added juju as a bug target here.

Revision history for this message
Ian Booth (wallyworld) wrote :

This PR adds the extra rules to the operator pod's service account role.

https://github.com/juju/juju/pull/12538

Note that there's still an issue with the charm. On k8s, the controller model name is actually "controller-<controllername>" not just "controller". This allows several controllers to be bootstrapped to the one cluster. So this line in clientrel.py

return api.read_namespaced_service(name, self.model.name)

results in the error below when the charm is deployed to the controller model:

HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': 'ee827333-b1ac-4168-be24-ae4f122031c8', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'ca879002-6ce5-4011-9020-db334571981e', 'Date': 'Thu, 21 Jan 2021 08:32:21 GMT', 'Content-Length': '364'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"services \"postgresql-master\" is forbidden: User \"system:serviceaccount:controller-ian:postgresql-operator\" cannot get resource \"services\" in API group \"\" in the namespace \"controller\"","reason":"Forbidden","details":{"name":"postgresql-master","kind":"services"},"code":403}

Revision history for this message
Ian Booth (wallyworld) wrote :

The charm environment doesn't currently expose controller name, just model name. The charm should probably use the k8s apis to discover the namespace its pod is running in and use that directly instead of assuming the namespace equals the model name.
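A pod's namespace is mounted into every pod alongside its service account token at a standard path, so the charm could read it directly instead of assuming it equals the model name. A minimal sketch (the mount path is standard Kubernetes; the function name is mine):

```python
from pathlib import Path

# Standard in-pod mount written by the service account admission controller.
NAMESPACE_FILE = Path("/var/run/secrets/kubernetes.io/serviceaccount/namespace")


def current_namespace(path: Path = NAMESPACE_FILE) -> str:
    # Inside a pod this file contains the namespace name, e.g.
    # "controller-ian"; the argument exists so it can be tested off-cluster.
    return path.read_text().strip()
```

read_namespaced_service could then be called with `current_namespace()` rather than `self.model.name`.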

Revision history for this message
Stuart Bishop (stub) wrote :

Deployment and relating are working with 2.9-rc6-focal-amd64 (2.9-rc6-09cf986) and microk8s with rbac enabled.

Separate to this bug, deployment to the controller model is not working. Either juju needs to expose the namespace name or we need to work out how a pod can query its namespace with the k8s api or similar.

Changed in charm-k8s-postgresql:
status: New → Fix Released
John A Meinel (jameinel)
no longer affects: juju