rotate password action updates relation data in private-address field

Bug #2051365 reported by Christopher Bartz
Affects                          Status         Importance  Assigned to  Milestone
Canonical Juju                   Invalid        Undecided   Unassigned
OpenStack RabbitMQ Server Charm  (status tracked in Trunk)
  Jammy                          Triaged        High        Unassigned
  Trunk                          Fix Committed  High        Unassigned

Bug Description

I have a cross-controller integration between a k8s application and rabbitmq. After I run the rotate-service-user-password action, the new password ends up in the private-address field of the k8s unit's relation data bag instead of in the password field:

$ juju show-unit flask-rabbitmq/0
flask-rabbitmq/0:
  opened-ports: []
  charm: local:jammy/flask-rabbitmq-6
  leader: true
  life: alive
  relation-info:
  - relation-id: 5
    endpoint: amqp
    cross-model: true
    related-endpoint: amqp
    application-data: {}
    related-units:
      rabbitmq-server/0:
        in-scope: true
        data:
          egress-subnets: 10.33.194.148/32
          hostname: 10.33.194.148
          ingress-address: 10.33.194.148
          password: 54ymh56NPnbcb5bs2HWPBxBGrW66tkyB595wN89ZF6jRxt94wzhJ2xxqPy4nzsY5
          private-address: bRVd96pPRcztp65bdHLycwsJf8TxytLHTpmd4CLPLCTGMJ7j3VYtbcBBts3RYtdh

Steps to reproduce:
 - create integration
 - run the action
 - look in the relation data bag of the k8s unit

This problem was observed with a cross-controller integration between a k8s cloud and an OpenStack cloud, and could be reproduced inside a Multipass VM.

rabbitmq-server channel 3.9/stable rev 183
k8s cloud env: juju 2.9.44
openstack cloud env: juju 3.1.6

multipass env: juju 3.1.7 for both controllers

╰─$ juju status --relations
Model            Controller  Cloud/Region         Version  SLA          Timestamp
reactive-runner  lxd         localhost/localhost  3.1.7    unsupported  13:25:09+01:00

App              Version  Status  Scale  Charm            Channel     Rev  Exposed  Message
rabbitmq-server  3.9.13   active      1  rabbitmq-server  3.9/stable  183  no       Unit is ready

Unit                Workload  Agent  Machine  Public address  Ports           Message
rabbitmq-server/0*  active    idle   4        10.33.194.148   5672,15672/tcp  Unit is ready

Machine  State    Address        Inst id        Base          AZ  Message
4        started  10.33.194.148  juju-963f30-4  ubuntu@22.04      Running

Offer            Application      Charm            Rev  Connected  Endpoint  Interface  Role
rabbitmq-server  rabbitmq-server  rabbitmq-server  183  1/1        amqp      rabbitmq   provider

Integration provider     Requirer                 Interface    Type     Message
rabbitmq-server:amqp     github-runner:amqp       rabbitmq     regular
rabbitmq-server:cluster  rabbitmq-server:cluster  rabbitmq-ha  peer

╰─$ juju switch microk8s; juju status --relations
Model  Controller  Cloud/Region        Version  SLA          Timestamp
flask  microk8s    microk8s/localhost  3.1.7    unsupported  13:29:40+01:00

SAAS             Status  Store  URL
rabbitmq-server  active  lxd    admin/reactive-runner.rabbitmq-server

App             Version  Status  Scale  Charm           Channel  Rev  Address         Exposed  Message
flask-rabbitmq           active      1  flask-rabbitmq            6   10.152.183.225  no

Unit               Workload  Agent  Address       Ports  Message
flask-rabbitmq/0*  active    idle   10.1.225.163

Integration provider           Requirer                       Interface       Type     Message
flask-rabbitmq:secret-storage  flask-rabbitmq:secret-storage  secret-storage  peer
rabbitmq-server:amqp           flask-rabbitmq:amqp            rabbitmq        regular

description: updated
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

I'm almost certain that this isn't a rabbitmq bug, but is instead a juju 2.9 <-> 3.1 bug.

Firstly, the rabbitmq-server charm isn't cross-model-relation 'safe', in that it doesn't encode the service name with the model, so two (say) nova-cloud-controllers in different models would *clash* and bad things would probably happen.

Secondly, I re-validated the 3.9/stable rabbitmq-server charm, checking it non-CMR in Juju 2.9 and CMR in Juju 2.9. I couldn't even get it to form a CMR between Juju 2.9 and Juju 3.1: although the offer was made and then received at the 3.1 model, no relation data was ever exchanged.

I've validated the following scenarios (in all of them the private-address isn't overwritten):

* Juju 2.9 - same model, no CMR.
* Juju 2.9 - cross-model.
* Juju 3.1 - same model, no CMR.
* Juju 3.1 - cross-model.

But I couldn't get this combination to work:

* Juju 2.9 offer, Juju 3.1 claim.

And I've not tried this one:

* Juju 3.1 offer, Juju 2.9 claim.

Thus, I'm not sure this is actually a rabbitmq-server charm bug.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Setting to incomplete; I haven't been able to reproduce it (not sure what I'm doing wrong), and can't reproduce the bug with same-version controllers for offer and claim models with Juju 2.9 and Juju 3.1.

Changed in charm-rabbitmq-server:
status: New → Incomplete
Revision history for this message
Christopher Bartz (bartz) wrote :

Perhaps the problem is related to having a CMR between a k8s and a machine model, as I see the problem when I have a CMR with both juju 3.1 controllers and a k8s model, but don't see the problem when I have a CMR with both machine models.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

> Perhaps the problem is related to having a CMR between a k8s and a machine model, as I see the problem when I have a CMR with both juju 3.1 controllers and a k8s model, but don't see the problem when I have a CMR with both machine models.

I've added Juju to the bug; it does sound like an issue with Juju rather than the rabbitmq-server charm, so I'm setting the rabbitmq-server charm task to Invalid. However, if it does turn out to be a charm bug, please do re-open it.

Changed in charm-rabbitmq-server:
status: Incomplete → Invalid
Revision history for this message
Ian Booth (wallyworld) wrote :

Do the logs contain this line from when the rabbitmq-server charm updates the amqp relation?

  log("Updating password on key {} on relation_id: {}"
      .format(key, rid),
      INFO)

and

  log("Updating relation {} keys {}"
      .format(relation_id or get_relation_id(),
              ','.join(relation_settings.keys())), DEBUG)

In fact, logs for the model hosting rabbitmq-server, with unit-level logging set to DEBUG, would be great.

Revision history for this message
Ian Booth (wallyworld) wrote :

In Juju itself, the controller sets the "private-address" key in the data bag in two places:

1. when the unit first enters a relation

settings["private-address"] = ingressAddress

2. When network info is refreshed

relSettings.Set("private-address", ingressAddress)

It's hard to see how this is getting set to anything other than a valid address by juju.
Logs would help.

Revision history for this message
Ian Booth (wallyworld) wrote :

If we were to try and reproduce (assuming requested logs don't help), what *exactly* is deployed where? I couldn't find any "flask-rabbitmq" charm for example. The status snippets in the bug description don't really help us to understand what to deploy. Is there a minimal bundle (or ideally just a few charms) that can be deployed and related to reproduce the issue?

Revision history for this message
Christopher Bartz (bartz) wrote :

Thank you, Ian, for picking up on this.

The logs give me:

unit-rabbitmq-server-0: 09:15:30 INFO unit.rabbitmq-server/0.juju-log coordinator.Serial Loading state
unit-rabbitmq-server-0: 09:15:30 INFO unit.rabbitmq-server/0.juju-log coordinator.Serial Leader handling coordinator requests
unit-rabbitmq-server-0: 09:15:30 WARNING unit.rabbitmq-server/0.juju-log min-cluster-size is not defined, race conditions may occur if this is not a single unit deployment.
unit-rabbitmq-server-0: 09:15:30 DEBUG unit.rabbitmq-server/0.juju-log Must assume this is a single unit returning 'cluster' ready
unit-rabbitmq-server-0: 09:15:31 DEBUG unit.rabbitmq-server/0.juju-log Running ['/usr/sbin/rabbitmqctl', 'change_password', 'flask-rabbitmq', 'Z2Ygxz5krTHmKbWLRksCFFpJdqP5sCs3cnjStLK5V84954jnJ9JjSMFN7L4YtN3J']
unit-rabbitmq-server-0: 09:15:31 DEBUG unit.rabbitmq-server/0.rotate-service-user-password Changing password for user "flask-rabbitmq" ...
unit-rabbitmq-server-0: 09:15:31 INFO unit.rabbitmq-server/0.juju-log Changed password on rabbitmq for user: flask-rabbitmq
unit-rabbitmq-server-0: 09:15:31 INFO unit.rabbitmq-server/0.juju-log Updating password on key private-address on relation_id: amqp:8
unit-rabbitmq-server-0: 09:15:31 INFO unit.rabbitmq-server/0.juju-log coordinator.Serial Publishing state

For context, we are trying to create a cross-model integration from a Flask application on k8s to rabbitmq; it's kind of an experiment at the moment, so there is no charm published on Charmhub. If you want to reproduce it, I have put the charm at https://github.com/cbartz/flask-rabbitmq/raw/checkin-charm/charm/flask-rabbitmq_ubuntu-22.04-amd64.charm. You can deploy it on a k8s model and integrate it with the rabbitmq-server SAAS application with

 ╰─$ juju deploy ./flask-rabbitmq_ubuntu-22.04-amd64.charm flask-rabbitmq --resource flask-app-image=cbartz/flask-rabbitmq --resource statsd-prometheus-exporter-image=prom/statsd-exporter:v0.24.0
Located local charm "flask-rabbitmq", revision 9
Deploying "flask-rabbitmq" from local charm "flask-rabbitmq", revision 9 on ubuntu@22.04/stable
╭─ubuntu@reactive-runner ~/flask-rabbitmq/charm ‹checkin-charm●› [microk8s:flask]
╰─$ juju integrate flask-rabbitmq rabbitmq-server

Then switch to the model hosting rabbitmq and run the password rotation action:

╭─ubuntu@reactive-runner ~/flask-rabbitmq/charm ‹checkin-charm●› [microk8s:flask]
╰─$ juju switch lxd
microk8s:admin/flask -> lxd:admin/reactive-runner
╭─ubuntu@reactive-runner ~/flask-rabbitmq/charm ‹checkin-charm●› [lxd:reactive-runner]
╰─$ juju run rabbitmq-server/0 rotate-service-user-password service-user=flask-rabbitmq
Running operation 13 with 1 task
  - task 14 on unit-rabbitmq-server-0

Waiting for task 14...

Changing password for user "flask-rabbitmq" ...

Then switch back to the k8s model and display the relation databag:

╭─ubuntu@reactive-runner ~/flask-rabbitmq/charm ‹checkin-charm●› [lxd:reactive-runner]
╰─$ juju switch microk8s
lxd:admin/reactive-runner -> microk8s:admin/flask
╭─ubuntu@reactive-runner ~/flask-rabbitmq/charm ‹checkin-charm●› [microk8s:flask]
╰─$ juju show-unit flask-rabbitmq/0
flask-rabbitmq/0:
  opened-ports: []
  charm: local:jammy/flask-ra...


Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Well this isn't right:

> unit-rabbitmq-server-0: 09:15:31 INFO unit.rabbitmq-server/0.juju-log Updating password on key private-address on relation_id: amqp:8

It certainly shouldn't be doing that.

Maybe it's not a Juju bug after all? Hmm. Time to stare at the code again. Ian, apologies if I've wasted your time.

I wonder why it only happens on CMRs between k8s and machine models?

Changed in charm-rabbitmq-server:
status: Invalid → New
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

So, the bug is in this code and it's really subtle:

    pattern = re.compile(r"(\S+)_username")
    for rid in relation_ids('amqp'):
        key = None
        for unit in related_units(rid):
            current = relation_get(rid=rid, unit=unit) or {}
            # the username is either as 'username' or '{previx}_username'
            if 'username' in current:
                key = 'password'
                break
            for key in current.keys():
                match = pattern.match(key)
                if match:
                    key = '_'.join((match[1], 'password'))
                    break
            else:
                continue
            break
        if key is not None:
            log("Updating password on key {} on relation_id: {}"
                .format(key, rid),
                INFO)
            relation_set(relation_id=rid,
                         relation_settings={key: new_passwd})

Looking back at the original trace, it's setting only the "private-address" key, which happens to be the last key in the relation databag. Then it just kind of jumps out at you: I've re-used 'key' as the inner for-loop variable, so when no username key matches it is left set to the last key in the databag, and therefore that key always gets written to.
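
For illustration, a minimal standalone sketch of that leak (plain dicts stand in for the relation_get() data; this is not the charm code itself):

    import re

    def buggy_key_lookup(current):
        """Mimic the leaky lookup: return the key the password gets written to."""
        pattern = re.compile(r"(\S+)_username")
        key = None
        if 'username' in current:
            return 'password'
        for key in current.keys():       # 'key' is reused as the loop variable...
            match = pattern.match(key)
            if match:
                return '_'.join((match[1], 'password'))
        # ...so when nothing matched, 'key' still holds the last key iterated
        return key

    # A unit databag with no username key, as in the cross-model case above:
    databag = {
        'egress-subnets': '10.152.183.50/32',
        'ingress-address': '10.152.183.50',
        'private-address': '10.152.183.50',
    }
    print(buggy_key_lookup(databag))     # -> 'private-address'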

I wonder if the key order is different between k8s and machine models, or it's just some other quirk.

Anyway, I'll work on a fix and get it backported.
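
For illustration only, a sketch of one way to avoid the leak (not the actual patch that was merged): keep the search result in a separate name, so that a failed search leaves it as None and nothing gets written:

    import re

    def find_password_key(current):
        """Return the relation key the new password should be written to, or None."""
        pattern = re.compile(r"(\S+)_username")
        if 'username' in current:
            return 'password'
        for candidate in current:        # separate name: nothing leaks out of the loop
            match = pattern.match(candidate)
            if match:
                return '{}_password'.format(match.group(1))
        return None                      # no username key: skip relation_set entirely

    assert find_password_key({'username': 'flask-rabbitmq'}) == 'password'
    assert find_password_key({'nova_username': 'nova'}) == 'nova_password'
    assert find_password_key({'private-address': '10.152.183.50'}) is None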

Changed in juju:
status: New → Invalid
Changed in charm-rabbitmq-server:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Ian Booth (wallyworld) wrote :

Glad you found it! I looked at that charm code, hence the log lines I asked for, but didn't look close enough to see the bug :-)

In Go, iterating over maps is non-deterministic, so there's no guarantee about key order when Juju serves up relation data to the charm.
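
To illustrate the consequence (a sketch, not Juju or charm code): whichever key the leaky loop ends on is simply the last key served, so with no guaranteed ordering the overwritten key is arbitrary; here it happened to be private-address, but it could have been any of them:

    import random

    keys = ['egress-subnets', 'ingress-address', 'private-address']
    random.shuffle(keys)     # stands in for the controller's unspecified key order
    leaked = None
    for leaked in keys:      # same variable-reuse pattern as the charm's inner loop
        pass                 # no username key ever matches
    print('password would be written to:', leaked)   # could be any of the three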

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

So, in trying to work on a fix, I'm really struggling to reproduce the original error in test code. The only way I can get it to fail is if neither 'username' nor a '*_username' key is present in the relation data.

@Christopher: please could you provide the output of `juju show-unit rabbitmq-server/0`?

I'd also like to check whether `username` is actually in the databag on the rabbitmq-server unit:

If you run: `juju exec -u rabbitmq-server/0 -- relation-ids amqp` it'll respond with something like `amqp:2`.

Then if you run `juju exec -u rabbitmq-server/0 -- relation-get -r amqp:2 - flask-rabbitmq/0`, it should show a databag that looks something like:

egress-subnets: 172.20.0.183/32
ingress-address: 172.20.0.183
private-address: 172.20.0.183
username: flask-rabbitmq/0
vhost: openstack

I'm seeking to verify that the username key is present.

Thanks.

Revision history for this message
Christopher Bartz (bartz) wrote :

Sure.

╭─ubuntu@reactive-runner ~ [lxd:reactive-runner]
╰─$ juju show-unit rabbitmq-server/0
rabbitmq-server/0:
  workload-version: 3.9.13
  machine: "4"
  opened-ports:
  - 5672/tcp
  - 15672/tcp
  public-address: 10.33.194.148
  charm: ch:amd64/jammy/rabbitmq-server-183
  leader: true
  life: alive
  relation-info:
  - relation-id: 14
    endpoint: amqp
    cross-model: true
    related-endpoint: amqp
    application-data:
      admin: "true"
      username: flask-rabbitmq
      vhost: /
    related-units:
      remote-61a200e299af4cb38a8eb904a0133df6/0:
        in-scope: true
        data:
          egress-subnets: 10.152.183.50/32
          ingress-address: 10.152.183.50
          private-address: 10.152.183.50
  - relation-id: 3
    endpoint: cluster
    related-endpoint: cluster
    application-data: {}
    local-unit:
      in-scope: true
      data:
        coordinator: '{}'
        egress-subnets: 10.33.194.148/32
        ingress-address: 10.33.194.148
        private-address: 10.33.194.148
╭─ubuntu@reactive-runner ~ [lxd:reactive-runner]
╰─$ juju exec -u rabbitmq-server/0 -- relation-ids amqp
amqp:14
╭─ubuntu@reactive-runner ~ [lxd:reactive-runner]
╰─$ juju exec -u rabbitmq-server/0 -- relation-get -r amqp:14 - remote-61a200e299af4cb38a8eb904a0133df6/0 1 ↵
egress-subnets: 10.152.183.50/32
ingress-address: 10.152.183.50
private-address: 10.152.183.50

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Light bulb moment! It's due to rabbitmq-server not supporting app data bags; it predates their introduction, basically. The key part is:

╰─$ juju exec -u rabbitmq-server/0 -- relation-get -r amqp:14 - remote-61a200e299af4cb38a8eb904a0133df6/0 1 ↵
egress-subnets: 10.152.183.50/32
ingress-address: 10.152.183.50
private-address: 10.152.183.50

i.e. there is no "username" key in the unit relation data, which is what the code relies on. It did, however, expose a bug in the code that caused the overwrite of the private-address field. I'll address that, so it no longer breaks things.

The other part, about app data bag support, may take a little longer; it's not clear how the charm found the username when the units were first related/integrated, unless relation-get behaves differently while the relation-changed hook is firing (which it may).
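
For illustration (the values are taken from the outputs above): the rotate code only inspects the unit databag, where there is no username key, even though the requirer side did publish one in the application databag:

    # Unit databag seen by rabbitmq-server/0 over the CMR (from the output above)
    unit_databag = {
        'egress-subnets': '10.152.183.50/32',
        'ingress-address': '10.152.183.50',
        'private-address': '10.152.183.50',
    }
    # Application databag on the same relation (from the show-unit output above)
    app_databag = {'admin': 'true', 'username': 'flask-rabbitmq', 'vhost': '/'}

    print('username' in unit_databag)   # False: the rotate code finds nothing to update
    print('username' in app_databag)    # True: the username lives here instead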

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

See https://bugs.launchpad.net/charm-rabbitmq-server/+bug/2051930 for bug to address the juju application data bag requirement. This bug is solely to resolve the private-address field being overwritten.

Changed in charm-rabbitmq-server:
status: Triaged → In Progress
assignee: nobody → Alex Kavanagh (ajkavanagh)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-rabbitmq-server (master)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-rabbitmq-server (master)

Reviewed: https://review.opendev.org/c/openstack/charm-rabbitmq-server/+/907408
Committed: https://opendev.org/openstack/charm-rabbitmq-server/commit/26cb8350e8655334a6dc9d4723d0d209b73b3273
Submitter: "Zuul (22348)"
Branch: master

commit 26cb8350e8655334a6dc9d4723d0d209b73b3273
Author: Alex Kavanagh <email address hidden>
Date: Thu Feb 1 11:05:43 2024 +0000

    Don't overwrite last relation key when username not found

    When rotating a password, the code updates the password on the relation
    bag for the associated relation. However, if the username wasn't found
    in the relation data (e.g. if it was in app-data instead) then the code
    unfortunately overwrote the last key it looked at (and this was,
    randomly, private-address). This was due to a bug in the code. This
    patch fixes that problem.

    However, the charm (or at least certainly the rotating passwords code)
    doesn't support app data bags as it doesn't find the matching username
    to update the relation data. This means that it doesn't support rotating
    passwords with charms that use app-data. This is the note added to the
    README.

    Change-Id: I05dac3ae89318ceb28724f4a75d1377a62d32d1c
    Closes-Bug: #2051365

Changed in charm-rabbitmq-server:
status: In Progress → Fix Committed