upgrade-charm: Not reached target cluster-partition-handling mode

Bug #1979092 reported by Felipe Reyes
This bug affects 2 people
Affects: OpenStack RabbitMQ Server Charm
Status: Fix Committed
Importance: Undecided
Assigned to: Felipe Reyes

Bug Description

When upgrading from 21.10 to 22.04, the rabbitmq-server unit goes into the waiting state with the message "Not reached target cluster-partition-handling mode". This happens because the leader hasn't set "cluster-partition-handling" to the same value as the charm configuration.
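To make the failure mode concrete, here is a minimal sketch (not the charm's actual code; function and parameter names are illustrative) of the kind of status check that produces this message: the unit compares the value stored in the leader settings with the charm config, and reports "waiting" until they match.

```python
# Sketch of the status check behind the "waiting" message. The dicts
# stand in for Juju's leader-get and config-get data; on an upgraded
# unit the leader settings simply never had the key, so the comparison
# fails forever.

CLUSTER_MODE_KEY = 'cluster-partition-handling'


def assess_cluster_mode_status(leader_settings, charm_config):
    """Return a (status, message) pair the way a charm status check would.

    leader_settings: dict of leader-set values (may lack the key entirely).
    charm_config: dict of the charm's current configuration.
    """
    if leader_settings.get(CLUSTER_MODE_KEY) != charm_config.get(CLUSTER_MODE_KEY):
        # Key missing (None) or stale -> unit stays in waiting.
        return ('waiting',
                'Not reached target {} mode'.format(CLUSTER_MODE_KEY))
    return ('active', 'Unit is ready')
```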

Felipe Reyes (freyes)
Changed in charm-rabbitmq-server:
assignee: nobody → Felipe Reyes (freyes)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-rabbitmq-server (master)
Changed in charm-rabbitmq-server:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-rabbitmq-server (master)

Reviewed: https://review.opendev.org/c/openstack/charm-rabbitmq-server/+/846444
Committed: https://opendev.org/openstack/charm-rabbitmq-server/commit/b35247364f7934579bff9585a071ff9668157e9a
Submitter: "Zuul (22348)"
Branch: master

commit b35247364f7934579bff9585a071ff9668157e9a
Author: Felipe Reyes <email address hidden>
Date: Fri Jun 17 16:36:51 2022 -0400

    Set cluster-partition-handling on upgrade-charm.

    For units deployed before the implementation of the
    cluster-partition-handling strategy they won't have that key set in the
    leader making the charm believe there are pending tasks, so this change
    seeds the key when is not set with the value present in the charm's
    configuration.

    Change-Id: Ifdae35ffee1ad7a8f4e5248c817cca14b69d9566
    Closes-Bug: #1979092
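The fix described in the commit message can be sketched as follows (a simplified illustration, not the merged patch; names are hypothetical): on upgrade-charm, the leader seeds the missing key from the charm's configuration, so pre-existing deployments stop appearing to have pending tasks.

```python
# Sketch of the upgrade-charm seeding logic: only the leader writes, and
# only when the key was never set (units deployed before the
# cluster-partition-handling strategy existed).

CLUSTER_MODE_KEY = 'cluster-partition-handling'


def seed_cluster_mode(is_leader, leader_settings, charm_config):
    """Seed the leader key from config if this unit is leader and it is unset.

    Returns the (possibly updated) leader settings dict; an already-set
    value is left alone so an in-progress mode change is not clobbered.
    """
    if is_leader and leader_settings.get(CLUSTER_MODE_KEY) is None:
        # Equivalent of leader-set: publish the configured value.
        leader_settings[CLUSTER_MODE_KEY] = charm_config.get(CLUSTER_MODE_KEY)
    return leader_settings
```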

Changed in charm-rabbitmq-server:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-rabbitmq-server (stable/jammy)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-rabbitmq-server (stable/jammy)

Reviewed: https://review.opendev.org/c/openstack/charm-rabbitmq-server/+/847202
Committed: https://opendev.org/openstack/charm-rabbitmq-server/commit/88112ade7dcd189add29b8b74d18672f537d985f
Submitter: "Zuul (22348)"
Branch: stable/jammy

commit 88112ade7dcd189add29b8b74d18672f537d985f
Author: Felipe Reyes <email address hidden>
Date: Fri Jun 17 16:36:51 2022 -0400

    Set cluster-partition-handling on upgrade-charm.

    For units deployed before the implementation of the
    cluster-partition-handling strategy they won't have that key set in the
    leader making the charm believe there are pending tasks, so this change
    seeds the key when is not set with the value present in the charm's
    configuration.

    Change-Id: Ifdae35ffee1ad7a8f4e5248c817cca14b69d9566
    Closes-Bug: #1979092
    (cherry picked from commit b35247364f7934579bff9585a071ff9668157e9a)

tags: added: in-stable-jammy
tags: added: cdo-qa foundations-engine
Revision history for this message
Tim Andersson (andersson123) wrote :

Hey, if you are all still subbed to this bug, I wanted to ask a question. I am running into this, but the rabbitmq code I am running has these changes implemented already. What could be the problem? Is there a separate issue outside what's discussed in this bug?

Revision history for this message
Tim Andersson (andersson123) wrote :

The rabbitmq node I'm talking about isn't part of a cluster, and the issue of the service not having the same setting as in the settings file isn't present:

from the service:
```
sudo rabbitmqctl eval 'application:get_all_env(rabbit).' | grep cluster_partition_handling
 {cluster_partition_handling,ignore},
```

from the conf file:
```
cat /etc/rabbitmq/rabbitmq.conf
collect_statistics_interval = 30000
mnesia_table_loading_retry_timeout = 30000
mnesia_table_loading_retry_limit = 10
cluster_partition_handling = ignore
```

Revision history for this message
Gabriel Cocenza (gabrielcocenza) wrote :

I'm also facing this issue when using a single unit of rabbitmq.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Sorry for taking so long to get around to this one.

@gabrielcocenza and @andersson123: if you run into this again, could you do a leader-get (juju run -u <app>/leader -- leader-get) [or use "juju exec" for juju 3.x] and see what the 'cluster-partition-handling' key is set to? I'd hazard that it is not the same as the config value. The waiting status is set by this code:

    if leader_get(CLUSTER_MODE_KEY) != config(CLUSTER_MODE_KEY):
        return (
            'waiting',
            'Not reached target {} mode'.format(CLUSTER_MODE_KEY))

Thanks.

Revision history for this message
Marcin Wilk (wilkmarcin) wrote :

Hi Alex,
I just ran into the same problem when upgrading from 3.8/stable to 3.9/stable. This is a single-node deployment.

# juju status
juju status rabbitmq-server
Model Controller Cloud/Region Version SLA Timestamp
octavia wilkmarcin-stsstack stsstack/stsstack 2.9.46 unsupported 10:40:42Z

App Version Status Scale Charm Channel Rev Exposed Message
rabbitmq-server 3.8.2 waiting 1 rabbitmq-server 3.9/stable 183 no Not reached target cluster-partition-handling mode

Unit Workload Agent Machine Public address Ports Message
rabbitmq-server/0* waiting idle 24 10.5.2.68 5672/tcp,15672/tcp Not reached target cluster-partition-handling mode

Machine State Address Inst id Series AZ Message
24 started 10.5.2.68 5a1b7e48-cb3a-4399-96fa-fa4c6f862897 focal nova ACTIVE

# config option
juju config rabbitmq-server cluster-partition-handling
ignore

# leader-get
juju run -u rabbitmq-server/0 -- leader-get
__leader_get_migrated_settings__: '["octavia.passwd", "glance.passwd", "neutron.passwd",
  "cinder.passwd", "ceilometer.passwd", "nova.passwd"]'
ceilometer.passwd: zZpPS8j9Xt8ScT5nCHMJkkx8prVxBZJwrmpBNTmjV3x96PYcMpTSbgydG9jncYpZ
cinder.passwd: b3RWy47px6FZXJmtzRRgGy8w3TKyXtRVFj5mNdCh3gmsxR9CXCss7T8cnLp3HghN
coordinator: '{"rabbitmq-server/0": {}}'
glance.passwd: f87n8wpgyhsNgp2ysbmshGT4zthBpYYZtw234zxL2B2tWKnpX5MTMMPgk8dsLPHz
neutron.passwd: LY87mZcckFjMr4jbKmJwBM3GW477p4rVdYdCWSMtwHbkmXhcrVzpx8gMfr3tsX4B
nova.passwd: Hm3VFHXjk8twFskwC3LTzBFbLzKMc7VSf8PRwBwY9yrdpr5MHgwm4PmRyBCtr2BH
octavia.passwd: 2hBqfkhbjmZMj3zV3wm6zcHFxT4YrjtjjcjFxT3ZrKXGw7sVVM5BtbHxx4T8x6ZC

# from the unit:
ubuntu@juju-5ef7f4-octavia-24:~$ sudo rabbitmqctl eval 'application:get_all_env(rabbit).' | grep cluster_partition_handling
 {cluster_partition_handling,ignore},
ubuntu@juju-5ef7f4-octavia-24:~$ grep cluster_partition_handling /etc/rabbitmq/rabbitmq.conf
cluster_partition_handling = ignore

Thanks
Marcin

Revision history for this message
Marcin Wilk (wilkmarcin) wrote :

cont.

So I added the config manually to the leader:

juju run -u rabbitmq-server/0 -- leader-set cluster-partition-handling=ignore

# leader-get again
juju run -u rabbitmq-server/0 -- leader-get
__leader_get_migrated_settings__: '["octavia.passwd", "glance.passwd", "neutron.passwd",
  "cinder.passwd", "ceilometer.passwd", "nova.passwd"]'
ceilometer.passwd: zZpPS8j9Xt8ScT5nCHMJkkx8prVxBZJwrmpBNTmjV3x96PYcMpTSbgydG9jncYpZ
cinder.passwd: b3RWy47px6FZXJmtzRRgGy8w3TKyXtRVFj5mNdCh3gmsxR9CXCss7T8cnLp3HghN
cluster-partition-handling: ignore
coordinator: '{"rabbitmq-server/0": {}}'
glance.passwd: f87n8wpgyhsNgp2ysbmshGT4zthBpYYZtw234zxL2B2tWKnpX5MTMMPgk8dsLPHz
neutron.passwd: LY87mZcckFjMr4jbKmJwBM3GW477p4rVdYdCWSMtwHbkmXhcrVzpx8gMfr3tsX4B
nova.passwd: Hm3VFHXjk8twFskwC3LTzBFbLzKMc7VSf8PRwBwY9yrdpr5MHgwm4PmRyBCtr2BH
octavia.passwd: 2hBqfkhbjmZMj3zV3wm6zcHFxT4YrjtjjcjFxT3ZrKXGw7sVVM5BtbHxx4T8x6ZC

# update-status hook
juju run -u rabbitmq-server/0 -- hooks/update-status
none
none
Waiting for pid file '/<email address hidden>' to appear
pid is 646
Waiting for erlang distribution on node 'rabbit@juju-5ef7f4-octavia-24' while OS process '646' is running
Waiting for applications 'rabbit_and_plugins' to start on node 'rabbit@juju-5ef7f4-octavia-24'
Applications 'rabbit_and_plugins' are running on node 'rabbit@juju-5ef7f4-octavia-24'
active

# and it looks good
juju status rabbitmq-server
Model Controller Cloud/Region Version SLA Timestamp
octavia wilkmarcin-stsstack stsstack/stsstack 2.9.46 unsupported 11:04:23Z

App Version Status Scale Charm Channel Rev Exposed Message
rabbitmq-server 3.8.2 active 1 rabbitmq-server 3.9/stable 183 no Unit is ready

Unit Workload Agent Machine Public address Ports Message
rabbitmq-server/0* active idle 24 10.5.2.68 5672/tcp,15672/tcp Unit is ready

Machine State Address Inst id Series AZ Message
24 started 10.5.2.68 5a1b7e48-cb3a-4399-96fa-fa4c6f862897 focal nova ACTIVE

Thanks
Marcin
