upgrade-charm: Not reached target cluster-partition-handling mode

Bug #1979092 reported by Felipe Reyes
This bug affects 2 people
Affects: OpenStack RabbitMQ Server Charm
Status: Fix Committed
Importance: Undecided
Assigned to: Felipe Reyes

Bug Description

When upgrading from 21.10 to 22.04, the rabbitmq-server unit goes into the waiting state with the message "Not reached target cluster-partition-handling mode". This happens because the leader hasn't set "cluster-partition-handling" to the same value as the charm configuration.
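To make the failure mode concrete, here is a minimal sketch (not the charm's actual code; function and parameter names are illustrative) of the kind of status check that produces this message: the unit compares the value stored in the leader settings with the charm config, and reports "waiting" until they match.

```python
# Sketch of the status check behind the "waiting" message. The dicts
# stand in for Juju's leader-get and config-get data; on an upgraded
# unit the leader settings simply never had the key, so the comparison
# fails forever.

CLUSTER_MODE_KEY = 'cluster-partition-handling'


def assess_cluster_mode_status(leader_settings, charm_config):
    """Return a (status, message) pair the way a charm status check would.

    leader_settings: dict of leader-set values (may lack the key entirely).
    charm_config: dict of the charm's current configuration.
    """
    if leader_settings.get(CLUSTER_MODE_KEY) != charm_config.get(CLUSTER_MODE_KEY):
        # Key missing (None) or stale -> unit stays in waiting.
        return ('waiting',
                'Not reached target {} mode'.format(CLUSTER_MODE_KEY))
    return ('active', 'Unit is ready')
```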

Felipe Reyes (freyes)
Changed in charm-rabbitmq-server:
assignee: nobody → Felipe Reyes (freyes)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-rabbitmq-server (master)
Changed in charm-rabbitmq-server:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-rabbitmq-server (master)

Reviewed: https://review.opendev.org/c/openstack/charm-rabbitmq-server/+/846444
Committed: https://opendev.org/openstack/charm-rabbitmq-server/commit/b35247364f7934579bff9585a071ff9668157e9a
Submitter: "Zuul (22348)"
Branch: master

commit b35247364f7934579bff9585a071ff9668157e9a
Author: Felipe Reyes <email address hidden>
Date: Fri Jun 17 16:36:51 2022 -0400

    Set cluster-partition-handling on upgrade-charm.

    For units deployed before the implementation of the
    cluster-partition-handling strategy they won't have that key set in the
    leader making the charm believe there are pending tasks, so this change
    seeds the key when is not set with the value present in the charm's
    configuration.

    Change-Id: Ifdae35ffee1ad7a8f4e5248c817cca14b69d9566
    Closes-Bug: #1979092
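The fix described in the commit message can be sketched as follows (a simplified illustration, not the merged patch; names are hypothetical): on upgrade-charm, the leader seeds the missing key from the charm's configuration, so pre-existing deployments stop appearing to have pending tasks.

```python
# Sketch of the upgrade-charm seeding logic: only the leader writes, and
# only when the key was never set (units deployed before the
# cluster-partition-handling strategy existed).

CLUSTER_MODE_KEY = 'cluster-partition-handling'


def seed_cluster_mode(is_leader, leader_settings, charm_config):
    """Seed the leader key from config if this unit is leader and it is unset.

    Returns the (possibly updated) leader settings dict; an already-set
    value is left alone so an in-progress mode change is not clobbered.
    """
    if is_leader and leader_settings.get(CLUSTER_MODE_KEY) is None:
        # Equivalent of leader-set: publish the configured value.
        leader_settings[CLUSTER_MODE_KEY] = charm_config.get(CLUSTER_MODE_KEY)
    return leader_settings
```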

Changed in charm-rabbitmq-server:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-rabbitmq-server (stable/jammy)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-rabbitmq-server (stable/jammy)

Reviewed: https://review.opendev.org/c/openstack/charm-rabbitmq-server/+/847202
Committed: https://opendev.org/openstack/charm-rabbitmq-server/commit/88112ade7dcd189add29b8b74d18672f537d985f
Submitter: "Zuul (22348)"
Branch: stable/jammy

commit 88112ade7dcd189add29b8b74d18672f537d985f
Author: Felipe Reyes <email address hidden>
Date: Fri Jun 17 16:36:51 2022 -0400

    Set cluster-partition-handling on upgrade-charm.

    For units deployed before the implementation of the
    cluster-partition-handling strategy they won't have that key set in the
    leader making the charm believe there are pending tasks, so this change
    seeds the key when is not set with the value present in the charm's
    configuration.

    Change-Id: Ifdae35ffee1ad7a8f4e5248c817cca14b69d9566
    Closes-Bug: #1979092
    (cherry picked from commit b35247364f7934579bff9585a071ff9668157e9a)

tags: added: in-stable-jammy
tags: added: cdo-qa foundations-engine
Revision history for this message
Tim Andersson (andersson123) wrote :

Hey, if you are all still subbed to this bug, I wanted to ask a question. I am running into this, but the rabbitmq code I am running has these changes implemented already. What could be the problem? Is there a separate issue outside what's discussed in this bug?

Revision history for this message
Tim Andersson (andersson123) wrote :

The rabbitmq node I'm talking about isn't part of a cluster, and the issue of the service not having the same setting as in the settings file isn't present:

from the service:
```
sudo rabbitmqctl eval 'application:get_all_env(rabbit).' | grep cluster_partition_handling
 {cluster_partition_handling,ignore},
```

from the conf file:
```
cat /etc/rabbitmq/rabbitmq.conf
collect_statistics_interval = 30000
mnesia_table_loading_retry_timeout = 30000
mnesia_table_loading_retry_limit = 10
cluster_partition_handling = ignore
```

Revision history for this message
Gabriel Cocenza (gabrielcocenza) wrote :

I'm also facing this issue when using a single unit of rabbitmq.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Sorry for taking so long to get around to this one.

@gabrielcocenza and @andersson123: if you run into this again, could you do a leader-get (juju run -u <app>/leader -- leader-get) [or use "juju exec" for juju 3.x] and see what the 'cluster-partition-handling' key is set to? I'd hazard that it is not the same as the config value. The waiting status is set by this code:

    if leader_get(CLUSTER_MODE_KEY) != config(CLUSTER_MODE_KEY):
        return (
            'waiting',
            'Not reached target {} mode'.format(CLUSTER_MODE_KEY))

Thanks.

Revision history for this message
Marcin Wilk (wilkmarcin) wrote :

Hi Alex,
I just ran into the same problem when upgrading from 3.8/stable to 3.9/stable. This is a single-node deployment.

# juju status
juju status rabbitmq-server
Model Controller Cloud/Region Version SLA Timestamp
octavia wilkmarcin-stsstack stsstack/stsstack 2.9.46 unsupported 10:40:42Z

App Version Status Scale Charm Channel Rev Exposed Message
rabbitmq-server 3.8.2 waiting 1 rabbitmq-server 3.9/stable 183 no Not reached target cluster-partition-handling mode

Unit Workload Agent Machine Public address Ports Message
rabbitmq-server/0* waiting idle 24 10.5.2.68 5672/tcp,15672/tcp Not reached target cluster-partition-handling mode

Machine State Address Inst id Series AZ Message
24 started 10.5.2.68 5a1b7e48-cb3a-4399-96fa-fa4c6f862897 focal nova ACTIVE

# config option
juju config rabbitmq-server cluster-partition-handling
ignore

# leader-get
juju run -u rabbitmq-server/0 -- leader-get
__leader_get_migrated_settings__: '["octavia.passwd", "glance.passwd", "neutron.passwd",
  "cinder.passwd", "ceilometer.passwd", "nova.passwd"]'
ceilometer.passwd: zZpPS8j9Xt8ScT5nCHMJkkx8prVxBZJwrmpBNTmjV3x96PYcMpTSbgydG9jncYpZ
cinder.passwd: b3RWy47px6FZXJmtzRRgGy8w3TKyXtRVFj5mNdCh3gmsxR9CXCss7T8cnLp3HghN
coordinator: '{"rabbitmq-server/0": {}}'
glance.passwd: f87n8wpgyhsNgp2ysbmshGT4zthBpYYZtw234zxL2B2tWKnpX5MTMMPgk8dsLPHz
neutron.passwd: LY87mZcckFjMr4jbKmJwBM3GW477p4rVdYdCWSMtwHbkmXhcrVzpx8gMfr3tsX4B
nova.passwd: Hm3VFHXjk8twFskwC3LTzBFbLzKMc7VSf8PRwBwY9yrdpr5MHgwm4PmRyBCtr2BH
octavia.passwd: 2hBqfkhbjmZMj3zV3wm6zcHFxT4YrjtjjcjFxT3ZrKXGw7sVVM5BtbHxx4T8x6ZC

# from the unit:
ubuntu@juju-5ef7f4-octavia-24:~$ sudo rabbitmqctl eval 'application:get_all_env(rabbit).' | grep cluster_partition_handling
 {cluster_partition_handling,ignore},
ubuntu@juju-5ef7f4-octavia-24:~$ grep cluster_partition_handling /etc/rabbitmq/rabbitmq.conf
cluster_partition_handling = ignore

Thanks
Marcin

Revision history for this message
Marcin Wilk (wilkmarcin) wrote :

cont.

So I added the config manually to the leader:

juju run -u rabbitmq-server/0 -- leader-set cluster-partition-handling=ignore

# leader-get again
juju run -u rabbitmq-server/0 -- leader-get
__leader_get_migrated_settings__: '["octavia.passwd", "glance.passwd", "neutron.passwd",
  "cinder.passwd", "ceilometer.passwd", "nova.passwd"]'
ceilometer.passwd: zZpPS8j9Xt8ScT5nCHMJkkx8prVxBZJwrmpBNTmjV3x96PYcMpTSbgydG9jncYpZ
cinder.passwd: b3RWy47px6FZXJmtzRRgGy8w3TKyXtRVFj5mNdCh3gmsxR9CXCss7T8cnLp3HghN
cluster-partition-handling: ignore
coordinator: '{"rabbitmq-server/0": {}}'
glance.passwd: f87n8wpgyhsNgp2ysbmshGT4zthBpYYZtw234zxL2B2tWKnpX5MTMMPgk8dsLPHz
neutron.passwd: LY87mZcckFjMr4jbKmJwBM3GW477p4rVdYdCWSMtwHbkmXhcrVzpx8gMfr3tsX4B
nova.passwd: Hm3VFHXjk8twFskwC3LTzBFbLzKMc7VSf8PRwBwY9yrdpr5MHgwm4PmRyBCtr2BH
octavia.passwd: 2hBqfkhbjmZMj3zV3wm6zcHFxT4YrjtjjcjFxT3ZrKXGw7sVVM5BtbHxx4T8x6ZC

# update-status hook
juju run -u rabbitmq-server/0 -- hooks/update-status
none
none
Waiting for pid file '/<email address hidden>' to appear
pid is 646
Waiting for erlang distribution on node 'rabbit@juju-5ef7f4-octavia-24' while OS process '646' is running
Waiting for applications 'rabbit_and_plugins' to start on node 'rabbit@juju-5ef7f4-octavia-24'
Applications 'rabbit_and_plugins' are running on node 'rabbit@juju-5ef7f4-octavia-24'
active

# and it looks good
juju status rabbitmq-server
Model Controller Cloud/Region Version SLA Timestamp
octavia wilkmarcin-stsstack stsstack/stsstack 2.9.46 unsupported 11:04:23Z

App Version Status Scale Charm Channel Rev Exposed Message
rabbitmq-server 3.8.2 active 1 rabbitmq-server 3.9/stable 183 no Unit is ready

Unit Workload Agent Machine Public address Ports Message
rabbitmq-server/0* active idle 24 10.5.2.68 5672/tcp,15672/tcp Unit is ready

Machine State Address Inst id Series AZ Message
24 started 10.5.2.68 5a1b7e48-cb3a-4399-96fa-fa4c6f862897 focal nova ACTIVE

Thanks
Marcin
