When running remove-unit, node information is not removed from the cluster
Affects: OpenStack RabbitMQ Server Charm
Status: Fix Released
Importance: Medium
Assigned to: Frode Nordahl
Milestone: 17.08
Bug Description
I have deployed a rabbitmq-server cluster, and the status looks like this:
$ juju status rabbitmq-server
Model Controller Cloud/Region Version
rabbitmq devmaas devmaas 2.1.2
App Version Status Scale Charm Store Rev OS Notes
rabbitmq-server 3.5.7 active 3 rabbitmq-server jujucharms 61 ubuntu
Unit Workload Agent Machine Public address Ports Message
rabbitmq-server/0* active idle 0/lxd/6 10.12.1.174 5672/tcp Unit is ready and clustered
rabbitmq-server/1 active idle 1/lxd/6 10.12.1.178 5672/tcp Unit is ready and clustered
rabbitmq-server/2 active idle 2/lxd/6 10.12.1.193 5672/tcp Unit is ready and clustered
Machine State DNS Inst id Series AZ
0 started 10.12.1.248 7c3whf xenial default
0/lxd/6 started 10.12.1.174 juju-6bd42f-0-lxd-6 xenial
1 started 10.12.1.249 ww8nyf xenial default
1/lxd/6 started 10.12.1.178 juju-6bd42f-1-lxd-6 xenial
2 started 10.12.1.246 acdyn8 xenial default
2/lxd/6 started 10.12.1.193 juju-6bd42f-2-lxd-6 xenial
1. The rabbitmq cluster status before removing the unit (the hostnames were truncated in the original report; all three nodes were listed and running):
$ sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit@...' ...
[{nodes,[{disc,['rabbit@...','rabbit@...','rabbit@...']}]},
 {running_nodes,['rabbit@...','rabbit@...','rabbit@...']},
 {cluster_name,<<"rabbit@...">>},
 {partitions,[]}]
2. The rabbitmq cluster status after removing the unit (hostnames again truncated; the removed node still appears under "nodes" even though it is no longer running):
$ sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit@...' ...
[{nodes,[{disc,['rabbit@...','rabbit@...','rabbit@...']}]},
 {running_nodes,['rabbit@...','rabbit@...']},
 {cluster_name,<<"rabbit@...">>},
 {partitions,[]}]
As you can see, in "nodes", the removed unit's host is still there.
This can cause problems, for instance if you re-add a unit with the same hostname.
For now, after running remove-unit on a rabbitmq-server unit, you need to run
$ sudo rabbitmqctl forget_cluster_node <rabbit@hostname>
on one of the remaining units. I think this should be run in a hook when remove-unit runs.
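For illustration, the manual workaround above can be scripted; a minimal sketch, where the hostname is an assumed example (borrowed from the machine names in the juju status output, not taken from the truncated report):

```shell
# Manual workaround sketch: make a surviving member forget the departed
# node. The hostname below is purely illustrative; substitute the real
# hostname of the unit that was removed (visible in `juju status`
# before the removal).
DEPARTED_HOST="juju-6bd42f-2-lxd-6"   # hypothetical example hostname
NODE="rabbit@${DEPARTED_HOST}"

# Run on one of the remaining units, e.g.:
#   juju run --unit rabbitmq-server/0 -- sudo rabbitmqctl forget_cluster_node "$NODE"
echo "sudo rabbitmqctl forget_cluster_node ${NODE}"
```

Note that forget_cluster_node must be run from a node that is still a cluster member, against a node that is already stopped.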
tags: added: sts
Changed in charm-rabbitmq-server:
assignee: nobody → Frode Nordahl (fnordahl)
Changed in charm-rabbitmq-server:
milestone: 17.05 → 17.08
Changed in charm-rabbitmq-server:
importance: Undecided → Medium
Changed in charm-rabbitmq-server:
status: Fix Committed → Fix Released
This is definitely something that the charm should be handling. For reference, the -departed behaviour was completely removed in [1] since it was totally broken, and it has been a gap that needs fixing ever since.

As far as I can see, we have two options here: we either use actions, or we implement some safe logic in -departed hooks. For actions, we could add one that would need to be called on a remaining unit to clean up any departed units, or we could add an action to the unit that is about to be removed so that it removes itself from the cluster (the latter also potentially usable in a -departed hook).

If you wanted to cover cases where a node dies suddenly and irreconcilably, then I guess either an action on an extant unit or a -departed hook cleanup on the cluster leader might be best (assuming that the hook fires after leadership has switched in the case where the leader died).
[1] https://github.com/openstack/charm-rabbitmq-server/commit/cba419897dab7e85c2baeb23120e3b8d1824f6c2
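The "-departed hook cleanup on the leader" option could look roughly like the sketch below. This is only an illustration of the safety check, not the charm's actual implementation: `should_forget` is a name made up here, and the real charm hooks are written in Python.

```shell
#!/bin/sh
# should_forget: read `rabbitmqctl cluster_status` output on stdin and
# succeed only if the given node is still listed in the cluster tables
# but absent from the running_nodes line; this avoids evicting a live
# member. (Crude substring matching; illustration only.)
should_forget() {
    node="$1"
    status="$(cat)"
    # Node not known to the cluster at all: nothing to forget.
    printf '%s\n' "$status" | grep -q "$node" || return 1
    # Node still running: do not touch it.
    printf '%s\n' "$status" | grep running_nodes | grep -q "$node" && return 1
    return 0
}

# In a cluster-relation-departed hook, run on the leader only, one would
# then do something like:
#   rabbitmqctl cluster_status | should_forget "rabbit@$departed_host" &&
#       rabbitmqctl forget_cluster_node "rabbit@$departed_host"
```

Gating the cleanup on "listed but not running" is what makes it safe to call from a hook: a hook that fires while the departing node is still up will simply do nothing.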