Cassandra node is not removed from the cluster after remove-unit action

Bug #1875455 reported by Vladimir Grevtsev on 2020-04-27
Affects: Cassandra Juju Charm
Importance: Undecided
Assigned to: Unassigned

Bug Description

I had two Cassandra units deployed:

ubuntu@OrangeBox84:~/fce-demo/bootstrap-scripts$ j status
Model Controller Cloud/Region Version SLA Timestamp
cassandra orangebox-cloud-RegionOne orangebox-cloud/RegionOne 2.6.10 unsupported 17:17:28Z

App Version Status Scale Charm Store Rev OS Notes
cassandra active 2 cassandra jujucharms 54 ubuntu

Unit Workload Agent Machine Public address Ports Message
cassandra/0* active idle 0 172.27.86.130 9042/tcp,9160/tcp Live seed
cassandra/2 active idle 2 172.27.86.116 9042/tcp,9160/tcp Live seed

Machine State DNS Inst id Series AZ Message
0 started 172.27.86.130 24010209-c1a5-4666-9967-b874e04cf4e6 bionic nova ACTIVE
2 started 172.27.86.116 7350bc4d-09c5-4358-b209-27e288f8a19d bionic nova ACTIVE

$ nodetool status
Datacenter: juju
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.0.0.28 174.64 KiB 256 100.0% 046c1c82-51f2-408c-8cb8-203a2ca7aae8 cassandra
UN 10.0.0.111 247.91 KiB 256 100.0% b3f44618-d795-489e-8cc3-01ff5e7647ac cassandra

But after I did a "juju remove-unit cassandra/2", my "nodetool status" on the remaining node started to look like this:

ubuntu@juju-35e7bb-cassandra-0:~$ nodetool status
Datacenter: juju
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
DN 10.0.0.28 174.64 KiB 256 100.0% 046c1c82-51f2-408c-8cb8-203a2ca7aae8 cassandra
UN 10.0.0.111 237.77 KiB 256 100.0% b3f44618-d795-489e-8cc3-01ff5e7647ac cassandra

According to https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/architecture/archDataDistributeFailDetect.html, this is expected behaviour when a node goes offline for an unexpected reason (e.g. an outage). But here the removal was triggered explicitly by the operator, so the node should have been removed from the cluster - and that didn't happen.

Perhaps the relation-departed hooks could be improved to unregister the node automatically (or, at the very least, surface the unavailable node to the operator so they can take action?)
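A minimal sketch of what such automation might look like. The hook name, helper functions, and wiring below are hypothetical illustrations, not the charm's actual API; only the `nodetool` subcommands (`decommission` for a live node, `removenode <Host ID>` for a dead one) are real Cassandra operations:

```python
#!/usr/bin/env python3
"""Hypothetical sketch: unregister a departed Cassandra node from a
relation-departed hook. Not the charm's real code."""
import subprocess


def removenode_cmd(host_id):
    # `nodetool removenode` drops a node that is already marked down (DN)
    # in `nodetool status`, identified by its Host ID.
    return ["nodetool", "removenode", host_id]


def on_cluster_relation_departed(departed_host_id, run=subprocess.check_call):
    # Caveat: removenode only works on dead nodes; a still-live node must
    # instead run `nodetool decommission` on itself before the unit is
    # destroyed, which a hook on the *departing* unit cannot guarantee.
    run(removenode_cmd(departed_host_id))


if __name__ == "__main__":
    # Host ID taken from the `nodetool status` output above.
    print(" ".join(removenode_cmd("046c1c82-51f2-408c-8cb8-203a2ca7aae8")))
```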

Stuart Bishop (stub) wrote :

Per https://jaas.ai/cassandra, 'nodes must be manually decommissioned before dropping a unit'. Per https://bugs.launchpad.net/juju-core/+bug/1417874, Juju provides no way to cleanly remove the node as part of destroying the unit. Decommissioning a node cleanly will need to be done via an action (along with most cluster operations, turning uncontrollable magic into explicit operations under user control)
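For reference, the manual workflow described above can be sketched as the following command sequence (expressed here as command lists; unit names and the Host ID are taken from the status output in this report, and the `juju run` syntax is the Juju 2.x form, which may differ in other versions):

```python
#!/usr/bin/env python3
"""Sketch of the manual decommission-then-remove workflow.
The commands are printed rather than executed, since they require a
live Juju model and Cassandra cluster."""

# 1. Cleanly decommission the node *before* removing the unit
#    (this streams its data to the remaining replicas):
decommission = ["juju", "run", "--unit", "cassandra/2", "--",
                "nodetool", "decommission"]

# 2. Then remove the now-empty unit:
remove_unit = ["juju", "remove-unit", "cassandra/2"]

# 3. If the unit is already gone and the node shows as DN in
#    `nodetool status`, drop it by Host ID from a surviving node:
removenode = ["juju", "run", "--unit", "cassandra/0", "--",
              "nodetool", "removenode",
              "046c1c82-51f2-408c-8cb8-203a2ca7aae8"]

if __name__ == "__main__":
    for cmd in (decommission, remove_unit, removenode):
        print(" ".join(cmd))
```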

Changed in cassandra-charm:
status: New → Won't Fix