Cassandra node is not removed from the cluster after remove-unit action

Bug #1875455 reported by Vladimir Grevtsev on 2020-04-27
Affects: Cassandra Juju Charm
Importance: Undecided
Assigned to: Unassigned

Bug Description

I had two Cassandra units deployed:

ubuntu@OrangeBox84:~/fce-demo/bootstrap-scripts$ j status
Model Controller Cloud/Region Version SLA Timestamp
cassandra orangebox-cloud-RegionOne orangebox-cloud/RegionOne 2.6.10 unsupported 17:17:28Z

App Version Status Scale Charm Store Rev OS Notes
cassandra active 2 cassandra jujucharms 54 ubuntu

Unit Workload Agent Machine Public address Ports Message
cassandra/0* active idle 0 172.27.86.130 9042/tcp,9160/tcp Live seed
cassandra/2 active idle 2 172.27.86.116 9042/tcp,9160/tcp Live seed

Machine State DNS Inst id Series AZ Message
0 started 172.27.86.130 24010209-c1a5-4666-9967-b874e04cf4e6 bionic nova ACTIVE
2 started 172.27.86.116 7350bc4d-09c5-4358-b209-27e288f8a19d bionic nova ACTIVE

$ nodetool status
Datacenter: juju
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.0.0.28 174.64 KiB 256 100.0% 046c1c82-51f2-408c-8cb8-203a2ca7aae8 cassandra
UN 10.0.0.111 247.91 KiB 256 100.0% b3f44618-d795-489e-8cc3-01ff5e7647ac cassandra

But after I did a "juju remove-unit cassandra/2", my "nodetool status" on the remaining node started to look like this:

ubuntu@juju-35e7bb-cassandra-0:~$ nodetool status
Datacenter: juju
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
DN 10.0.0.28 174.64 KiB 256 100.0% 046c1c82-51f2-408c-8cb8-203a2ca7aae8 cassandra
UN 10.0.0.111 237.77 KiB 256 100.0% b3f44618-d795-489e-8cc3-01ff5e7647ac cassandra

According to https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/architecture/archDataDistributeFailDetect.html, this is expected behaviour when a node goes offline for an unexpected reason (e.g. an outage). But here the removal was triggered explicitly by the operator, so the node should have been removed from the cluster - and that didn't happen.

Perhaps the relation-departed hooks could be improved to unregister the node automatically (or, at the very least, surface the unavailable node to the operator so they can take action?)
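A minimal sketch of what such automation might look like. The hook name, helper functions, and wiring below are hypothetical illustrations, not the charm's actual API; only the `nodetool` subcommands (`decommission` for a live node, `removenode <Host ID>` for a dead one) are real Cassandra operations:

```python
#!/usr/bin/env python3
"""Hypothetical sketch: unregister a departed Cassandra node from a
relation-departed hook. Not the charm's real code."""
import subprocess


def removenode_cmd(host_id):
    # `nodetool removenode` drops a node that is already marked down (DN)
    # in `nodetool status`, identified by its Host ID.
    return ["nodetool", "removenode", host_id]


def on_cluster_relation_departed(departed_host_id, run=subprocess.check_call):
    # Caveat: removenode only works on dead nodes; a still-live node must
    # instead run `nodetool decommission` on itself before the unit is
    # destroyed, which a hook on the *departing* unit cannot guarantee.
    run(removenode_cmd(departed_host_id))


if __name__ == "__main__":
    # Host ID taken from the `nodetool status` output above.
    print(" ".join(removenode_cmd("046c1c82-51f2-408c-8cb8-203a2ca7aae8")))
```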

Stuart Bishop (stub) wrote :

Per https://jaas.ai/cassandra, 'nodes must be manually decommissioned before dropping a unit'. Per https://bugs.launchpad.net/juju-core/+bug/1417874, Juju provides no way to cleanly remove the node as part of destroying the unit. Decommissioning a node cleanly will need to be done via an action (along with most cluster operations, turning uncontrollable magic into explicit operations under user control)
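For reference, the manual workflow described above can be sketched as the following command sequence (expressed here as command lists; unit names and the Host ID are taken from the status output in this report, and the `juju run` syntax is the Juju 2.x form, which may differ in other versions):

```python
#!/usr/bin/env python3
"""Sketch of the manual decommission-then-remove workflow.
The commands are printed rather than executed, since they require a
live Juju model and Cassandra cluster."""

# 1. Cleanly decommission the node *before* removing the unit
#    (this streams its data to the remaining replicas):
decommission = ["juju", "run", "--unit", "cassandra/2", "--",
                "nodetool", "decommission"]

# 2. Then remove the now-empty unit:
remove_unit = ["juju", "remove-unit", "cassandra/2"]

# 3. If the unit is already gone and the node shows as DN in
#    `nodetool status`, drop it by Host ID from a surviving node:
removenode = ["juju", "run", "--unit", "cassandra/0", "--",
              "nodetool", "removenode",
              "046c1c82-51f2-408c-8cb8-203a2ca7aae8"]

if __name__ == "__main__":
    for cmd in (decommission, remove_unit, removenode):
        print(" ".join(cmd))
```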

Changed in cassandra-charm:
status: New → Won't Fix