Ceph monitor provisioning on storage system unexpectedly allows adding a compute into the quorum (missing semantic check)
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StarlingX | Fix Released | Medium | Daniel Badea | | |
Bug Description
Brief Description
-----------------
Ceph monitor provisioning on a storage system unexpectedly allows a compute node to be added to the quorum.
A semantic check is needed to reject this.
Severity
--------
Standard
Steps to Reproduce
------------------
1. On a 2 controller + 2 storage + X compute system, lock and delete storage-0. This leaves two ceph monitors: controller-0 and controller-1.
2. Provision one of the computes as the new ceph monitor (instead of the remaining storage node):
lock the node, run system ceph-mon-add <nodename>, then unlock the host.
3. Once the host unlocks, check ceph -s. You'll see the following:
$ ceph -s
  cluster:
    id: 364fbdf0-
    health: HEALTH_WARN
            1 osds down
            1 host (4 osds) down
            1/4 mons down, quorum controller-
  services:
    mon: 4 daemons, quorum controller-
    mgr: controller-
    osd: 8 osds: 4 up, 5 in; 172 remapped pgs
  data:
    pools: 5 pools, 600 pgs
    objects: 636 objects, 2.4 GiB
    usage: 3.4 GiB used, 2.3 TiB / 2.3 TiB avail
    pgs: 455/1272 objects degraded (35.770%)
         373 active+undersized
         172 active+
         55 active+
  io:
    client: 84 KiB/s rd, 482 KiB/s wr, 98 op/s rd, 90 op/s wr
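The "1/4 mons down" warning can also be confirmed programmatically. The following is a minimal Python sketch, not part of the original report, that compares the monitor map against the current quorum. It assumes the input is the parsed JSON from `ceph quorum_status --format json`, using its `monmap.mons[].name` and `quorum_names` fields. The helper name and the sample host names (including compute-0) are hypothetical.

```python
from typing import List


def mons_out_of_quorum(quorum_status: dict) -> List[str]:
    """Return names of monitors that are in the monmap but not in the quorum.

    `quorum_status` is assumed to be the parsed JSON output of
    `ceph quorum_status --format json`.
    """
    all_mons = [mon["name"] for mon in quorum_status["monmap"]["mons"]]
    in_quorum = set(quorum_status["quorum_names"])
    return [name for name in all_mons if name not in in_quorum]


# Hypothetical sample data: four monitors known, three in quorum.
sample_status = {
    "quorum_names": ["controller-0", "controller-1", "storage-1"],
    "monmap": {
        "mons": [
            {"name": "controller-0"},
            {"name": "controller-1"},
            {"name": "storage-1"},
            {"name": "compute-0"},
        ]
    },
}

missing = mons_out_of_quorum(sample_status)
print(missing)
```

On a real system the dictionary would come from something like `json.loads()` applied to the command output rather than from hard-coded sample data.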
Expected Behavior
------------------
Adding the compute node to the quorum in step 2 should have been rejected.
In a storage configuration, only controller-0, controller-1 and one storage node are allowed in the quorum. No other configuration is supported.
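The missing semantic check could look roughly like the Python sketch below. This is an illustration only and not the actual fix: the function name, the exception class, the personality strings and the `system_has_storage_nodes` flag are all hypothetical stand-ins for whatever the provisioning API really uses. It also assumes the restriction applies only when the system has storage nodes.

```python
class SemanticCheckError(Exception):
    """Hypothetical error raised when a provisioning request is invalid."""


# Assumption: on a storage configuration only these host personalities
# may run a ceph monitor (controller-0, controller-1 and a storage node).
ALLOWED_MON_PERSONALITIES = {"controller", "storage"}


def check_ceph_mon_add(hostname: str, personality: str,
                       system_has_storage_nodes: bool) -> None:
    """Reject adding a ceph monitor on a host type the configuration forbids.

    Returns None when the request is acceptable, raises SemanticCheckError
    otherwise.
    """
    if not system_has_storage_nodes:
        # Assumption: this sketch only covers the storage configuration;
        # rules for other configurations are out of scope.
        return
    if personality not in ALLOWED_MON_PERSONALITIES:
        raise SemanticCheckError(
            "Cannot add ceph monitor on %s: personality '%s' is not allowed "
            "in a storage configuration" % (hostname, personality)
        )


# Usage: the request from step 2 would be rejected before the host unlocks.
try:
    check_ceph_mon_add("compute-0", "worker", system_has_storage_nodes=True)
    rejected = False
except SemanticCheckError as err:
    rejected = True
    print(err)
```

Running the check at request time, before the host is unlocked, would keep the unsupported monitor from ever joining the monitor map.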
Actual Behavior
----------------
In step 2, the addition of the worker node was not rejected.
Reproducibility
---------------
Reproducible
System Configuration
-------
Standard 2 controller + 2 storage + X computes
Branch/Pull Time/Commit
-------
BUILD_ID=
Last Pass
---------
Timestamp/Logs
--------------
Test Activity
-------------
bug retest
tags: added: stx.retestneeded
Changed in starlingx:
status: Triaged → In Progress
Marking as stx.3.0 / medium priority. This is an issue only if the user specifies the wrong monitor during provisioning, so it can be avoided, but the system should still prevent this user error.