Ceph Monitor host is able to lock when only two ceph monitors are present (ceph commands stop working)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Felipe Sanches Zanoni |
Bug Description
Brief Description
-----------------
Verified that the operator can lock a Ceph monitor host even when only two Ceph monitors are available. After the lock, Ceph commands no longer work. It is not known whether Ceph functions are impacted as well.
Severity
--------
Major
Steps to Reproduce
------------------
Set up a Standard or Storage configuration
Verify that Ceph has 3 monitors and is healthy using the "ceph -s" command
Lock one of the Ceph monitor hosts
Verify that Ceph now has 2 monitors in quorum using the "ceph -s" command
Lock one more Ceph monitor host
Verify that Ceph commands stop working (ceph -s, ceph mon stat)
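The steps above can be sketched as a small shell helper. This is only an illustration: the function names and the 30-second wait are assumptions, and it assumes that "system host-lock" exits non-zero when a lock is rejected. Because "ceph -s" can block when there is no quorum, the call is wrapped in timeout(1).

```shell
# Returns 0 when "ceph -s" answers within the given number of seconds (default 30).
ceph_responds() {
    timeout "${1:-30}" ceph -s >/dev/null 2>&1
}

# Usage: reproduce_double_lock <first-mon-host> <second-mon-host> [wait-seconds]
# Hypothetical helper: locks two monitor hosts in a row and reports the outcome.
reproduce_double_lock() {
    local first_host="$1" second_host="$2" wait_seconds="${3:-30}"

    if ! ceph_responds "$wait_seconds"; then
        echo "ABORTED: Ceph does not respond before the test"
        return 2
    fi

    system host-lock "$first_host" || return 2
    # The lock completes asynchronously; on a real system, confirm it with
    # "system host-list" before continuing.
    if ! ceph_responds "$wait_seconds"; then
        echo "ABORTED: Ceph stopped responding after the first lock"
        return 2
    fi

    # Assumption: a rejected lock makes the CLI exit non-zero.
    if ! system host-lock "$second_host"; then
        echo "NOT REPRODUCED: second lock was rejected"
        return 0
    fi

    if ceph_responds "$wait_seconds"; then
        echo "NOT REPRODUCED: Ceph still responds"
    else
        echo "REPRODUCED: Ceph commands no longer respond"
    fi
}
```

For the system in the logs below, the call would be "reproduce_double_lock compute-1 controller-1".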
Expected Behavior
------------------
The lock of the second Ceph monitor host should be rejected.
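The reasoning behind this expectation is that Ceph monitors need a strict majority of the monitor map to form a quorum, so with 3 monitors at least 2 must stay up. A minimal sketch of that arithmetic follows; the function names are hypothetical and this is not the check implemented in the actual fix.

```shell
# Smallest number of monitors that forms a strict majority of the monitor map.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Usage: lock_keeps_quorum <monitors-in-map> <monitors-currently-in-quorum>
# Returns 0 if one more monitor can go down without losing quorum.
lock_keeps_quorum() {
    local total="$1" in_quorum="$2"
    [ $(( in_quorum - 1 )) -ge "$(quorum_size "$total")" ]
}

lock_keeps_quorum 3 3 && echo "first lock: allow"
lock_keeps_quorum 3 2 || echo "second lock: reject"
```

With 3 monitors in the map, the first lock leaves 2 in quorum and is allowed, while the second would leave only 1 and should be rejected.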
Actual Behavior
----------------
The second Ceph monitor host can be locked, after which Ceph commands no longer work.
Reproducibility
---------------
3 out of 3
System Configuration
-------
Standard and Storage
Branch/Pull Time/Commit
-------
N/A
Last Pass
---------
2021-11-25
Timestamp/Logs
--------------
// compute-1 locked
$ system host-list
+----+-
| id | hostname | personality | administrative | operational | availability |
+----+-
| 1 | controller-0 | controller | unlocked | enabled | available |
| 3 | compute-1 | worker | locked | disabled | online |
| 4 | compute-2 | worker | unlocked | enabled | available |
| 5 | controller-1 | controller | unlocked | enabled | available |
| 6 | compute-0 | worker | unlocked | enabled | available |
+----+-
2022-06-08 17:21:26.851 1276253 INFO ceph_manager.
2022-06-08 17:21:26.854 1276253 INFO ceph_manager.
// 2 ceph mons, controller-0 and controller-1
$ ceph -s
cluster:
id: 561e3a2a-
health: HEALTH_WARN
1/3 mons down, quorum controller-
mon: 3 daemons, quorum controller-
// Verified that it is possible to lock controller-1
$ system host-lock controller-1
$ system host-list
+----+-
| id | hostname | personality | administrative | operational | availability |
+----+-
| 1 | controller-0 | controller | unlocked | enabled | available |
| 3 | compute-1 | worker | locked | disabled | online |
| 4 | compute-2 | worker | unlocked | enabled | available |
| 5 | controller-1 | controller | locked | disabled | online |
| 6 | compute-0 | worker | unlocked | enabled | available |
+----+-
Test Activity
-------------
Regression Testing
Workaround
----------
Ceph commands work again after unlocking the 2nd ceph monitor host.
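A rough sketch of the workaround as a polling helper, for illustration only. The function name, attempt count, and timeouts are assumptions; "ceph -s" is wrapped in timeout(1) because it can block while there is no quorum.

```shell
# Usage: wait_for_ceph [attempts] [seconds-per-attempt] [pause-between-attempts]
# Hypothetical helper: polls "ceph -s" until it answers or the attempts run out.
wait_for_ceph() {
    local attempts="${1:-10}" per_try="${2:-30}" pause="${3:-5}" i=1
    while [ "$i" -le "$attempts" ]; do
        if timeout "$per_try" ceph -s >/dev/null 2>&1; then
            echo "ceph responded on attempt $i"
            return 0
        fi
        sleep "$pause"
        i=$(( i + 1 ))
    done
    echo "ceph still unresponsive after $attempts attempts"
    return 1
}

# Example for the system in the logs above (not run here):
#   system host-unlock controller-1 && wait_for_ceph 20 30
```

Unlocking reboots the host, so the monitor may take several minutes to rejoin; the attempt count should be chosen with that in mind.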
Changed in starlingx:
assignee: nobody → Felipe Sanches Zanoni (fsanches)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.7.0 stx.storage
Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/845631