Ceph monitor host can be locked when only two Ceph monitors are present (ceph commands stop working)

Bug #1978498 reported by Felipe Sanches Zanoni
This bug affects 1 person
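+-----------+--------------+------------+-----------------------+-----------+
| Affects   | Status       | Importance | Assigned to           | Milestone |
+-----------+--------------+------------+-----------------------+-----------+
| StarlingX | Fix Released | Medium     | Felipe Sanches Zanoni |           |
+-----------+--------------+------------+-----------------------+-----------+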

Bug Description

Brief Description
-----------------
The operator can lock a Ceph monitor host even when only two Ceph monitors are available. After that lock, Ceph commands stop working. It is not yet known whether other Ceph functions are affected as well.

Severity
--------
Major

Steps to Reproduce
------------------
Set up a Standard or Storage configuration
Verify that Ceph has 3 monitors and is healthy using the "ceph -s" command
Lock one of the Ceph monitor hosts
Verify that Ceph now has 2 monitors in quorum using the "ceph -s" command
Lock one more Ceph monitor host
Verify that ceph commands stop working (ceph -s, ceph mon stat)

Expected Behavior
-----------------
The lock of the second Ceph monitor host is rejected.
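
This expectation follows from how Ceph monitors form a quorum: a strict majority of the configured monitors must be up. The sketch below works through that arithmetic for the three-monitor case in the steps above. The function names are illustrative only and are not taken from StarlingX or Ceph code.

```python
def quorum_size(total_monitors: int) -> int:
    """Return the smallest strict majority of the configured monitors."""
    return total_monitors // 2 + 1


def has_quorum(total_monitors: int, monitors_up: int) -> bool:
    """Return True if enough monitors are up to form a quorum."""
    return monitors_up >= quorum_size(total_monitors)


if __name__ == "__main__":
    total = 3  # monitors configured in a Standard or Storage setup
    for locked in range(total + 1):
        up = total - locked
        print(f"{locked} locked, {up} up -> quorum: {has_quorum(total, up)}")
```

With three monitors, the first lock leaves two up and quorum holds. A second lock would leave one, which is below the majority of two, so that lock is the one that should be refused.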

Actual Behavior
---------------
The second Ceph monitor host can be locked, after which ceph commands no longer work.

Reproducibility
---------------
3 out of 3

System Configuration
--------------------
Standard and Storage

Branch/Pull Time/Commit
-----------------------
N/A

Last Pass
---------
2021-11-25

Timestamp/Logs
--------------
// compute-1 locked

$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | unlocked       | enabled     | available    |
| 3  | compute-1    | worker      | locked         | disabled    | online       |
| 4  | compute-2    | worker      | unlocked       | enabled     | available    |
| 5  | controller-1 | controller  | unlocked       | enabled     | available    |
| 6  | compute-0    | worker      | unlocked       | enabled     | available    |
+----+--------------+-------------+----------------+-------------+--------------+

2022-06-08 17:21:26.851 1276253 INFO ceph_manager.monitor [-] Current Ceph health: HEALTH_WARN detail: 1/3 mons down, quorum controller-0,controller-1
2022-06-08 17:21:26.854 1276253 INFO ceph_manager.monitor [-] Created storage alarm 267d646a-165c-42da-a4e4-b579add0f349 - severity: warning, reason: Storage Alarm Condition: HEALTH_WARN. Please check 'ceph -s' for more details., service_affecting: False

// 2 ceph mons, controller-0 and controller-1

$ ceph -s
  cluster:
    id: 561e3a2a-f86a-4932-937a-ebdaffe35d67
    health: HEALTH_WARN
            1/3 mons down, quorum controller-0,controller-1

  services:
    mon: 3 daemons, quorum controller-0,controller-1 (age 78s), out of quorum: compute-1

// Verified that it is possible to lock controller-1

$ system host-lock controller-1

$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | unlocked       | enabled     | available    |
| 3  | compute-1    | worker      | locked         | disabled    | online       |
| 4  | compute-2    | worker      | unlocked       | enabled     | available    |
| 5  | controller-1 | controller  | locked         | disabled    | online       |
| 6  | compute-0    | worker      | unlocked       | enabled     | available    |
+----+--------------+-------------+----------------+-------------+--------------+
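
For anyone scripting this reproduction, the monitor counts and quorum members can be pulled out of the health detail text that ceph_manager logged above. This is a minimal sketch that assumes the exact wording seen in this report ("1/3 mons down, quorum controller-0,controller-1"); other Ceph releases may phrase the message differently, and `MonHealth` and `parse_mon_health` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional


class MonHealth(NamedTuple):
    mons_down: int
    mons_total: int
    quorum: List[str]


# Matches text such as "1/3 mons down, quorum controller-0,controller-1".
_MON_DOWN_RE = re.compile(r"(\d+)/(\d+) mons down, quorum ([\w.,-]+)")


def parse_mon_health(detail: str) -> Optional[MonHealth]:
    """Extract monitor counts and quorum members, or None if not present."""
    match = _MON_DOWN_RE.search(detail)
    if match is None:
        return None
    down, total, members = match.groups()
    return MonHealth(int(down), int(total), members.split(","))


if __name__ == "__main__":
    line = ("Current Ceph health: HEALTH_WARN detail: "
            "1/3 mons down, quorum controller-0,controller-1")
    print(parse_mon_health(line))
```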

Test Activity
-------------
Regression Testing

Workaround
----------
Ceph commands work again after unlocking the 2nd ceph monitor host.

Changed in starlingx:
assignee: nobody → Felipe Sanches Zanoni (fsanches)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/845631

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/845631
Committed: https://opendev.org/starlingx/config/commit/561a830e7d5e7e9dd6ef4b8537c07fdc6a85055f
Submitter: "Zuul (22348)"
Branch: master

commit 561a830e7d5e7e9dd6ef4b8537c07fdc6a85055f
Author: Felipe Sanches Zanoni <email address hidden>
Date: Mon Jun 13 15:49:52 2022 -0400

    Ceph monitor host is able to lock when only 2 monitors are available

    Ceph monitor quorum requires at least 2 monitors up when 3 are
    configured in Standard or Storage setups. If 1 host that has ceph
    monitor configured is locked, no other ceph monitor host can be
    locked.

    Test Plan:
     PASS: AIO-SX CentOS lock/unlock.
     PASS: AIO-DX CentOS lock/unlock standby controller.
     PASS: Storage CentOS lock controller-1. Cannot lock storage-0.
     PASS: Storage CentOS lock controller-1. Force lock storage-0.
     PASS: Standard CentOS lock controller-1. Cannot lock compute-0.
     PASS: Standard CentOS lock controller-1. Force lock compute-0.

    Closes-Bug: #1978498

    Signed-off-by: Felipe Sanches Zanoni <email address hidden>
    Change-Id: If32aeea4712646430fdba06709aa3d4b9e05c51c
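
The behavior described in the commit message and test plan above (a normal lock is refused once one monitor host is already locked, while a force lock still goes through) can be sketched as follows. This is not the code merged in the review; `LockRejected` and `check_monitor_host_lock` are hypothetical names, and the sketch only models the three-monitor Standard and Storage case, not AIO setups.

```python
from typing import Iterable


class LockRejected(Exception):
    """Raised when locking a host would break Ceph monitor quorum."""


def check_monitor_host_lock(host: str,
                            monitor_hosts: Iterable[str],
                            locked_hosts: Iterable[str],
                            force: bool = False) -> None:
    """Raise LockRejected if locking `host` would drop monitor quorum.

    Assumption: every host in `monitor_hosts` that is not locked runs a
    healthy monitor. AIO setups with fewer monitors are not modeled here.
    """
    monitors = set(monitor_hosts)
    # A force lock bypasses the check; non-monitor hosts are unaffected.
    if force or host not in monitors:
        return
    remaining = monitors - set(locked_hosts) - {host}
    required = len(monitors) // 2 + 1  # strict majority
    if len(remaining) < required:
        raise LockRejected(
            f"cannot lock {host}: only {len(remaining)} of {len(monitors)} "
            f"monitors would remain, {required} required for quorum")


if __name__ == "__main__":
    mons = ["controller-0", "controller-1", "compute-0"]
    check_monitor_host_lock("controller-1", mons, locked_hosts=[])
    try:
        check_monitor_host_lock("compute-0", mons, locked_hosts=["controller-1"])
    except LockRejected as exc:
        print(exc)
    check_monitor_host_lock("compute-0", mons, ["controller-1"], force=True)
```

The host names in the usage section mirror the Standard case in the test plan: controller-1 is locked first, after which compute-0 can only be locked by force.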

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.7.0 stx.storage