Unable to redefine a ceph monitor

Bug #1827080 reported by Maria Yousaf
This bug affects 2 people
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Tingjie Chen

Bug Description

Brief Description
-----------------
Ceph monitor provisioning does not get deleted when the ceph monitor node is deleted.

Severity
--------
Major

Steps to Reproduce
------------------
1. On a 2+2 system, lock and delete compute-0. This leaves two ceph monitors: controller-0 and controller-1.
2. Provision one of the remaining computes as the new ceph monitor: lock the node, run "system ceph-mon-add <nodename>", then unlock the host.
3. Once the host unlocks, check ceph -s. You'll see the following:

[wrsroot@controller-0 ~(keystone_admin)]$ ceph -s
    cluster d31d9359-41d4-4415-9499-0df0b2494db0
     health HEALTH_WARN
            1 mons down, quorum 0,1,3 controller-0,controller-1,compute-1
     monmap e3: 4 mons at {compute-0=192.168.204.60:6789/0,compute-1=192.168.204.89:6789/0,controller-0=192.168.204.3:6789/0,controller-1=192.168.204.4:6789/0}
            election epoch 36, quorum 0,1,3 controller-0,controller-1,compute-1
     osdmap e62: 2 osds: 2 up, 2 in
            flags sortbitwise,require_jewel_osds
      pgmap v86548: 1048 pgs, 12 pools, 3295 MB data, 4224 objects
            6520 MB used, 3340 GB / 3347 GB avail
                1048 active+clean
  client io 188 B/s rd, 384 kB/s wr, 0 op/s rd, 76 op/s wr

Ceph is in HEALTH_WARN state and thinks there should be 4 monitors instead of 3.

Expected Behavior
------------------
There should be 3 monitors: controller-0, controller-1 and compute-1. The deleted node should not appear as a monitor.

Actual Behavior
----------------
The deleted node appears as a monitor.

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Standard 2 controller + 2 compute system

Branch/Pull Time/Commit
-----------------------
Master build: 20190427T013000Z

Last Pass
---------
New feature

Timestamp/Logs
--------------
N/A. Easy to reproduce.

Test Activity
-------------
Feature Testing

Revision history for this message
Frank Miller (sensfan22) wrote :

Marking stx.2.0 gating as ceph functionality is required for all StarlingX configurations.

Assigning to Cindy and requesting assistance to identify a prime to investigate this issue.

Changed in starlingx:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Cindy Xie (xxie1)
tags: added: stx.2.0 stx.retestneeded
Revision history for this message
Cindy Xie (xxie1) wrote :

Maria, can you please retest this with the latest code base? We have upgraded Ceph to Mimic and the functionality is not exactly the same.

Changed in starlingx:
assignee: Cindy Xie (xxie1) → Tingjie Chen (silverhandy)
Revision history for this message
Maria Yousaf (myousaf) wrote :

This is still an issue.

[wrsroot@controller-1 log(keystone_admin)]$ system ceph-mon-list
+--------------------------------------+-------+--------------+------------+------+
| uuid | ceph_ | hostname | state | task |
| | mon_g | | | |
| | ib | | | |
+--------------------------------------+-------+--------------+------------+------+
| 7351acaa-837d-4456-9144-ad64ed977d63 | 20 | controller-1 | configured | None |
| bf488244-a765-42e6-9c27-90cd6012d892 | 20 | compute-1 | configured | None |
| e6415d2e-0289-419f-81a8-94bfcbb443aa | 20 | controller-0 | configured | None |
+--------------------------------------+-------+--------------+------------+------+

[wrsroot@controller-1 log(keystone_admin)]$ ceph -s
  cluster:
    id: 6415e2f2-5f18-472e-a618-3b78448c1635
    health: HEALTH_WARN
            1/4 mons down, quorum controller-0,controller-1,compute-1

  services:
    mon: 4 daemons, quorum controller-0,controller-1,compute-1, out of quorum: compute-0
    mgr: controller-1(active), standbys: controller-0
    osd: 2 osds: 2 up, 2 in
    rgw: 1 daemon active

  data:
    pools: 9 pools, 856 pgs
    objects: 2.02 k objects, 2.9 GiB
    usage: 6.0 GiB used, 886 GiB / 892 GiB avail
    pgs: 856 active+clean

  io:
    client: 448 KiB/s wr, 0 op/s rd, 94 op/s wr

[wrsroot@controller-1 log(keystone_admin)]$ cat /etc/build.info
###
### StarlingX
### Built from master
###

OS="centos"
SW_VERSION="19.01"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="20190506T233000Z"

JOB="STX_build_master_master"
<email address hidden>"
BUILD_NUMBER="93"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2019-05-06 23:30:00 +0000"

Revision history for this message
Ovidiu Poncea (ovidiuponcea) wrote :

The problem is that when a node is deleted we only remove the monitor from the DB; it should also be removed from Ceph. Search for ceph_mon_destroy() in sysinv's code. One of the functions in conductor/manager.py is:

    def _remove_ceph_mon(self, host):
        if not StorageBackendConfig.has_backend(
            self.dbapi,
            constants.CINDER_BACKEND_CEPH
        ):
            return

        mon = self.dbapi.ceph_mon_get_by_ihost(host.uuid)
        if mon:
            LOG.info("Deleting ceph monitor for host %s"
                     % str(host.hostname))
            self.dbapi.ceph_mon_destroy(mon[0].uuid)
        else:
            LOG.info("No ceph monitor present for host %s. "
                     "Skipping deleting ceph monitor."
                     % str(host.hostname))

This function is called from _unconfigure_worker_host when a worker host is deleted.

To remove the ceph_mon, look at cephclient (stx-integ/ceph/python-cephclient/python-cephclient) and see which function can be used to delete the monitor from Ceph. The CLI equivalent is "ceph mon remove {mon-id}"; try it by hand first to see if it works, then write the code to do it (before self.dbapi.ceph_mon_destroy(mon[0].uuid) you need to delete the monitor from Ceph).
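
For illustration only, a minimal sketch of that approach is shown below. It mirrors the _remove_ceph_mon() snippet above; the self._ceph_api handle to the python-cephclient CephWrapper and the exact mon_remove() call are assumptions, not the actual sysinv code or the merged fix, so the signature should be checked against python-cephclient before use.

    def _remove_ceph_mon(self, host):
        # Sketch only: remove the monitor from the Ceph cluster first,
        # then drop the sysinv DB record.
        if not StorageBackendConfig.has_backend(
            self.dbapi,
            constants.CINDER_BACKEND_CEPH
        ):
            return

        mon = self.dbapi.ceph_mon_get_by_ihost(host.uuid)
        if not mon:
            LOG.info("No ceph monitor present for host %s. "
                     "Skipping deleting ceph monitor."
                     % str(host.hostname))
            return

        LOG.info("Deleting ceph monitor for host %s" % str(host.hostname))
        try:
            # Equivalent of "ceph mon remove <hostname>".
            # self._ceph_api (an assumed attribute name) would be the
            # CephWrapper instance from python-cephclient.
            self._ceph_api.mon_remove(host.hostname)
        except Exception as e:
            LOG.warning("Failed to remove ceph monitor %s from the "
                        "cluster: %s" % (host.hostname, e))
        # Only drop the DB row once the cluster no longer expects the monitor.
        self.dbapi.ceph_mon_destroy(mon[0].uuid)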

Btw, one approach to test storage changes without reinstalling the setup each time is to leverage VirtualBox snapshots; they have their issues, but they are great since you can easily return to a previous state.

tags: added: stx.storage
Changed in starlingx:
status: Triaged → In Progress
Cindy Xie (xxie1)
tags: added: stx.distro.other
Revision history for this message
Tingjie Chen (silverhandy) wrote :

Thanks Ovidiu for your precise comments. The delete-host command (for compute-0) does go through _remove_ceph_mon(), but it does not actually remove the ceph mon; it needs to call CephWrapper, which inherits CephClient.mon_remove() and executes the command: ceph mon remove {hostname}.

I have pushed a draft to Gerrit (https://review.opendev.org/#/c/659094/) and will verify the change shortly to fix the issue.

Revision history for this message
Tingjie Chen (silverhandy) wrote :

I have verified the gerrit patch: https://review.opendev.org/#/c/659094

After locking and deleting the ceph-mon of compute-0 and running ceph-mon-add for compute-1, the resulting cluster status is as follows:
------------------------------
[wrsroot@controller-0 ~(keystone_admin)]$ ceph -s
  cluster:
    id: 53016669-b4a4-441d-a81a-c70792bd4a8d
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum controller-0,controller-1,compute-1
    mgr: controller-0(active), standbys: controller-1
    osd: 2 osds: 2 up, 2 in
    rgw: 1 daemon active

  data:
    pools: 4 pools, 256 pgs
    objects: 1.16 k objects, 1.1 KiB
    usage: 226 MiB used, 498 GiB / 498 GiB avail
    pgs: 256 active+clean

Cindy Xie (xxie1)
tags: removed: stx.distro.other
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/659094
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=70a249595184c3b40f3bebcd780c0e7b327a96be
Submitter: Zuul
Branch: master

commit 70a249595184c3b40f3bebcd780c0e7b327a96be
Author: Chen, Tingjie <email address hidden>
Date: Tue May 14 15:52:52 2019 +0800

    Delete ceph-mon when sysinv system delete host

    Resolve the issue which cannot remove ceph mon when lock and delete
    host, since it need to call ceph client interface to emulate: ceph mon
    remove {hostname} to delete.

    Closes-Bug: 1827080
    Change-Id: Ib0fb9550e66aada73459ca60d5c106945c0635eb
    Signed-off-by: Chen, Tingjie <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
Wendy Mitchell (wmitchellwr) wrote :

Reopening this issue, as I have tried this on more than one 2+2 system and am blocked from completing the operation.

I am unable to perform the ceph-mon-add operation to add the compute in step 2 below.

Steps
1. Lock and remove the compute that is in the quorum.
2. Attempt to provision one of the remaining computes, e.g. compute-0, as the new ceph monitor by locking compute-0 and then attempting the following command:
system ceph-mon-add compute-0

$ system ceph-mon-add compute-0
Node: compute-0 Total target growth size 20 GiB for database (doubled for upgrades), glance, scratch, backup, extension and ceph-mon exceeds growth limit of 1 GiB.

Note that there is already a ceph-mon-lv of size 20 GiB:
compute-0:~$ sudo lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
  ceph-mon-lv cgts-vg -wi-a----- 20.00g
  docker-lv cgts-vg -wi-ao---- 30.00g
  kubelet-lv cgts-vg -wi-ao---- 10.00g
  log-lv cgts-vg -wi-ao---- <3.91g
  scratch-lv cgts-vg -wi-ao---- <3.91g
  instances_lv nova-local -wi-ao---- <447.13g

$ system ceph-mon-list
+--------------------------------------+-------+--------------+------------+------+
| uuid | ceph_ | hostname | state | task |
| | mon_g | | | |
| | ib | | | |
+--------------------------------------+-------+--------------+------------+------+
| 2f5682bc-cb4c-45e3-9eba-83f114ca9254 | 20 | controller-1 | configured | None |
| ad83fc34-a4dc-4405-bc3d-2796de054c7e | 20 | controller-0 | configured | None |
+--------------------------------------+-------+--------------+------------+------+

Revision history for this message
Wendy Mitchell (wmitchellwr) wrote :

Tried the following loads without success
2019-08-09_20-59-00
2019-08-12_20-59-00

Numan Waheed (nwaheed)
Changed in starlingx:
status: Fix Released → Confirmed
Revision history for this message
Tingjie Chen (silverhandy) wrote :

This issue is a side effect of patch https://review.opendev.org/661900 for LP 1827119; sorry for that.

The root cause is in check_node_ceph_mon_growth(): when a new ceph monitor is added, the parameter cgtsvg_growth_gib should be 0, but in the current implementation it defaults to ceph_mon_gib. The condition in check_node_ceph_mon_growth() can then never be satisfied, and the error message it raises blocks the ceph-mon-add operation.

To resolve this, we should make sure cgtsvg_growth_gib == 0 when a new ceph monitor is created. I will raise another fix patch for it soon.
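
As a rough illustration of the failing check (not the actual sysinv code; the signature and error text below are assumptions based on the log lines in the next comment):

    def check_node_ceph_mon_growth(hostname, ceph_mon_gib,
                                   cgtsvg_growth_gib, cgtsvg_max_free_gib):
        # cgtsvg_growth_gib is the extra cgts-vg space the operation would
        # need. For ceph-mon-add on a node whose ceph-mon-lv is already
        # reserved it should be 0; if it defaults to ceph_mon_gib, the check
        # below trips even though no additional space is required.
        if cgtsvg_growth_gib > cgtsvg_max_free_gib:
            raise ValueError(
                "Node: %s Total target growth size %d GiB ... exceeds "
                "growth limit of %d GiB." % (hostname, cgtsvg_growth_gib,
                                             cgtsvg_max_free_gib))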

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/677196

Changed in starlingx:
status: Confirmed → In Progress
Revision history for this message
Tingjie Chen (silverhandy) wrote :

Before the fix, when running ceph-mon-add:
-------------------------------------
2019-08-19 02:44:47.234 110684 INFO sysinv.api.controllers.v1.utils [-] check_node_ceph_mon_growth hostname: controller, ceph_mon_gib: 21, cgtsvg_growth_gib: 0, cgtsvg_max_free_gib: 167
2019-08-19 02:44:47.246 110684 INFO sysinv.api.controllers.v1.utils [-] get_node_cgtsvg_limit cgtsvg_max_free_gib=1
2019-08-19 02:44:47.251 110684 INFO sysinv.api.controllers.v1.utils [-] check_node_ceph_mon_growth hostname: compute-1, ceph_mon_gib: 21, cgtsvg_growth_gib: 21, cgtsvg_max_free_gib: 1
2019-08-19 02:44:47.251 110684 WARNING wsme.api [-] Client-side error: Node: compute-1 Total target growth size 21 GiB for database (doubled for upgrades), glance, scratch, backup, extension and ceph-mon exceeds growth limit of 1 GiB.

After the fix, when running ceph-mon-add:
-------------------------------------
2019-08-19 12:15:53.499 246664 INFO sysinv.api.controllers.v1.utils [-] check_node_ceph_mon_growth hostname: controller, ceph_mon_gib: 21, cgtsvg_growth_gib: 0, cgtsvg_max_free_gib: 167
2019-08-19 12:15:53.523 246664 INFO sysinv.api.controllers.v1.utils [-] get_node_cgtsvg_limit cgtsvg_max_free_gib=1
2019-08-19 12:15:53.531 246664 INFO sysinv.api.controllers.v1.utils [-] check_node_ceph_mon_growth hostname: compute-1, ceph_mon_gib: 21, cgtsvg_growth_gib: 0, cgtsvg_max_free_gib: 1
2019-08-19 12:15:53.554 246664 INFO sysinv.api.controllers.v1.utils [-] get_node_cgtsvg_limit cgtsvg_max_free_gib=167
2019-08-19 12:15:53.558 246664 INFO sysinv.api.controllers.v1.utils [-] check_node_ceph_mon_growth hostname: controller-0, ceph_mon_gib: 21, cgtsvg_growth_gib: 0, cgtsvg_max_free_gib: 167
2019-08-19 12:15:53.575 246664 INFO sysinv.api.controllers.v1.utils [-] get_node_cgtsvg_limit cgtsvg_max_free_gib=167
2019-08-19 12:15:53.581 246664 INFO sysinv.api.controllers.v1.utils [-] check_node_ceph_mon_growth hostname: controller-1, ceph_mon_gib: 21, cgtsvg_growth_gib: 0, cgtsvg_max_free_gib: 167

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/677196
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=f1cd5557433bc45235256a4fe1620773bf3973ec
Submitter: Zuul
Branch: master

commit f1cd5557433bc45235256a4fe1620773bf3973ec
Author: Chen, Tingjie <email address hidden>
Date: Mon Aug 19 20:40:29 2019 +0800

    Resolve issue of operation ceph-mon-add blocked

    The issue is side effect for patch which resolve ceph-mon-modify issue
    with LP: 1827119, since code rework and the case of ceph-mon-add is
    not fully considered.

    To resolve it, we should make sure cgtsvg_growth_gib == 0 when new ceph
    monitor is added, and ceph_mon should not get by uuid with current host
    since it is not monitor yet, we should get the first ceph monitor from
    monitor list and get the correct value of cgtsvg_growth_gib.

    Closes-Bug: 1827080
    Change-Id: I18af83864d80111c9b499720e01a1f4302e65d40
    Signed-off-by: Chen, Tingjie <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
Ghada Khalil (gkhalil) wrote :

@Tingjie, please cherrypick the fix to r/stx.2.0 before 2019-08-23

Revision history for this message
John Kung (john-kung) wrote :

@Tingjie, please note that I posted some concerns on https://review.opendev.org/#/c/677196/2/sysinv/sysinv/sysinv/sysinv/api/controllers/v1/utils.py; in addition to that comment, the method signature is no longer correct, as it never references the 'host' parameter passed in.

I recommend that this bug be reopened.

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Bug re-opened as recommended by John Kung.
@Tingjie, Please do NOT cherry-pick to the r/stx.2.0 branch. Since this is a medium priority bug, it will be moved to stx.3.0 on 2019-08-23. So the subsequent fix is only required in master.

Changed in starlingx:
status: Fix Released → Confirmed
Revision history for this message
Tingjie Chen (silverhandy) wrote :

1. About ceph_mon_gib: since the data has to be synced between the 3 ceph monitors, when we run ceph-mon-modify in sysinv (even though the command is issued on controller-0), we have to check some prerequisites on all ceph monitors and modify all of their configurations.

2. The root cause of this issue (https://bugs.launchpad.net/starlingx/+bug/1827080), ceph-mon-add being blocked on a 2+2 system, is a long story.

Firstly, another LP, https://bugs.launchpad.net/starlingx/+bug/1827119, covers ceph-mon-modify not updating the ceph-mon partition on the worker disk: only the 2 controller ceph monitors were checked. On a 2+2 system the third ceph monitor on compute-0 had no check, so when ceph_mon_gib was extended to 40 GB it exceeded the cgts-vg capacity and the compute-0 node went into a failed state after lock/unlock. I created https://review.opendev.org/#/c/661900/ to resolve that; it checks all ceph monitors while keeping the existing check flow, with some rework.

But then another issue appeared: ceph-mon-add compute-1 is blocked. It is not really a side effect of https://review.opendev.org/#/c/661900/, because before that patch the system did not check nodes other than the controllers (compute-0/1, storage-0/1). The patch added the check and exposed a latent problem: the cgts-vg configuration on workers.
In a 2+2 deployment we have 3 ceph monitors by default (controller-0/1 and compute-0) and the default ceph_mon_gib is 20 GB. compute-1 has a pre-defined partition for ceph--mon--lv; it is not mounted because compute-1 is not a ceph monitor, but it causes the free-space check check_node_ceph_mon_growth() to fail when adding a new ceph monitor, since only 1.156 GB of cgts-vg is free after the cgts--vg-ceph--mon--lv partition is created and the space is reserved in the bootstrap phase. In reality we can still add a ceph monitor on compute-1.
---------------------------------------------
compute-1:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 200G 0 disk
├─sda1 8:1 0 1M 0 part
├─sda2 8:2 0 500M 0 part /boot
├─sda3 8:3 0 69G 0 part
│ ├─cgts--vg-scratch--lv 253:0 0 3.9G 0 lvm /scratch
│ ├─cgts--vg-log--lv 253:1 0 3.9G 0 lvm /var/log
│ ├─cgts--vg-kubelet--lv 253:2 0 10G 0 lvm /var/lib/kubelet
│ ├─cgts--vg-ceph--mon--lv 253:3 0 20G 0 lvm
│ └─cgts--vg-docker--lv 253:4 0 30G 0 lvm /var/lib/docker
├─sda4 8:4 0 19.5G 0 part /
└─sda5 8:5 0 10G 0 part
  └─nova--local-instances_lv 253:5 0 10G 0 lvm /var/lib/nova/instances
sdb 8:16 0 30G 0 disk

// available size for cgts-vg is 1.156 GB because 20 GB is already allocated to cgts--vg-ceph--mon--lv
[sysadmin@controller-1 ~(keystone_admin)]$ system host-lvg-list compute-1
+--------------------------------------+------------+-------------+--------+------------------+------------------+-------------+-------------+
| UUID | LVG Name | State | Access | Total Size (GiB) | Avail S...


Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/678203

Changed in starlingx:
status: Confirmed → In Progress
Revision history for this message
Tingjie Chen (silverhandy) wrote :

I have raised another patch with one possible solution: remove the reservation for ceph-mon-lv on worker nodes (https://review.opendev.org/#/c/678203/) and restore check_node_ceph_mon_growth() to the previous version with no special-case condition.
I have verified it and it behaves as expected; please review it and give comments.

When adding compute-1 as a ceph monitor, before lock & ceph-mon-add:
=============================
[sysadmin@controller-0 ~(keystone_admin)]$ system host-lvg-list compute-1
+--------------------------------------+------------+-------------+--------+------------------+------------------+-------------+-------------+
| UUID | LVG Name | State | Access | Total Size (GiB) | Avail Size (GiB) | Current PVs | Current LVs |
+--------------------------------------+------------+-------------+--------+------------------+------------------+-------------+-------------+
| 28bda568-49fb-41f7-a1b1-41006f8252bf | nova-local | provisioned | wz--n- | 9.996 | 0.0 | 1 | 1 |
| c1ba96d9-d223-429e-9aeb-382eca9c4b82 | cgts-vg | provisioned | wz--n- | 58.968 | 21.156 | 1 | 3 |
+--------------------------------------+------------+-------------+--------+------------------+------------------+-------------+-------------+

When adding compute-1 as a ceph monitor, after lock & ceph-mon-add, and unlock:
===================================
[sysadmin@controller-0 ~(keystone_admin)]$ system host-lvg-list compute-1
+--------------------------------------+------------+-------------+--------+------------------+------------------+-------------+-------------+
| UUID | LVG Name | State | Access | Total Size (GiB) | Avail Size (GiB) | Current PVs | Current LVs |
+--------------------------------------+------------+-------------+--------+------------------+------------------+-------------+-------------+
| 28bda568-49fb-41f7-a1b1-41006f8252bf | nova-local | provisioned | wz--n- | 9.996 | 0.0 | 1 | 1 |
| c1ba96d9-d223-429e-9aeb-382eca9c4b82 | cgts-vg | provisioned | wz--n- | 58.968 | 0.156 | 1 | 4 |
+--------------------------------------+------------+-------------+--------+------------------+------------------+-------------+-------------+

Revision history for this message
Ghada Khalil (gkhalil) wrote :

As per agreement with the community, moving all unresolved medium priority bugs from stx.2.0 to stx.3.0

tags: added: stx.3.0
removed: stx.2.0
Revision history for this message
Tingjie Chen (silverhandy) wrote :

Since reserving space for ceph-mon on all worker nodes was a product requirement from Brent, I have to find another solution for check_node_ceph_mon_growth that takes the cgts--vg-ceph--mon--lv reservation into account.
Working on the fix patch...

Revision history for this message
yong hu (yhu6) wrote :

@Tingjie, please draft a solution (or patch) and quickly sync up with Ovidiu.
This LP is important, so please take it with priority.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on config (master)

Change abandoned by Tingjie Chen (<email address hidden>) on branch: master
Review: https://review.opendev.org/678203
Reason: Abandon it since there is new solution patch.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/685576

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/685576
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=e6c862c99ab74ee40d9d8ae14681bf78b864aaa3
Submitter: Zuul
Branch: master

commit e6c862c99ab74ee40d9d8ae14681bf78b864aaa3
Author: Chen, Tingjie <email address hidden>
Date: Sun Sep 29 10:52:47 2019 +0800

    Rework check_node_ceph_mon_growth with ceph-mon-lv reservation

    Since there are lvg reserve for ceph-mon 20GB by default in
    Worker/Controller, which is the same with default ceph_mon_gib value.

    There are 2 cases:
    1/ Create new ceph-mon for worker(for example), since it has reserved,
    we can get cgtsvg_growth_gib by minus reserve gib: constants.SB_CEPH_MON_GIB
    2. Modify ceph-mon gib, it is the same with previous process, directly
    minus mon.ceph_mon_gib.

    Closes-Bug: 1827080
    Change-Id: Id4890e9f82177398f1fd4a6cda103d806458836a
    Signed-off-by: Chen, Tingjie <email address hidden>
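
Read alongside the commit message above, the reworked growth computation can be pictured roughly as follows. This is an illustrative sketch, not the merged code; the helper name is hypothetical and the 20 GiB value is the default ceph-mon-lv reservation discussed earlier in this report.

    SB_CEPH_MON_GIB = 20  # default ceph-mon-lv reservation (GiB) per this report

    def cgtsvg_growth_gib(requested_gib, existing_mon_gib=None):
        # Case 1: ceph-mon-add on a node that already reserves ceph-mon-lv;
        # only growth beyond the reservation consumes extra cgts-vg space.
        if existing_mon_gib is None:
            return max(0, requested_gib - SB_CEPH_MON_GIB)
        # Case 2: ceph-mon-modify of an existing monitor; growth is relative
        # to its current ceph_mon_gib. Clamping at 0 (shrinking frees space
        # rather than consuming it) is an assumption of this sketch.
        return max(0, requested_gib - existing_mon_gib)

For example, a ceph-mon-add request equal to the reservation yields a growth of 0 GiB, so the small cgts-vg free space on the worker no longer blocks the operation.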

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
Anujeyan Manokeran (anujeyan) wrote :

Verified in Build 2019-10-29_20-00-00

tags: removed: stx.retestneeded