Unable to redefine a ceph monitor

Bug #1827080 reported by Maria Yousaf
This bug affects 2 people
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Tingjie Chen

Bug Description

Brief Description
-----------------
Ceph monitor provisioning does not get deleted when the ceph monitor node is deleted.

Severity
--------
Major

Steps to Reproduce
------------------
1. On a 2+2 system, lock and delete compute-0. This leaves two ceph monitors: controller-0 and controller-1.
2. Provision one of the remaining computes as the new ceph monitor: lock the node, run "system ceph-mon-add <nodename>", then unlock the host.
3. Once the host unlocks, check ceph -s. You'll see the following:

[wrsroot@controller-0 ~(keystone_admin)]$ ceph -s
    cluster d31d9359-41d4-4415-9499-0df0b2494db0
     health HEALTH_WARN
            1 mons down, quorum 0,1,3 controller-0,controller-1,compute-1
     monmap e3: 4 mons at {compute-0=192.168.204.60:6789/0,compute-1=192.168.204.89:6789/0,controller-0=192.168.204.3:6789/0,controller-1=192.168.204.4:6789/0}
            election epoch 36, quorum 0,1,3 controller-0,controller-1,compute-1
     osdmap e62: 2 osds: 2 up, 2 in
            flags sortbitwise,require_jewel_osds
      pgmap v86548: 1048 pgs, 12 pools, 3295 MB data, 4224 objects
            6520 MB used, 3340 GB / 3347 GB avail
                1048 active+clean
  client io 188 B/s rd, 384 kB/s wr, 0 op/s rd, 76 op/s wr

Ceph is in HEALTH_WARN state and thinks there should be 4 monitors instead of 3.

Expected Behavior
------------------
There should be 3 monitors: controller-0, controller-1 and compute-1. The deleted node should not appear as a monitor.

Actual Behavior
----------------
The deleted node appears as a monitor.

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Standard 2 controller + 2 compute system

Branch/Pull Time/Commit
-----------------------
Master build: 20190427T013000Z

Last Pass
---------
New feature

Timestamp/Logs
--------------
N/A. Easy to reproduce.

Test Activity
-------------
Feature Testing

Revision history for this message
Frank Miller (sensfan22) wrote :

Marking stx.2.0 gating as ceph functionality is required for all StarlingX configurations.

Assigning to Cindy and requesting assistance to identify a prime to investigate this issue.

Changed in starlingx:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Cindy Xie (xxie1)
tags: added: stx.2.0 stx.retestneeded
Revision history for this message
Cindy Xie (xxie1) wrote :

Maria, can you please retest this with the latest code base? We have upgraded Ceph to Mimic and the functionality is not exactly the same.

Changed in starlingx:
assignee: Cindy Xie (xxie1) → Tingjie Chen (silverhandy)
Revision history for this message
Maria Yousaf (myousaf) wrote :

This is still an issue.

[wrsroot@controller-1 log(keystone_admin)]$ system ceph-mon-list
+--------------------------------------+-------+--------------+------------+------+
| uuid | ceph_ | hostname | state | task |
| | mon_g | | | |
| | ib | | | |
+--------------------------------------+-------+--------------+------------+------+
| 7351acaa-837d-4456-9144-ad64ed977d63 | 20 | controller-1 | configured | None |
| bf488244-a765-42e6-9c27-90cd6012d892 | 20 | compute-1 | configured | None |
| e6415d2e-0289-419f-81a8-94bfcbb443aa | 20 | controller-0 | configured | None |
+--------------------------------------+-------+--------------+------------+------+

[wrsroot@controller-1 log(keystone_admin)]$ ceph -s
  cluster:
    id: 6415e2f2-5f18-472e-a618-3b78448c1635
    health: HEALTH_WARN
            1/4 mons down, quorum controller-0,controller-1,compute-1

  services:
    mon: 4 daemons, quorum controller-0,controller-1,compute-1, out of quorum: compute-0
    mgr: controller-1(active), standbys: controller-0
    osd: 2 osds: 2 up, 2 in
    rgw: 1 daemon active

  data:
    pools: 9 pools, 856 pgs
    objects: 2.02 k objects, 2.9 GiB
    usage: 6.0 GiB used, 886 GiB / 892 GiB avail
    pgs: 856 active+clean

  io:
    client: 448 KiB/s wr, 0 op/s rd, 94 op/s wr

[wrsroot@controller-1 log(keystone_admin)]$ cat /etc/build.info
###
### StarlingX
### Built from master
###

OS="centos"
SW_VERSION="19.01"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="20190506T233000Z"

JOB="STX_build_master_master"
<email address hidden>"
BUILD_NUMBER="93"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2019-05-06 23:30:00 +0000"

Revision history for this message
Ovidiu Poncea (ovidiuponcea) wrote :

The problem is that when a node is deleted we only remove the monitor from the DB; it should also be removed from Ceph. Search for ceph_mon_destroy() in sysinv's code. One of the functions in conductor/manager.py is:

    def _remove_ceph_mon(self, host):
        if not StorageBackendConfig.has_backend(
            self.dbapi,
            constants.CINDER_BACKEND_CEPH
        ):
            return

        mon = self.dbapi.ceph_mon_get_by_ihost(host.uuid)
        if mon:
            LOG.info("Deleting ceph monitor for host %s"
                     % str(host.hostname))
            self.dbapi.ceph_mon_destroy(mon[0].uuid)
        else:
            LOG.info("No ceph monitor present for host %s. "
                     "Skipping deleting ceph monitor."
                     % str(host.hostname))

This function is called from _unconfigure_worker_host when a worker host is deleted.

To remove the ceph_mon, look at cephclient (stx-integ/ceph/python-cephclient/python-cephclient) and see which function can be used to delete the monitor from Ceph. The CLI equivalent is "ceph mon remove {mon-id}"; try it by hand first to see if it works, then write the code to do it (before self.dbapi.ceph_mon_destroy(mon[0].uuid) you need to delete the monitor from Ceph).
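
For illustration only, a minimal sketch of that approach is shown below. It mirrors the _remove_ceph_mon() snippet above; the self._ceph_api handle to the python-cephclient CephWrapper and the exact mon_remove() call are assumptions, not the actual sysinv code or the merged fix, so the signature should be checked against python-cephclient before use.

    def _remove_ceph_mon(self, host):
        # Sketch only: remove the monitor from the Ceph cluster first,
        # then drop the sysinv DB record.
        if not StorageBackendConfig.has_backend(
            self.dbapi,
            constants.CINDER_BACKEND_CEPH
        ):
            return

        mon = self.dbapi.ceph_mon_get_by_ihost(host.uuid)
        if not mon:
            LOG.info("No ceph monitor present for host %s. "
                     "Skipping deleting ceph monitor."
                     % str(host.hostname))
            return

        LOG.info("Deleting ceph monitor for host %s" % str(host.hostname))
        try:
            # Equivalent of "ceph mon remove <hostname>".
            # self._ceph_api (an assumed attribute name) would be the
            # CephWrapper instance from python-cephclient.
            self._ceph_api.mon_remove(host.hostname)
        except Exception as e:
            LOG.warning("Failed to remove ceph monitor %s from the "
                        "cluster: %s" % (host.hostname, e))
        # Only drop the DB row once the cluster no longer expects the monitor.
        self.dbapi.ceph_mon_destroy(mon[0].uuid)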

Btw, one approach to test storage changes without reinstalling the setup each time is to leverage VirtualBox snapshots; they have their issues, but they are great since you can easily return to a previous state.

tags: added: stx.storage
Changed in starlingx:
status: Triaged → In Progress
Cindy Xie (xxie1)
tags: added: stx.distro.other
Revision history for this message
Tingjie Chen (silverhandy) wrote :

Thanks Ovidiu for your precise comments. The delete-host command (for compute-0) does go through _remove_ceph_mon(), but it does not actually remove the ceph mon; it needs to call CephWrapper, which inherits CephClient.mon_remove() and executes the command: ceph mon remove {hostname}.

I have pushed a draft to Gerrit (https://review.opendev.org/#/c/659094/) and will verify the change shortly to fix the issue.

Revision history for this message
Tingjie Chen (silverhandy) wrote :

I have verified the gerrit patch: https://review.opendev.org/#/c/659094

After locking and deleting the ceph-mon of compute-0 and running ceph-mon-add for compute-1, the resulting cluster status is as follows:
------------------------------
[wrsroot@controller-0 ~(keystone_admin)]$ ceph -s
  cluster:
    id: 53016669-b4a4-441d-a81a-c70792bd4a8d
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum controller-0,controller-1,compute-1
    mgr: controller-0(active), standbys: controller-1
    osd: 2 osds: 2 up, 2 in
    rgw: 1 daemon active

  data:
    pools: 4 pools, 256 pgs
    objects: 1.16 k objects, 1.1 KiB
    usage: 226 MiB used, 498 GiB / 498 GiB avail
    pgs: 256 active+clean

Cindy Xie (xxie1)
tags: removed: stx.distro.other
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/659094
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=70a249595184c3b40f3bebcd780c0e7b327a96be
Submitter: Zuul
Branch: master

commit 70a249595184c3b40f3bebcd780c0e7b327a96be
Author: Chen, Tingjie <email address hidden>
Date: Tue May 14 15:52:52 2019 +0800

    Delete ceph-mon when sysinv system delete host

    Resolve the issue which cannot remove ceph mon when lock and delete
    host, since it need to call ceph client interface to emulate: ceph mon
    remove {hostname} to delete.

    Closes-Bug: 1827080
    Change-Id: Ib0fb9550e66aada73459ca60d5c106945c0635eb
    Signed-off-by: Chen, Tingjie <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
Wendy Mitchell (wmitchellwr) wrote :

Reopening this issue, as I have tried this on more than one 2+2 system and am blocked from completing the operation.

I am unable to perform the ceph-mon-add operation to add the compute in step 2 below.

Steps
1. Lock and remove the compute that is in the quorum.
2. Attempt to provision one of the remaining computes, e.g. compute-0, as the new ceph monitor by locking compute-0 and then attempting the following command:
system ceph-mon-add compute-0

$ system ceph-mon-add compute-0
Node: compute-0 Total target growth size 20 GiB for database (doubled for upgrades), glance, scratch, backup, extension and ceph-mon exceeds growth limit of 1 GiB.

Note that there is already a ceph-mon-lv of size 20 GiB:
compute-0:~$ sudo lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
  ceph-mon-lv cgts-vg -wi-a----- 20.00g
  docker-lv cgts-vg -wi-ao---- 30.00g
  kubelet-lv cgts-vg -wi-ao---- 10.00g
  log-lv cgts-vg -wi-ao---- <3.91g
  scratch-lv cgts-vg -wi-ao---- <3.91g
  instances_lv nova-local -wi-ao---- <447.13g

$ system ceph-mon-list
+--------------------------------------+-------+--------------+------------+------+
| uuid | ceph_ | hostname | state | task |
| | mon_g | | | |
| | ib | | | |
+--------------------------------------+-------+--------------+------------+------+
| 2f5682bc-cb4c-45e3-9eba-83f114ca9254 | 20 | controller-1 | configured | None |
| ad83fc34-a4dc-4405-bc3d-2796de054c7e | 20 | controller-0 | configured | None |
+--------------------------------------+-------+--------------+------------+------+

Revision history for this message
Wendy Mitchell (wmitchellwr) wrote :

Tried the following loads without success
2019-08-09_20-59-00
2019-08-12_20-59-00

Numan Waheed (nwaheed)
Changed in starlingx:
status: Fix Released → Confirmed
Revision history for this message
Tingjie Chen (silverhandy) wrote :

This issue is a side effect of patch https://review.opendev.org/661900 for LP 1827119; sorry for that.

The root cause is in check_node_ceph_mon_growth(): when a new ceph monitor is added, the parameter cgtsvg_growth_gib should be 0, but in the current implementation it defaults to ceph_mon_gib. The condition in check_node_ceph_mon_growth() can then never be satisfied, and the error message it raises blocks the ceph-mon-add operation.

To resolve this, we should make sure cgtsvg_growth_gib == 0 when a new ceph monitor is created. I will raise another fix patch for it soon.
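
As a rough illustration of the failing check (not the actual sysinv code; the signature and error text below are assumptions based on the log lines in the next comment):

    def check_node_ceph_mon_growth(hostname, ceph_mon_gib,
                                   cgtsvg_growth_gib, cgtsvg_max_free_gib):
        # cgtsvg_growth_gib is the extra cgts-vg space the operation would
        # need. For ceph-mon-add on a node whose ceph-mon-lv is already
        # reserved it should be 0; if it defaults to ceph_mon_gib, the check
        # below trips even though no additional space is required.
        if cgtsvg_growth_gib > cgtsvg_max_free_gib:
            raise ValueError(
                "Node: %s Total target growth size %d GiB ... exceeds "
                "growth limit of %d GiB." % (hostname, cgtsvg_growth_gib,
                                             cgtsvg_max_free_gib))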

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/677196

Changed in starlingx:
status: Confirmed → In Progress
Revision history for this message
Tingjie Chen (silverhandy) wrote :

Before the fix, when running ceph-mon-add:
-------------------------------------
2019-08-19 02:44:47.234 110684 INFO sysinv.api.controllers.v1.utils [-] check_node_ceph_mon_growth hostname: controller, ceph_mon_gib: 21, cgtsvg_growth_gib: 0, cgtsvg_max_free_gib: 167
2019-08-19 02:44:47.246 110684 INFO sysinv.api.controllers.v1.utils [-] get_node_cgtsvg_limit cgtsvg_max_free_gib=1
2019-08-19 02:44:47.251 110684 INFO sysinv.api.controllers.v1.utils [-] check_node_ceph_mon_growth hostname: compute-1, ceph_mon_gib: 21, cgtsvg_growth_gib: 21, cgtsvg_max_free_gib: 1
2019-08-19 02:44:47.251 110684 WARNING wsme.api [-] Client-side error: Node: compute-1 Total target growth size 21 GiB for database (doubled for upgrades), glance, scratch, backup, extension and ceph-mon exceeds growth limit of 1 GiB.

After the fix, when running ceph-mon-add:
-------------------------------------
2019-08-19 12:15:53.499 246664 INFO sysinv.api.controllers.v1.utils [-] check_node_ceph_mon_growth hostname: controller, ceph_mon_gib: 21, cgtsvg_growth_gib: 0, cgtsvg_max_free_gib: 167
2019-08-19 12:15:53.523 246664 INFO sysinv.api.controllers.v1.utils [-] get_node_cgtsvg_limit cgtsvg_max_free_gib=1
2019-08-19 12:15:53.531 246664 INFO sysinv.api.controllers.v1.utils [-] check_node_ceph_mon_growth hostname: compute-1, ceph_mon_gib: 21, cgtsvg_growth_gib: 0, cgtsvg_max_free_gib: 1
2019-08-19 12:15:53.554 246664 INFO sysinv.api.controllers.v1.utils [-] get_node_cgtsvg_limit cgtsvg_max_free_gib=167
2019-08-19 12:15:53.558 246664 INFO sysinv.api.controllers.v1.utils [-] check_node_ceph_mon_growth hostname: controller-0, ceph_mon_gib: 21, cgtsvg_growth_gib: 0, cgtsvg_max_free_gib: 167
2019-08-19 12:15:53.575 246664 INFO sysinv.api.controllers.v1.utils [-] get_node_cgtsvg_limit cgtsvg_max_free_gib=167
2019-08-19 12:15:53.581 246664 INFO sysinv.api.controllers.v1.utils [-] check_node_ceph_mon_growth hostname: controller-1, ceph_mon_gib: 21, cgtsvg_growth_gib: 0, cgtsvg_max_free_gib: 167

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/677196
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=f1cd5557433bc45235256a4fe1620773bf3973ec
Submitter: Zuul
Branch: master

commit f1cd5557433bc45235256a4fe1620773bf3973ec
Author: Chen, Tingjie <email address hidden>
Date: Mon Aug 19 20:40:29 2019 +0800

    Resolve issue of operation ceph-mon-add blocked

    The issue is side effect for patch which resolve ceph-mon-modify issue
    with LP: 1827119, since code rework and the case of ceph-mon-add is
    not fully considered.

    To resolve it, we should make sure cgtsvg_growth_gib == 0 when new ceph
    monitor is added, and ceph_mon should not get by uuid with current host
    since it is not monitor yet, we should get the first ceph monitor from
    monitor list and get the correct value of cgtsvg_growth_gib.

    Closes-Bug: 1827080
    Change-Id: I18af83864d80111c9b499720e01a1f4302e65d40
    Signed-off-by: Chen, Tingjie <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
Ghada Khalil (gkhalil) wrote :

@Tingjie, please cherrypick the fix to r/stx.2.0 before 2019-08-23

Revision history for this message
John Kung (john-kung) wrote :

@Tingjie, please note that I posted some concerns on https://review.opendev.org/#/c/677196/2/sysinv/sysinv/sysinv/sysinv/api/controllers/v1/utils.py; in addition to that comment, the method signature is no longer correct, as it never references the 'host' parameter passed in.

I recommend that this bug be reopened.

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Bug re-opened as recommended by John Kung.
@Tingjie, Please do NOT cherry-pick to the r/stx.2.0 branch. Since this is a medium priority bug, it will be moved to stx.3.0 on 2019-08-23. So the subsequent fix is only required in master.

Changed in starlingx:
status: Fix Released → Confirmed
Revision history for this message
Tingjie Chen (silverhandy) wrote :

1. About ceph_mon_gib: since the data has to be synced between the 3 ceph monitors, when we run ceph-mon-modify in sysinv (even though the command is issued on controller-0), we have to check some prerequisites on all ceph monitors and modify all of their configurations.

2. The root cause of this issue (https://bugs.launchpad.net/starlingx/+bug/1827080), ceph-mon-add being blocked on a 2+2 system, is a long story.

Firstly, another LP, https://bugs.launchpad.net/starlingx/+bug/1827119, covers ceph-mon-modify not updating the ceph-mon partition on the worker disk: only the 2 controller ceph monitors were checked. On a 2+2 system the third ceph monitor on compute-0 had no check, so when ceph_mon_gib was extended to 40 GB it exceeded the cgts-vg capacity and the compute-0 node went into a failed state after lock/unlock. I created https://review.opendev.org/#/c/661900/ to resolve that; it checks all ceph monitors while keeping the existing check flow, with some rework.

But then another issue appeared: ceph-mon-add compute-1 is blocked. It is not really a side effect of https://review.opendev.org/#/c/661900/, because before that patch the system did not check nodes other than the controllers (compute-0/1, storage-0/1). The patch added the check and exposed a latent problem: the cgts-vg configuration on workers.
In a 2+2 deployment we have 3 ceph monitors by default (controller-0/1 and compute-0) and the default ceph_mon_gib is 20 GB. compute-1 has a pre-defined partition for ceph--mon--lv; it is not mounted because compute-1 is not a ceph monitor, but it causes the free-space check check_node_ceph_mon_growth() to fail when adding a new ceph monitor, since only 1.156 GB of cgts-vg is free after the cgts--vg-ceph--mon--lv partition is created and the space is reserved in the bootstrap phase. In reality we can still add a ceph monitor on compute-1.
---------------------------------------------
compute-1:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 200G 0 disk
├─sda1 8:1 0 1M 0 part
├─sda2 8:2 0 500M 0 part /boot
├─sda3 8:3 0 69G 0 part
│ ├─cgts--vg-scratch--lv 253:0 0 3.9G 0 lvm /scratch
│ ├─cgts--vg-log--lv 253:1 0 3.9G 0 lvm /var/log
│ ├─cgts--vg-kubelet--lv 253:2 0 10G 0 lvm /var/lib/kubelet
│ ├─cgts--vg-ceph--mon--lv 253:3 0 20G 0 lvm
│ └─cgts--vg-docker--lv 253:4 0 30G 0 lvm /var/lib/docker
├─sda4 8:4 0 19.5G 0 part /
└─sda5 8:5 0 10G 0 part
  └─nova--local-instances_lv 253:5 0 10G 0 lvm /var/lib/nova/instances
sdb 8:16 0 30G 0 disk

// available size for cgts-vg is 1.156 GB because 20 GB is already allocated to cgts--vg-ceph--mon--lv
[sysadmin@controller-1 ~(keystone_admin)]$ system host-lvg-list compute-1
+--------------------------------------+------------+-------------+--------+------------------+------------------+-------------+-------------+
| UUID | LVG Name | State | Access | Total Size (GiB) | Avail S...


Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/678203

Changed in starlingx:
status: Confirmed → In Progress
Revision history for this message
Tingjie Chen (silverhandy) wrote :

I have raised another patch with one possible solution: remove the reservation for ceph-mon-lv on worker nodes (https://review.opendev.org/#/c/678203/) and restore check_node_ceph_mon_growth() to the previous version with no special-case condition.
I have verified it and it behaves as expected; please review it and give comments.

When adding compute-1 as a ceph monitor, before lock & ceph-mon-add:
=============================
[sysadmin@controller-0 ~(keystone_admin)]$ system host-lvg-list compute-1
+--------------------------------------+------------+-------------+--------+------------------+------------------+-------------+-------------+
| UUID | LVG Name | State | Access | Total Size (GiB) | Avail Size (GiB) | Current PVs | Current LVs |
+--------------------------------------+------------+-------------+--------+------------------+------------------+-------------+-------------+
| 28bda568-49fb-41f7-a1b1-41006f8252bf | nova-local | provisioned | wz--n- | 9.996 | 0.0 | 1 | 1 |
| c1ba96d9-d223-429e-9aeb-382eca9c4b82 | cgts-vg | provisioned | wz--n- | 58.968 | 21.156 | 1 | 3 |
+--------------------------------------+------------+-------------+--------+------------------+------------------+-------------+-------------+

When adding compute-1 as a ceph monitor, after lock & ceph-mon-add, and unlock:
===================================
[sysadmin@controller-0 ~(keystone_admin)]$ system host-lvg-list compute-1
+--------------------------------------+------------+-------------+--------+------------------+------------------+-------------+-------------+
| UUID | LVG Name | State | Access | Total Size (GiB) | Avail Size (GiB) | Current PVs | Current LVs |
+--------------------------------------+------------+-------------+--------+------------------+------------------+-------------+-------------+
| 28bda568-49fb-41f7-a1b1-41006f8252bf | nova-local | provisioned | wz--n- | 9.996 | 0.0 | 1 | 1 |
| c1ba96d9-d223-429e-9aeb-382eca9c4b82 | cgts-vg | provisioned | wz--n- | 58.968 | 0.156 | 1 | 4 |
+--------------------------------------+------------+-------------+--------+------------------+------------------+-------------+-------------+

Revision history for this message
Ghada Khalil (gkhalil) wrote :

As per agreement with the community, moving all unresolved medium priority bugs from stx.2.0 to stx.3.0

tags: added: stx.3.0
removed: stx.2.0
Revision history for this message
Tingjie Chen (silverhandy) wrote :

Since reserving space for ceph-mon on all worker nodes was a product requirement from Brent, I have to find another solution for check_node_ceph_mon_growth that takes the cgts--vg-ceph--mon--lv reservation into account.
Working on the fix patch...

Revision history for this message
yong hu (yhu6) wrote :

@Tingjie, please draft a solution (or patch) and quickly sync up with Ovidiu.
This LP is important, so please take it with priority.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on config (master)

Change abandoned by Tingjie Chen (<email address hidden>) on branch: master
Review: https://review.opendev.org/678203
Reason: Abandon it since there is new solution patch.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/685576

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/685576
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=e6c862c99ab74ee40d9d8ae14681bf78b864aaa3
Submitter: Zuul
Branch: master

commit e6c862c99ab74ee40d9d8ae14681bf78b864aaa3
Author: Chen, Tingjie <email address hidden>
Date: Sun Sep 29 10:52:47 2019 +0800

    Rework check_node_ceph_mon_growth with ceph-mon-lv reservation

    Since there are lvg reserve for ceph-mon 20GB by default in
    Worker/Controller, which is the same with default ceph_mon_gib value.

    There are 2 cases:
    1/ Create new ceph-mon for worker(for example), since it has reserved,
    we can get cgtsvg_growth_gib by minus reserve gib: constants.SB_CEPH_MON_GIB
    2. Modify ceph-mon gib, it is the same with previous process, directly
    minus mon.ceph_mon_gib.

    Closes-Bug: 1827080
    Change-Id: Id4890e9f82177398f1fd4a6cda103d806458836a
    Signed-off-by: Chen, Tingjie <email address hidden>
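
Read alongside the commit message above, the reworked growth computation can be pictured roughly as follows. This is an illustrative sketch, not the merged code; the helper name is hypothetical and the 20 GiB value is the default ceph-mon-lv reservation discussed earlier in this report.

    SB_CEPH_MON_GIB = 20  # default ceph-mon-lv reservation (GiB) per this report

    def cgtsvg_growth_gib(requested_gib, existing_mon_gib=None):
        # Case 1: ceph-mon-add on a node that already reserves ceph-mon-lv;
        # only growth beyond the reservation consumes extra cgts-vg space.
        if existing_mon_gib is None:
            return max(0, requested_gib - SB_CEPH_MON_GIB)
        # Case 2: ceph-mon-modify of an existing monitor; growth is relative
        # to its current ceph_mon_gib. Clamping at 0 (shrinking frees space
        # rather than consuming it) is an assumption of this sketch.
        return max(0, requested_gib - existing_mon_gib)

For example, a ceph-mon-add request equal to the reservation yields a growth of 0 GiB, so the small cgts-vg free space on the worker no longer blocks the operation.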

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
Anujeyan Manokeran (anujeyan) wrote :

Verified in Build 2019-10-29_20-00-00

tags: removed: stx.retestneeded