ceph-mon-modify does not update the ceph-mon partition on worker node

Bug #1827119 reported by Maria Yousaf
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Tingjie Chen

Bug Description

Brief Description
-----------------
If the user runs system ceph-mon-modify to enlarge the ceph-mon partition, only the controllers go config out-of-date, and the user must lock/unlock the controllers to clear the alarms. Checking the filesystem afterwards shows that ceph-mon has been resized to the new value on the controllers, as expected. For the worker node, however, there is no indication to lock/unlock the node and ceph-mon stays at the default value of 20GB. This disagrees with the output of system ceph-mon-list. Even if the user does a lock/unlock on the worker node, the filesystem size stays at 20GB. If this is the expected behaviour for the worker node, the output of system ceph-mon-list needs to be corrected.

Severity
--------
Minor

Steps to Reproduce
------------------
1. Run 'system ceph-mon-modify controller-0 ceph_mon_gib=40'
2. Both controllers go config-out-of-date.
3. Lock/unlock each controller.
4. Run 'ceph df' on all nodes and observe whether ceph-mon has increased to the new value, i.e. 40GB.
5. The controllers are updated but the compute stays at 20GB. This disagrees with what system ceph-mon-list reports, which shows the compute ceph-mon size as now being 40GB (a quick cross-check follows the table below):

[wrsroot@controller-1 ~(keystone_admin)]$ system ceph-mon-list
+--------------------------------------+--------------+--------------+------------+------+
| uuid                                 | ceph_mon_gib | hostname     | state      | task |
+--------------------------------------+--------------+--------------+------------+------+
| 0fee88b3-56f0-4f3e-bf99-2423e81eda3b | 40           | controller-0 | configured | None |
| 3e01735e-d80e-4ee5-b631-c4adb403710c | 40           | compute-1    | configured | None |
| c8667a02-c201-43b2-be61-7ed0e0ad8af8 | 40           | controller-1 | configured | None |
+--------------------------------------+--------------+--------------+------------+------+
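
For quick cross-checking, a minimal sketch that compares what the database reports with what is actually provisioned on each monitor host. It assumes passwordless ssh between the hosts and the standard /etc/platform/openrc credentials file, which may not hold on every lab; the host names are the ones from this report.

#!/bin/bash
# Compare the configured ceph_mon_gib with the actual ceph-mon LV on each monitor host.
# Hypothetical helper for this report, not a StarlingX tool.
source /etc/platform/openrc          # load keystone_admin credentials (assumed standard location)
system ceph-mon-list                 # what sysinv believes
for host in controller-0 controller-1 compute-1; do
    echo "--- ${host} ---"
    ssh "${host}" "df -H | grep /var/lib/ceph/mon"   # what the filesystem actually shows
done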

Expected Behavior
------------------
Ceph-mon is updated on filesystems of all affected nodes.

Actual Behavior
----------------
Ceph-mon is updated on controllers only.

Reproducibility
---------------
Tried once.

System Configuration
--------------------
Standard system (2 controllers + 2 computes)

Branch/Pull Time/Commit
-----------------------
master build: 20190427T013000Z

Last Pass
---------
I don't believe this has been run on StarlingX before.

Timestamp/Logs
--------------
N/A

Test Activity
-------------
Regression

Revision history for this message
Frank Miller (sensfan22) wrote :

Marking stx.2.0 gating as ceph functionality is required for all StarlingX configurations.

Assigning to Cindy and requesting assistance to identify a prime to investigate this issue.

Changed in starlingx:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Cindy Xie (xxie1)
tags: added: stx.2.0 stx.retestneeded
Revision history for this message
Cindy Xie (xxie1) wrote :

Maria, can you retest using the latest ISO (after 5/3 Ceph upgrade)?

Changed in starlingx:
assignee: Cindy Xie (xxie1) → Tingjie Chen (silverhandy)
Revision history for this message
Maria Yousaf (myousaf) wrote :

This is still a problem:

[wrsroot@controller-0 ~(keystone_admin)]$ df -H | grep /var/lib/ceph/mon
/dev/mapper/cgts--vg-ceph--mon--lv 43G 115M 40G 1% /var/lib/ceph/mon

controller-1:~$ df -H | grep /var/lib/ceph/mon
/dev/mapper/cgts--vg-ceph--mon--lv 43G 115M 40G 1% /var/lib/ceph/mon

compute-1:~$ df -H | grep /var/lib/ceph/mon
/dev/mapper/cgts--vg-ceph--mon--lv 22G 111M 20G 1% /var/lib/ceph/mon

###
### StarlingX
### Built from master
###

OS="centos"
SW_VERSION="19.01"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="20190506T233000Z"

JOB="STX_build_master_master"
<email address hidden>"
BUILD_NUMBER="93"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2019-05-06 23:30:00 +0000"

[wrsroot@controller-0 ~(keystone_admin)]$ system ceph-mon-list
+--------------------------------------+--------------+--------------+------------+------+
| uuid                                 | ceph_mon_gib | hostname     | state      | task |
+--------------------------------------+--------------+--------------+------------+------+
| 7351acaa-837d-4456-9144-ad64ed977d63 | 40           | controller-1 | configured | None |
| bf488244-a765-42e6-9c27-90cd6012d892 | 40           | compute-1    | configured | None |
| e6415d2e-0289-419f-81a8-94bfcbb443aa | 40           | controller-0 | configured | None |
+--------------------------------------+--------------+--------------+------------+------+

tags: added: stx.storage
Revision history for this message
Ovidiu Poncea (ovidiuponcea) wrote :

We have two approaches here, depending on the path we want to take:

1. Drop the resize functionality. ceph-mon uses less than 1GB of data (more like 500MB) and we allocate 20GB; the Ceph docs recommend 10GB just in case. I'll throw a question on Ceph's mailing list, as there is no clear specification regarding ceph-mon limits.
2. Since the ceph-mon partitions have to be equal on all nodes, we implement resize for worker nodes and for storage nodes (neither exists today). There are two solutions here:
  A. Simplified but inconsistent solution: extend ceph-mon for computes and storages as well. The problem is that if we don't have enough space in cgts-vg we can't extend it on those nodes either. The good part is that we allocate the entire disk space to it, so there should be enough space as long as we put a limit on it (if memory serves, the current limit is set at 40GB).
Another problem is that we manage all other LVs in cgts-vg through the 'system controllerfs-*' commands, see B below.
  B. Generic, long-term and consistent solution: since the ceph-mon data resides on a logical volume in cgts-vg, and all the other LVs there are managed through the 'system controllerfs-*' commands, it makes sense to also manage ceph-mon through controllerfs commands. The problem is that this mechanism exists for neither worker nor storage nodes. Therefore, to make this work we would need a small story that:
- renames the 'system controllerfs-*' commands to 'system nodefs-*'
- enables rootfs modifications for all node types
- removes the existing ceph-mon-gib functionality.

I would go for #1, but we need confirmation on the Ceph mailing list that there is no risk in going above the 20GB we currently allocate to ceph-mon; otherwise 2A is the simplest solution to implement now, given that in following releases we plan to containerize Ceph and will need to revisit this mechanism anyway.
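
To make option 2A concrete, here is a rough manual equivalent of the resize on a single ceph-mon host. The LV path and the lvextend call are the ones the puppet manifest issues according to the comments below; the vgs free-space check and the filesystem-grow step (ext4 assumed) are illustrative assumptions, not the actual manifest.

#!/bin/bash
# Rough manual equivalent of option 2A on one ceph-mon host (sketch only).
NEW_SIZE_GIB=40                     # desired ceph_mon_gib
LV=/dev/cgts-vg/ceph-mon-lv         # LV backing /var/lib/ceph/mon

# 1. Check how much room cgts-vg has left before touching anything.
vgs --noheadings --units g -o vg_free cgts-vg

# 2. Grow the logical volume to the new size (the call puppet issues on lock/unlock).
lvextend -L "${NEW_SIZE_GIB}G" "${LV}"

# 3. Grow the filesystem on top of it -- assuming ext4 here, which may not match the real layout.
resize2fs "${LV}"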

Changed in starlingx:
status: Triaged → In Progress
Cindy Xie (xxie1)
tags: added: stx.distro.other
Revision history for this message
Tingjie Chen (silverhandy) wrote :

Thanks Ovidiu for your suggestions,

For solution #1: for StarlingX users it is a problem if the ceph-mon data reaches the limit without any warning message. The Ceph cluster already has the mon_data_size_warn/mon_data_avail_warn settings, and I think we need a command to set up a similar threshold or limit mechanism.

For solution #2A: some Ceph deployments allocate the entire disk and share it between mon data and other data (OSDs, logs, etc.), with a warning threshold percentage for the ceph-mon data. It is simple to implement but has cross-impact: if the system logs grow wildly and fill up the disk space, the ceph-mon data has no way to expand even though it has not reached its threshold size or percentage.

Solution #2B is the ideal solution, but considering the complexity of the source change and the stability risk, I think we have to make a compromise. I currently lean towards #2A; yes, the next containerized Ceph upgrade will need to reconsider a long-term solution and may have other concerns.

Revision history for this message
Ovidiu Poncea (ovidiuponcea) wrote :

Re #1. We got 3 answers on the ceph mailing list:

1. http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-May/034672.html
2. http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-May/034674.html
3. http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-May/034709.html

Summary is:
1. Under normal conditions the size is small (~1.5GB).
2. The ceph-mon data grows if OSDs misbehave (i.e. OSDs down, nodes down) for a long time, as the cluster needs to keep previous epochs for replay once the misbehaving OSD rejoins the cluster.
3. Once replay is done, the previous data is cleaned up and the space is released (so no leakage, which is good!).
4. There has to be enough space for replays; the recommendation is ~64 GB, but it depends on the cluster size (note that our clusters are quite small; 4-8 storage nodes each with 4 OSDs is a small cluster from Ceph's perspective). As you said Tingjie, it is better to have a warning and allow the user to take action (increase the mon partition size & fix the error condition).

The conclusion is that we still need the resize => #1 is out of the way. The initial 20GB is OK but has to be resizable.

Revision history for this message
Ovidiu Poncea (ovidiuponcea) wrote :

So, indeed Tingjie, #2B is too much. #1 is not a good idea (see previous comment). #2A is the way to go, coupled with setting "mon data avail warn" to 10 (i.e. an alarm will be raised when ceph-mon disk usage reaches 90%) in $MY_REPO/stx/stx-integ/ceph/ceph/files/ceph.conf.

In our case ceph-mon resides on a separate partition (i.e. an LVM logical volume), so there is no need to worry about other processes filling it (no logging nor OSD data is written there; our logging goes to /var/log/ceph and OSDs keep data on the OSD disk or journal disk) => 90% is a good static limit for the threshold (the default is 70%, which is too loose).

Also, it's a good idea to disable "mon data size warn" (setting it to 0 should do it), else we would have to update it each time ceph-mon is resized, or the alarm would be raised at 15GB.
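
For reference, the change described above amounts to a fragment like the following in that ceph.conf. The values are the ones proposed in the comment; treating 0 as "disabled" for mon data size warn is the assumption stated there, not something verified here.

[mon]
    # warn when the ceph-mon partition reaches 90% usage (i.e. only 10% available)
    mon data avail warn = 10
    # proposed: disable the absolute-size warning so it does not fire at 15GB after a resize
    mon data size warn = 0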

Cindy Xie (xxie1)
tags: removed: stx.distro.other
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/660889

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on config (master)

Change abandoned by Tingjie Chen (<email address hidden>) on branch: master
Review: https://review.opendev.org/660889
Reason: Abandoned since the MON data needs to stay in sync across the 3 MONs; changing the controllers without worker/storage does not make sense.

Revision history for this message
Tingjie Chen (silverhandy) wrote :

Some updates on the issue...

Command: system ceph-mon-modify <controller> ceph_mon_gib=<number>
Literally it supports controllers only, but after adding storage/worker
support there are challenges.

With the current implementation, it updates the controllers and the
worker/storage node (lvextend on cgts-vg), but it only checks the free size on
the controller filesystems; we cannot get the cgts-vg information for worker/storage,
so there are 2 issues:
1. Worker/storage nodes (if they host a ceph-mon) are updated in addition to
the controllers, which does not match the literal meaning of the command
(controllers only), refer LP: 1827119
2. There is no way to check the fs of worker/storage. If there is not enough free
space to extend cgts-vg, then on lock/unlock the puppet script executes
/usr/sbin/lvextend -L xxxk /dev/cgts-vg/ceph-mon-lv and fails, and the
worker/storage node ends up in a failed state, refer LP: 1828262

So it is not simple to extend ceph-mon-modify ceph_mon_gib to compute and storage nodes.

One way to resolve the issue is to update the controllers only, following the
literal meaning of the command, since there is no way to check the free
space of cgts-vg on the other nodes, and implementing the check on worker/storage is
complicated. The problem is that ceph_mon_gib then cannot be configured on
storage/worker nodes, yet the ceph mons need to stay in sync and equal mon sizes are required.
Another way is to add a check mechanism before lvextend on the worker/storage node:
when the node reboots, if the puppet check fails it raises an alarm so the user can adjust.
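
A back-of-the-envelope version of the second option (check before lvextend), expressed as a shell check one could run on the worker/storage node itself. The real fix lives in sysinv/puppet, so this is only a sketch of the idea; the helper and its variable names are hypothetical.

#!/bin/bash
# Sketch of a pre-check before growing the ceph-mon LV on a worker/storage host
# (hypothetical helper, not the actual sysinv/puppet code).
REQUESTED_GIB=40    # new ceph_mon_gib being applied

current=$(lvs --noheadings --units g -o lv_size cgts-vg/ceph-mon-lv | tr -dc '0-9.')
free=$(vgs --noheadings --units g -o vg_free cgts-vg | tr -dc '0-9.')

# Refuse to proceed if cgts-vg cannot absorb the growth; the real fix reports this
# through the API/alarms instead of failing later in puppet.
if awk -v f="$free" -v c="$current" -v r="$REQUESTED_GIB" 'BEGIN { exit !(f < r - c) }'; then
    echo "cgts-vg: only ${free}G free, cannot grow ceph-mon-lv from ${current}G to ${REQUESTED_GIB}G" >&2
    exit 1
fi
/usr/sbin/lvextend -L "${REQUESTED_GIB}G" /dev/cgts-vg/ceph-mon-lv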

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/661900

Revision history for this message
Tingjie Chen (silverhandy) wrote :

I have submitted a gerrit patch: https://review.opendev.org/661900, which follows the #2A solution.
The way we resolve the issue is to extend the check mechanism to the controllers
and the worker/storage nodes; if the cgts-vg limit check fails on any node, the
command returns an error message and no action is taken.

Revision history for this message
Kristine Bujold (kbujold) wrote :

Something to note is that ceph-mon is also configurable from Horizon under "Admin/Platform/System Configuration/Controller Filesystem"

Revision history for this message
Kristine Bujold (kbujold) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/667044

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on config (master)

Change abandoned by Tingjie Chen (<email address hidden>) on branch: master
Review: https://review.opendev.org/667044
Reason: Abandoned since it was for debugging only

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/661900
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=39cb9d0c0b66a4f4920cd26ecd27bdf196d9b260
Submitter: Zuul
Branch: master

commit 39cb9d0c0b66a4f4920cd26ecd27bdf196d9b260
Author: Chen, Tingjie <email address hidden>
Date: Fri Jul 12 21:49:42 2019 +0800

    Refine command ceph-mon-modify for all nodes support

    Command: system ceph-mon-modify <controller> ceph_mon_gib=<number>
    literally supports controllers only, but after adding storage/worker
    support there are challenges.

    With the current implementation, it updates the controllers and the
    worker/storage node (lvextend on cgts-vg), but only checks the free
    size on the controller filesystems; we cannot get the cgts-vg
    information for worker/storage, so there are 2 issues:
    1. Worker/storage nodes (if they host a ceph-mon) are updated in
    addition to the controllers, which does not match the literal meaning
    of the command (controllers only), refer LP: 1827119
    2. There is no way to check the fs of worker/storage; if there is not
    enough free space to extend cgts-vg, then on lock/unlock the puppet
    script executes /usr/sbin/lvextend -L xxxk /dev/cgts-vg/ceph-mon-lv
    and fails, leaving the worker/storage node in a failed state, refer
    LP: 1828262

    The way we resolve the issue is to extend the check mechanism to the
    controllers and worker/storage nodes; if the cgts-vg limit check fails
    on any node, the command returns an error message and no action is
    taken.

    Closes-Bug: 1827119
    Change-Id: I106581bde1ebbe56cd34e35fa734435bd0c1a268
    Signed-off-by: Chen, Tingjie <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
Wendy Mitchell (wmitchellwr) wrote :

20190807T053000Z
2 controller +2 worker node system

The behavior is still not quite as expected in terms of compute-0 alarms and the system ceph-mon-list "state" output,
i.e. see the ****--- markers in step 6 and step 7 below

1. Check the ceph mon size prior to making any changes
[sysadmin@controller-0 ~(keystone_admin)]$ df -H | grep /var/lib/ceph/mon
/dev/mapper/cgts--vg-ceph--mon--lv 22G 108M 20G 1% /var/lib/ceph/mon
controller-1:~$ df -H | grep /var/lib/ceph/mon
/dev/mapper/cgts--vg-ceph--mon--lv 22G 108M 20G 1% /var/lib/ceph/mon
compute-0:~$ df -H | grep /var/lib/ceph/mon
/dev/mapper/cgts--vg-ceph--mon--lv 22G 108M 20G 1% /var/lib/ceph/mon

2. Attempt changes
$ system ceph-mon-modify controller-0 ceph_mon_gib=40
Node: compute-0 Total target growth size 20 GiB for database (doubled for upgrades), glance, scratch, backup, extension and ceph-mon exceeds growth limit of 1 GiB.
$ system ceph-mon-modify controller-0 ceph_mon_gib=10
ceph_mon_gib = 10. Value must be between 21 and 40.
[sysadmin@controller-0 ~(keystone_admin)]$ system ceph-mon-modify controller-0 ceph_mon_gib=25
Node: compute-0 Total target growth size 5 GiB for database (doubled for upgrades), glance, scratch, backup, extension and ceph-mon exceeds growth limit of 1 GiB.

3. command executed here to make the change:
$ system ceph-mon-modify controller-0 ceph_mon_gib=21
+--------------------------------------+--------------+--------------+-------------+-----------------------------------------------------------------+
| uuid                                 | ceph_mon_gib | hostname     | state       | task                                                            |
+--------------------------------------+--------------+--------------+-------------+-----------------------------------------------------------------+
| 4df4455f-ff14-4433-9d24-645cc920e673 | 21           | compute-0    | configuring | {u'controller-1': 'configuring', u'controller-0': 'configured'} |
| da081bb3-36a5-4122-ab13-619d8888c299 | 21           | controller-1 | configured  | None                                                            |
| ec58c58d-ded3-4169-9796-1047f1949aa4 | 21           | controller-0 | configured  | None                                                            |
+--------------------------------------+--------------+--------------+-------------+-----------------------------------------------------------------+

***4. Config out-of-date alarms are triggered (but not a compute-0 alarm???)
controller-0 shows Config out-of-date

  250.001 controller-0 Configuration is out-of-date. host=controller-0 major 2019-08-08T16:26:07
 250.001 controller-1 Configuration is out-of-date. host=con...


Revision history for this message
Wendy Mitchell (wmitchellwr) wrote :

This LP should be reopened.

Numan Waheed (nwaheed)
Changed in starlingx:
status: Fix Released → Confirmed
Revision history for this message
Wendy Mitchell (wmitchellwr) wrote :
Revision history for this message
Tingjie Chen (silverhandy) wrote :

Hi Wendy, I suppose your key points are:

1/ No config out-of-date alarm on compute-0 to indicate the change.
2/ In ceph-mon-list, compute-0 stays in the configuring state and cannot recover.

For the second point, I cannot reproduce the configuring status on compute-0; may I ask how frequently it reproduces?

--------------------------------

After host-lock, host-unlock and swact on controller-0 and controller-1, and host lock/unlock on compute-0:

[sysadmin@controller-0 ~(keystone_admin)]$ system ceph-mon-list
+--------------------------------------+--------------+--------------+------------+------+
| uuid | ceph_mon_gib | hostname | state | task |
+--------------------------------------+--------------+--------------+------------+------+
| 38742a06-ea02-402f-b95d-2b286969dd53 | 21 | compute-0 | configured | None |
| ccebf7db-7607-4271-a24c-b3dceb4a482c | 21 | controller-1 | configured | None |
| dac91d09-b221-492c-a510-faba1ba6dcf4 | 21 | controller-0 | configured | None |
+--------------------------------------+--------------+--------------+------------+------+

[sysadmin@controller-0 ~(keystone_admin)]$ df -H | grep /var/lib/ceph/mon
/dev/mapper/cgts--vg-ceph--mon--lv 23G 108M 21G 1% /var/lib/ceph/mon

controller-1:~$ df -H | grep /var/lib/ceph/mon
/dev/mapper/cgts--vg-ceph--mon--lv 23G 108M 21G 1% /var/lib/ceph/mon

compute-0:~$ df -H | grep /var/lib/ceph/mon
/dev/mapper/cgts--vg-ceph--mon--lv 23G 108M 21G 1% /var/lib/ceph/mon

Revision history for this message
Tingjie Chen (silverhandy) wrote :

I have tried twice for the 2 points:
1/ The alarm for compute-0 is missing from the alarm list; this issue is confirmed, and I will raise another patch to fix it.
2/ compute-0 stuck in the configuring status; this still cannot be reproduced.

[sysadmin@controller-0 ~(keystone_admin)]$ fm alarm-list
+----------+-----------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------+----------+----------------------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+-----------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------+----------+----------------------------+
| 250.001 | controller-1 Configuration is out-of-date. | host=controller-1 | major | 2019-08-19T11:15:03.902690 |
| 250.001 | controller-0 Configuration is out-of-date. | host=controller-0 | major | 2019-08-19T11:15:03.798282 |
| 800.010 | Potential data loss. No available OSDs in storage replication group group-0: no OSDs | cluster=a4c1e115-f27a-4c83-9c4d-3bc500e5f3e5.peergroup=group-0 | critical | 2019-08-19T11:10:10.389689 |
| 400.005 | Communication failure detected with peer over port ens6 on host controller-1 | host=controller-1.network=oam | major | 2019-08-19T08:41:12.293739 |
| 800.001 | Storage Alarm Condition: HEALTH_WARN [PGs are degraded/stuck or undersized]. Please check 'ceph -s' for more details. | cluster=a4c1e115-f27a-4c83-9c4d-3bc500e5f3e5 | warning | 2019-08-19T08:39:20.555559 |
| 400.005 | Communication failure detected with peer over port ens6 on host controller-0 | host=controller-0.network=oam | major | 2019-08-19T08:19:52.216585 |
+----------+-----------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------+----------+----------------------------+

after swact and ceph-mon-add compute-1:

[sysadmin@controller-1 ~(keystone_admin)]$ system ceph-mon-list
+--------------------------------------+--------------+--------------+------------+------+
| uuid | ceph_mon_gib | hostname | state | task |
+--------------------------------------+--------------+--------------+------------+------+
| b0b97fbe-b876-4656-8719-4e10d90b37cb | 21 | controller-1 | configured | None |
| d51a32dc-57a6-4605-8573-89c452507c35 | 21 | controller-0 | configured |...


Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/677424

Changed in starlingx:
status: Confirmed → In Progress
Revision history for this message
Ghada Khalil (gkhalil) wrote :

As per agreement with the community, moving all unresolved medium priority bugs from stx.2.0 to stx.3.0

tags: added: stx.3.0
removed: stx.2.0
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/677424
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=abebb638603cb3297ae133d4e388d034d7d3eff8
Submitter: Zuul
Branch: master

commit abebb638603cb3297ae133d4e388d034d7d3eff8
Author: Chen, Tingjie <email address hidden>
Date: Tue Aug 20 15:50:41 2019 +0800

    Add alarm for worker in ceph-mon

    In a 2+2 deployment, ceph-mon has 3 replicas on controller-0/1 and
    compute-0. When settings such as ceph_mon_gib are changed, the alarm
    list lacks a dedicated message for the compute node.

    The change lists all ceph monitors on controller/storage/worker nodes,
    and an alarm is produced on the dedicated node when its storage
    configuration changes.

    Partial-bug: 1827119
    Change-Id: I9d9624b1a82e52d800ab9d594a180641c854a039
    Signed-off-by: Chen, Tingjie <email address hidden>

Revision history for this message
Ghada Khalil (gkhalil) wrote :

@Tingjie, The commit above has "Partial-bug" instead of "Closes-Bug". Are you expecting to make additional code changes for this bug? Is this only a partial fix?

Revision history for this message
Tingjie Chen (silverhandy) wrote :

@Ghada, in my opinion there are 2 points raised by Maria:
1/ The alarm for compute-0 is missing from the alarm list; this issue is confirmed, and I will raise another patch to fix it.
2/ compute-0 stuck in the configuring status; this still cannot be reproduced.

The patch https://review.opendev.org/#/c/677424/ resolves point 1,
but I cannot reproduce the second point and need confirmation from the reporter, thanks.

Revision history for this message
yong hu (yhu6) wrote :

https://review.opendev.org/#/c/677424/ was merged, but @Tingjie will further check with @Wendy about the other aspect.

Revision history for this message
Ovidiu Poncea (ovidiuponcea) wrote :

The 'configuring' state should clear itself on lock/unlock even if there are issues with ceph-mon during initial configuration. Therefore, I also recommend a retest for 2/

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as Fix Released based on Ovidiu's recommendation.
If an issue is encountered during retest, this bug can be re-opened or a new one can be created.

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
John Kruszewski (jiggernaut) wrote :

# RETEST STATUS
PASSED

# CONFIGURATION
2 + 2 Controller Storage Config

# LOAD TESTED
BUILD_ID="2019-10-09_20-00-00"

# TESTING

1. Verify ceph-mon prior to changing
 [sysadmin@controller-0 ~(keystone_admin)]$ system ceph-mon-list
 +--------------------------------------+--------------+--------------+------------+------+
 | uuid                                 | ceph_mon_gib | hostname     | state      | task |
 +--------------------------------------+--------------+--------------+------------+------+
 | d225b915-8adf-4b24-961d-df7855e2cebf | 20           | compute-0    | configured | None |
 | d850eb6e-c880-4188-bc78-43bd162a0ff6 | 20           | controller-0 | configured | None |
 | df67b4c3-3962-4767-a4fa-dcdeb0f8e53d | 20           | controller-1 | configured | None |
 +--------------------------------------+--------------+--------------+------------+------+
 [sysadmin@controller-0 ~(keystone_admin)]$ df -H | grep /var/lib/ceph/mon
 /dev/mapper/cgts--vg-ceph--mon--lv 22G 105M 20G 1% /var/lib/ceph/mon

 controller-1:~$ df -H | grep /var/lib/ceph/mon
 /dev/mapper/cgts--vg-ceph--mon--lv 22G 105M 20G 1% /var/lib/ceph/mon

 compute-0:~$ df -H | grep /var/lib/ceph/mon
 /dev/mapper/cgts--vg-ceph--mon--lv 22G 107M 20G 1% /var/lib/ceph/mon

2. Increase size of ceph_mon
 [sysadmin@controller-0 ~(keystone_admin)]$ system ceph-mon-modify controller-0 ceph_mon_gib=21
 +--------------------------------------+--------------+--------------+------------+------+
 | uuid                                 | ceph_mon_gib | hostname     | state      | task |
 +--------------------------------------+--------------+--------------+------------+------+
 | d225b915-8adf-4b24-961d-df7855e2cebf | 21           | compute-0    | configured | None |
 | d850eb6e-c880-4188-bc78-43bd162a0ff6 | 21           | controller-0 | configured | None |
 | df67b4c3-3962-4767-a4fa-dcdeb0f8e53d | 21           | controller-1 | configured | None |
 +--------------------------------------+--------------+--------------+------------+------+

3. After lock/unlock of controllers, compute still reports config out-of-date
 [sysadmin@controller-1 ~(keystone_admin)]$ fm alarm-list
 +----------+------------------------------------------+----------------+----------+----------------------------+
 | Alarm ID | Reason Text                              | Entity ID      | Severity | Time Stamp                 |
 +----------+------------------------------------------+----------------+----------+----------------------------+
 | 250.001  | compute-0 Configuration is out-of-date.  | host=compute-0 | major    | 2019-10-11T14:45:05.903134 |
 +----------+------------------------------------------+----------------+----------+----------------------------+
 [sysadmin@controller-1 ~(keystone_admin)]$ system ceph...


tags: removed: stx.retestneeded