Ceph health warning occurs on larger systems due to the default settings not being large enough

Bug #1899128 reported by Elena Taivan
Affects    Status        Importance  Assigned to   Milestone
StarlingX  Fix Released  Medium      Elena Taivan

Bug Description

Brief Description
-----------------
Ceph raises the health warning MANY_OBJECTS_PER_PG: "1 pools have many more objects per pg than average".

Severity
--------
Major

Steps to Reproduce
------------------
Create 100 volumes

Expected Behavior
------------------
No ceph alarm

Actual Behavior
----------------
ceph health warning alarm

Reproducibility
---------------
100%

System Configuration
--------------------
2+10 (ipv4)

Branch/Pull Time/Commit
-----------------------
This is an issue on the latest STX master load

Last Pass
---------
N/A

Timestamp/Logs
--------------
ceph osd tree
ID CLASS WEIGHT   TYPE NAME                  STATUS REWEIGHT PRI-AFF
-1       10.91019 root storage-tier
-2       10.91019     chassis group-0
-4        5.45509         host controller-0
 1   hdd  5.45509             osd.1              up  1.00000 1.00000
-3        5.45509         host controller-1
 0   hdd  5.45509             osd.0              up  1.00000 1.00000
[sysadmin@controller-0 ~(keystone_admin)]$ ceph mon stat
e2: 3 mons at {compute-0=192.168.204.39:6789/0,controller-0=192.168.204.3:6789/0,controller-1=192.168.204.4:6789/0}, election epoch 38, leader 0 controller-0, quorum 0,1,2 controller-0,controller-1,compute-0
[sysadmin@controller-0 ~(keystone_admin)]$ ceph mon dump
dumped monmap epoch 2
epoch 2
fsid e62b1285-6696-46b4-b498-35f94c469776
last_changed 2020-08-24 18:16:39.142192
created 2020-08-24 16:49:53.561630
0: 192.168.204.3:6789/0 mon.controller-0
1: 192.168.204.4:6789/0 mon.controller-1
2: 192.168.204.39:6789/0 mon.compute-0
[sysadmin@controller-0 ~(keystone_admin)]$ ceph health detail
HEALTH_WARN 1 pools have many more objects per pg than average
MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
pool cinder-volumes objects per pg (307) is more than 16.1579 times cluster average (19)
[sysadmin@controller-0 ~(keystone_admin)]$ ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    11 TiB     11 TiB     79 GiB       0.70
POOLS:
    NAME               ID     USED        %USED     MAX AVAIL     OBJECTS
    kube-rbd           1      31 GiB      0.58      5.1 TiB       9110
    images             2      1.5 GiB     0.03      5.1 TiB       199
    cinder.backups     3      0 B         0         5.1 TiB       0
    cinder-volumes     4      8.0 GiB     0.15      5.1 TiB       2461
    ephemeral          5      0 B         0         5.1 TiB       0
[sysadmin@controller-0 ~(keystone_admin)]$ ceph osd dump|grep pool
pool 1 'kube-rbd' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 18 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 2 'images' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 70 flags hashpspool,selfmanaged_snaps stripe_width 0 application glance-image
pool 3 'cinder.backups' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 29 flags hashpspool stripe_width 0 application cinder-volumes
pool 4 'cinder-volumes' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 66 flags hashpspool,selfmanaged_snaps stripe_width 0 application cinder-volumes
pool 5 'ephemeral' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 75 flags hashpspool stripe_width 0 application nova-ephemeral
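
The figures in the warning can be reproduced from the `ceph df` object counts and the `pg_num` values in the `ceph osd dump` output above. The sketch below is only an illustration of that arithmetic: the helper name `objects_per_pg_skew` is hypothetical, and the use of integer (truncating) division is an assumption chosen because it matches the reported 307, 19 and 16.1579. The warning is believed to fire when the ratio exceeds the `mon_pg_warn_max_object_skew` option (default 10 as far as I know), which this 16.16 ratio does.

```python
# Object counts (ceph df) and pg_num (ceph osd dump) copied from the logs above.
POOLS = {
    "kube-rbd": (9110, 64),
    "images": (199, 8),
    "cinder.backups": (0, 8),
    "cinder-volumes": (2461, 8),
    "ephemeral": (0, 512),
}


def objects_per_pg_skew(pools, name):
    """Return (pool objects per PG, cluster average per PG, ratio).

    Hypothetical helper; truncating division is an assumption that
    happens to reproduce the numbers in `ceph health detail`.
    """
    total_objects = sum(objects for objects, _ in pools.values())
    total_pgs = sum(pg_num for _, pg_num in pools.values())
    cluster_avg = total_objects // total_pgs   # 11770 // 600
    objects, pg_num = pools[name]
    pool_avg = objects // pg_num               # 2461 // 8
    return pool_avg, cluster_avg, pool_avg / cluster_avg


pool_avg, cluster_avg, ratio = objects_per_pg_skew(POOLS, "cinder-volumes")
print(f"objects per pg ({pool_avg}) is {ratio:.4f} times cluster average ({cluster_avg})")
# prints: objects per pg (307) is 16.1579 times cluster average (19)
```

Note that the 512 mostly empty PGs of the `ephemeral` pool pull the cluster average down, which is part of why the small `cinder-volumes` pool stands out.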

OpenStack Infra (hudson-openstack) wrote : Change abandoned on config (master)

Change abandoned by Elena Taivan (<email address hidden>) on branch: master
Review: https://review.opendev.org/757026

Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Elena Taivan (etaivan)
Ghada Khalil (gkhalil) wrote :

stx.5.0 / medium priority - issue related to openstack storage config

Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.5.0 stx.distro.openstack stx.storage
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-armada-app (master)

Reviewed: https://review.opendev.org/757015
Committed: https://git.openstack.org/cgit/starlingx/openstack-armada-app/commit/?id=a643665af4d6097a9698365657b59e51b601463c
Submitter: Zuul
Branch: master

commit a643665af4d6097a9698365657b59e51b601463c
Author: Elena Taivan <email address hidden>
Date: Fri Oct 9 07:35:03 2020 +0000

    Change default pg_num values for ceph pools:
        - cinder-volumes
        - cinder.backups
        - images
        - ephemeral

    Pg_num values were increased to avoid ceph health warning
    that occurs on larger systems due to the default
    pg_num settings not being large enough.

    Change-Id: I23feffe613c37b12dff51c73e7ced9a9c7663089
    Closes-bug: 1899128
    Signed-off-by: Elena Taivan <email address hidden>
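
The commit message does not list the new pg_num values, so the following is only a rough sketch of why a larger pg_num clears the warning, using the snapshot from the bug description. The helper `min_pg_num_below_skew`, the doubling search, the truncating division and the skew limit of 10 are all assumptions for illustration; they are not taken from the merged change, and real pg_num sizing should also account for OSD count and expected pool usage.

```python
# Object counts and pg_num per pool, copied from the bug description logs.
POOLS = {
    "kube-rbd": (9110, 64),
    "images": (199, 8),
    "cinder.backups": (0, 8),
    "cinder-volumes": (2461, 8),
    "ephemeral": (0, 512),
}


def min_pg_num_below_skew(pools, name, max_skew=10.0, limit=4096):
    """Double the pool's pg_num until its objects-per-PG skew is <= max_skew.

    Hypothetical helper: max_skew=10 mirrors the assumed default of
    mon_pg_warn_max_object_skew; `limit` just bounds the search.
    """
    objects, pg_num = pools[name]
    total_objects = sum(o for o, _ in pools.values())
    other_pgs = sum(p for n, (_, p) in pools.items() if n != name)
    while pg_num <= limit:
        cluster_avg = total_objects // (other_pgs + pg_num)
        pool_avg = objects // pg_num
        if cluster_avg == 0 or pool_avg / cluster_avg <= max_skew:
            return pg_num
        pg_num *= 2  # keep pg_num a power of two
    return None


print(min_pg_num_below_skew(POOLS, "cinder-volumes"))
# prints: 16
```

For this particular snapshot, doubling `cinder-volumes` from 8 to 16 PGs would already bring the ratio to roughly 8; a production default would normally leave more headroom than that.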

Changed in starlingx:
status: In Progress → Fix Released