gnocchi pool has many more objects per pg than average

Bug #1804846 reported by José Pekkarinen
This bug affects 1 person
Affects: Ceph Monitor Charm
Status: Fix Released
Importance: Medium
Assigned to: James Page
Milestone: 19.04

Bug Description

On a fresh deployment of Queens on Xenial, ceph reports a warning status
and never reaches good health:

# ceph -s
  cluster:
    id: <uuid>
    health: HEALTH_WARN
            1 pools have many more objects per pg than average

  services:
    mon: 6 daemons, quorum juju-<hash>-28-lxd-0,juju-<hash>-18-lxd-0,juju-<hash>-8-lxd-0,juju-<hash>-13-lxd-0,juju-<hash>-23-lxd-0,juju-<hash>-3-lxd-0
    mgr: juju-<hash>-8-lxd-0(active), standbys: juju-<hash>-18-lxd-0, juju-<hash>-13-lxd-0, juju-<hash>-28-lxd-0, juju-<hash>-3-lxd-0, juju-<hash>-23-lxd-0
    osd: 180 osds: 180 up, 180 in

  data:
    pools: 21 pools, 2544 pgs
    objects: 83723 objects, 1601 MB
    usage: 193 GB used, 1309 TB / 1309 TB avail
    pgs: 2544 active+clean

  io:
    client: 1091 B/s rd, 1 op/s rd, 0 op/s wr

# ceph health detail
HEALTH_WARN 1 pools have many more objects per pg than average
MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
    pool gnocchi objects per pg (2606) is more than 81.4375 times cluster average (32)
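
For clarity, the arithmetic behind that message, using the figures from the ceph -s output above (treating 10 as the threshold is an assumption based on the default value of "mon pg warn max object skew" at the time):

    cluster average objects per pg  ≈ 83723 objects / 2544 pgs ≈ 32 (as reported above)
    gnocchi objects per pg          = 2606
    skew                            = 2606 / 32 = 81.4375  (well above the default threshold of 10)

so the MANY_OBJECTS_PER_PG check fires even though all placement groups are active+clean.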

This warning triggers alarms in Nagios, which expects Ceph to report a healthy status.
The expected behaviour on a new deployment is that the cluster reaches HEALTH_OK, and
that if it ever leaves that state it recovers on its own within a reasonable time.

Thanks!

José.

Revision history for this message
James Page (james-page) wrote :

I also see this on a Bionic/Queens deployment:

$ sudo ceph -s
  cluster:
    id: 80f2a4b2-7e13-11e8-a8c7-00163eb36865
    health: HEALTH_WARN
            1 pools have many more objects per pg than average

  services:
    mon: 3 daemons, quorum juju-182b52-6-lxd-0,juju-182b52-4-lxd-0,juju-182b52-5-lxd-0
    mgr: juju-182b52-4-lxd-0(active), standbys: juju-182b52-6-lxd-0, juju-182b52-5-lxd-0
    osd: 12 osds: 12 up, 12 in
    rgw: 1 daemon active

  data:
    pools: 22 pools, 420 pgs
    objects: 3502k objects, 421 GB
    usage: 938 GB used, 10235 GB / 11174 GB avail
    pgs: 419 active+clean
             1 active+clean+scrubbing+deep

  io:
    client: 6280 B/s rd, 17836 B/s wr, 7 op/s rd, 29 op/s wr

Changed in charm-ceph-mon:
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
James Page (james-page) wrote :

rados df for reference:

POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR
.rgw.root 1113 4 0 12 0 0 0 324 216k 4 4096
cinder-ceph 319G 84379 0 253137 0 0 0 189110731 781G 112724504 9124G
default.intent-log 0 0 0 0 0 0 0 0 0 0 0
default.log 0 0 0 0 0 0 0 0 0 0 0
default.rgw 0 0 0 0 0 0 0 0 0 0 0
default.rgw.buckets 0 0 0 0 0 0 0 0 0 0 0
default.rgw.buckets.data 13113M 3299 0 9897 0 0 0 287708 59148M 63648 13285M
default.rgw.buckets.extra 0 0 0 0 0 0 0 0 0 0 0
default.rgw.buckets.index 0 16 0 48 0 0 0 42873 42892k 25458 0
default.rgw.control 0 8 0 24 0 0 0 0 0 0 0
default.rgw.gc 0 0 0 0 0 0 0 0 0 0 0
default.rgw.log 0 207 0 621 0 0 0 30403188 29690M 20258723 0
default.rgw.meta 11618 56 0 168 0 0 0 2529057 1543M 10394 1244k
default.rgw.root 0 0 0 0 0 0 0 0 0 0 0
default.usage 0 0 0 0 0 0 0 0 0 0 0
default.users 0 0 0 0 0 0 0 0 0 0 0
default.users.email 0 0 0 0 0 0 0 0 0 0 0
default.users.swift 0 0 0 0 0 0 0 0 0 0 0
default.users.uid 0 0 0 0 0 0 0 0 0 0 0
glance 47321M 6472 0 19416 0 0 0 429110 400G 154681 202G
gnocchi 44558M 3492359 0 10477077 0 0 0 98827394 68749M 476545467 210G
nova 19 2 0 6 0 0 0 2893482 69727M 9270630 626G

Revision history for this message
James Page (james-page) wrote :

Gnocchi just appears to generate a large number of objects - 3492359 vs 84379 (cinder-ceph), the next highest.

Revision history for this message
James Page (james-page) wrote :

From upstream ceph ML for this exact issue:

"> How to solve this problem correctly?

As a workaround, I'd just increase the skew option to make the warning go
away.

It seems to me like the underlying problem is that we're looking at object
count vs pg count, but ignoring the object sizes. Unfortunately it's a
bit awkward to fix because we don't have a way to quantify the size of
omap objects via the stats (currently). So for now, just adjust the skew
value enough to make the warning go away!"

Leaving that one Anon but a reliable source nonetheless!

Revision history for this message
James Page (james-page) wrote :

So we can either increase "mon pg warn max object skew" or disable the check completely.
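
For reference, a minimal sketch of the ceph.conf route (not taken from this bug): a value of 0 or less disables the MANY_OBJECTS_PER_PG check entirely, while a larger value only raises the threshold above the default of 10. Which daemon actually consumes the option varies by Ceph release (monitor on older releases, manager on newer ones), so the safe approach is to set it in the [global] section on the mon/mgr hosts and restart those daemons:

    # /etc/ceph/ceph.conf (sketch only)
    [global]
    # 0 disables the object skew warning; a value like 100 would merely raise the threshold
    mon pg warn max object skew = 0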

Changed in charm-ceph-mon:
milestone: none → 19.04
James Page (james-page)
Changed in charm-ceph-mon:
status: Confirmed → Triaged
Revision history for this message
James Page (james-page) wrote :

I've gone for 'disable this check as it's not super useful' rather than increasing the skew threshold value, as I think we'll just need to keep increasing it over time for busy deployments.
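
For anyone hitting this before the charm change lands, a runtime-only workaround (a sketch, not part of the charm fix) is to inject the setting into the running monitors; injected values do not survive a daemon restart, and on newer Ceph releases the equivalent change may also be needed on the manager daemons:

    $ sudo ceph tell mon.* injectargs '--mon_pg_warn_max_object_skew 0'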

Revision history for this message
James Page (james-page) wrote :
Changed in charm-ceph-mon:
assignee: nobody → James Page (james-page)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-mon (master)

Reviewed: https://review.openstack.org/624398
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-mon/commit/?id=33f9bae6c7eba61b931191882f0696db909fb84e
Submitter: Zuul
Branch: master

commit 33f9bae6c7eba61b931191882f0696db909fb84e
Author: James Page <email address hidden>
Date: Tue Dec 11 13:39:54 2018 +0000

    Disable object skew warnings

    Ceph will issue a HEALTH_WARN in the event that one pool has a
    large number of objects compared to other pools in the cluster:

     "Issue a HEALTH_WARN in cluster log if the average object
      number of a certain pool is greater than mon pg warn max
      object skew times the average object number of the whole
      pool."

    For OpenStack deployments, Gnocchi and RADOS gateway can generate
    a large number of small objects compared to Cinder, Glance and
    Nova usage, causing the cluster to go into HEALTH_WARN status.

    Disable this check until the skew evaluation also includes the
    size of the objects as well as the number.

    Change-Id: I83211dbdec4dea8dca5b27a66e26a4431d2a7b77
    Closes-Bug: 1804846

Changed in charm-ceph-mon:
status: In Progress → Fix Committed
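
Once a charm release carrying this change has been deployed (or upgraded to), the warning should clear. A quick way to confirm, assuming a Juju 2.x client and that the application is named ceph-mon, is to run the health check across the mon units; the expected output once the new configuration has been rendered and the monitors restarted is HEALTH_OK:

    $ juju run --application ceph-mon 'sudo ceph health detail'
    HEALTH_OK
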
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-mon (stable/18.11)

Fix proposed to branch: stable/18.11
Review: https://review.openstack.org/628917

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-mon (stable/18.11)

Reviewed: https://review.openstack.org/628917
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-mon/commit/?id=a0e0db5774c573f50db3c11892b91a91d6eafec9
Submitter: Zuul
Branch: stable/18.11

commit a0e0db5774c573f50db3c11892b91a91d6eafec9
Author: James Page <email address hidden>
Date: Tue Dec 11 13:39:54 2018 +0000

    Disable object skew warnings

    Ceph will issue a HEALTH_WARN in the event that one pool has a
    large number of objects compared to other pools in the cluster:

     "Issue a HEALTH_WARN in cluster log if the average object
      number of a certain pool is greater than mon pg warn max
      object skew times the average object number of the whole
      pool."

    For OpenStack deployments, Gnocchi and RADOS gateway can generate
    a large number of small objects compared to Cinder, Glance and
    Nova usage, causing the cluster to go into HEALTH_WARN status.

    Disable this check until the skew evaluation also includes the
    size of the objects as well as the number.

    Change-Id: I83211dbdec4dea8dca5b27a66e26a4431d2a7b77
    Closes-Bug: 1804846
    (cherry picked from commit 33f9bae6c7eba61b931191882f0696db909fb84e)

David Ames (thedac)
Changed in charm-ceph-mon:
status: Fix Committed → Fix Released