Ceph health is too many PGs per OSD (320 > max 300) after trying to delete ceph osds

Bug #1539555 reported by Andrey Sledzinskiy
This bug affects 4 people
Affects             Status   Importance  Assigned to  Milestone
Fuel for OpenStack  Invalid  High        MOS Ceph
8.0.x               Invalid  High        Egor Kotko
Mitaka              Invalid  High        MOS Ceph

Bug Description

Steps to reproduce:
1. Create and deploy the following cluster: Neutron VLAN; Ceph for volumes/images/ephemeral and RadosGW for objects; 3 controllers, 3 Ceph nodes, 1 compute node
2. After deployment, add one Ceph node and re-deploy (not necessary to reproduce)
3. After the re-deploy, start preparing a Ceph node for deletion (following this guide: https://docs.mirantis.com/openstack/fuel/fuel-7.0/operations.html#how-to-safely-remove-a-ceph-osd-node; the full removal sequence is sketched after this list)
4. Execute the following commands (on node-2 in my case):
- ceph osd out 1
- ceph osd out 3
5. Wait for 'ceph -s' to show OK status
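
For reference, the steps in the linked guide boil down to the standard Ceph OSD removal sequence; a sketch for osd.1 (the service management command varies by distribution and is an assumption here):

    ceph osd out 1                 # mark the OSD out so its data rebalances away
    # wait for 'ceph -s' to report recovery as finished, then:
    stop ceph-osd id=1             # Upstart syntax on Ubuntu; use the distro's equivalent
    ceph osd crush remove osd.1    # remove the OSD from the CRUSH map
    ceph auth del osd.1            # delete its cephx key
    ceph osd rm 1                  # remove the OSD from the cluster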

Actual result: after an hour of waiting (test cluster without any data on the Ceph nodes), the status is:

ceph -s
    cluster c3c93807-159d-46b4-93bf-285f03414733
     health HEALTH_WARN
            too many PGs per OSD (320 > max 300)
     monmap e3: 3 mons at {node-3=10.109.1.8:6789/0,node-4=10.109.1.6:6789/0,node-5=10.109.1.9:6789/0}
            election epoch 4, quorum 0,1,2 node-4,node-3,node-5
     osdmap e65: 8 osds: 8 up, 6 in
      pgmap v194: 640 pgs, 10 pools, 12977 kB data, 51 objects
            12566 MB used, 284 GB / 296 GB avail
                 640 active+clean

Fuel ISO - 478
Logs are attached.

Tags: area-ceph
Revision history for this message
Ivan Ponomarev (ivanzipfer) wrote :

Please provide the Fuel ISO version.

Changed in fuel:
status: New → Incomplete
Vasily Gorin (vgorin)
Changed in fuel:
status: Incomplete → Confirmed
Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :
description: updated
Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :

After executing 'ceph osd out $id' on a Ceph node, its health is constantly:
ceph -s
    cluster c3c93807-159d-46b4-93bf-285f03414733
     health HEALTH_WARN
            too many PGs per OSD (320 > max 300)
     monmap e3: 3 mons at {node-3=10.109.1.8:6789/0,node-4=10.109.1.6:6789/0,node-5=10.109.1.9:6789/0}
            election epoch 4, quorum 0,1,2 node-4,node-3,node-5
     osdmap e65: 8 osds: 8 up, 6 in
      pgmap v194: 640 pgs, 10 pools, 12977 kB data, 51 objects
            12566 MB used, 284 GB / 296 GB avail
                 640 active+clean

So it seems it's not a QA issue.

description: updated
Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → MOS Ceph (mos-ceph)
summary: - add_delete_ceph test timed out waiting ceph health to be ok
+ Ceph health is too many PGs per OSD (320 > max 300) after trying to
+ delete ceph osds
tags: removed: area-qa
tags: added: area-ceph
Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

Not a bug.

The number of placement groups per OSD increases after removing OSDs (the placement
groups which were served by the removed OSDs get distributed among the remaining OSDs).
Ceph warns that having that many placement groups per OSD *might* be suboptimal.
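
For the cluster in this report the arithmetic works out as follows (a sketch assuming the default pool size of 3 replicas, which the report does not show):

    640 PGs x 3 replicas = 1920 PG replicas
    before 'ceph osd out':  1920 / 8 "in" OSDs = 240 PGs per OSD  (under the 300 limit)
    after 2 OSDs are out:   1920 / 6 "in" OSDs = 320 PGs per OSD  (matches the warning)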

Revision history for this message
Dmitry Belyaninov (dbelyaninov) wrote :

@asledzinskiy

As I remember, I have sometimes seen the HEALTH_WARN state after step 2 (before any delete steps).

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-qa (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/274689

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-qa (stable/8.0)

Related fix proposed to branch: stable/8.0
Review: https://review.openstack.org/275417

Revision history for this message
Egor Kotko (ykotko) wrote :

Reproduced on RC2 ISO #570
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  api: "1.0"
  build_number: "570"
  build_id: "570"
  fuel-nailgun_sha: "558ca91a854cf29e395940c232911ffb851899c1"
  python-fuelclient_sha: "4f234669cfe88a9406f4e438b1e1f74f1ef484a5"
  fuel-agent_sha: "658be72c4b42d3e1436b86ac4567ab914bfb451b"
  fuel-nailgun-agent_sha: "b2bb466fd5bd92da614cdbd819d6999c510ebfb1"
  astute_sha: "b81577a5b7857c4be8748492bae1dec2fa89b446"
  fuel-library_sha: "c2a335b5b725f1b994f78d4c78723d29fa44685a"
  fuel-ostf_sha: "3bc76a63a9e7d195ff34eadc29552f4235fa6c52"
  fuel-mirror_sha: "fb45b80d7bee5899d931f926e5c9512e2b442749"
  fuelmenu_sha: "78ffc73065a9674b707c081d128cb7eea611474f"
  shotgun_sha: "63645dea384a37dde5c01d4f8905566978e5d906"
  network-checker_sha: "a43cf96cd9532f10794dce736350bf5bed350e9d"
  fuel-upgrade_sha: "616a7490ec7199f69759e97e42f9b97dfc87e85b"
  fuelmain_sha: "d605bcbabf315382d56d0ce8143458be67c53434"

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Egor, have you seen Alexei's comment on this? ( https://bugs.launchpad.net/fuel/+bug/1539555/comments/4 )

Why do you think this should be re-opened? What exactly breaks?

Revision history for this message
Egor Kotko (ykotko) wrote :

Currently we get the warning because we exceed the maximum number of placement groups per OSD. Can we increase the max value?

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

So it looks like 300 is the default maximum PG count per OSD (according to https://access.redhat.com/documentation/en/red-hat-ceph-storage/1.3/storage-strategies/chapter-14-pg-count#maximum-pg-count), which *can* be tweaked.

As Alexei pointed out, having more PGs per OSD node is not actually fatal, but rather suboptimal. The Ceph cluster remains fully functional.

How many PGs we should have per OSD node is probably a topic for another discussion.

Just to make this clear: given that you still have to move PGs somewhere when removing an OSD node, there will always be a case (depending on the number of OSD nodes and PGs per node) when a Ceph cluster remains in HEALTH_WARN status unless you add a new node.
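
For reference, the warning threshold itself can be raised; a sketch, assuming a Hammer-era Ceph where the option is named mon_pg_warn_max_per_osd (400 is an arbitrary example value):

    # raise the threshold at runtime on all monitors (not persistent across restarts)
    ceph tell mon.* injectargs '--mon_pg_warn_max_per_osd 400'
    # to persist it, add the following to the [global] section of ceph.conf
    # on the monitor nodes and restart the monitors:
    #   mon_pg_warn_max_per_osd = 400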

Revision history for this message
Volodymyr Shypyguzov (vshypyguzov) wrote :

Also, according to Ceph documentation ( http://docs.ceph.com/docs/master/rados/operations/placement-groups/#set-the-number-of-placement-groups ), you cannot decrease PG number:

To set the number of placement groups in a pool, you must specify the number of placement groups at the time you create the pool. See Create a Pool for details. Once you’ve set placement groups for a pool, you may increase the number of placement groups (but you cannot decrease the number of placement groups)

So this is kind of expected behavior.
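
In practice that means pg_num has to be chosen correctly at pool creation time; a sketch with a hypothetical pool name:

    # create a pool with 128 placement groups (name and counts are examples)
    ceph osd pool create testpool 128 128
    # increasing pg_num (and then pgp_num) later is allowed:
    ceph osd pool set testpool pg_num 256
    ceph osd pool set testpool pgp_num 256
    # decreasing pg_num is rejected by Ceph releases of this era
    # (PG merging only appeared much later, in Nautilus)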

Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

It looks like expected behaviour for Ceph; it is not a blocker for users, and it is OK for a Ceph cluster to be in WARN status when two nodes out of three are down. This issue will not be reproduced on a large cluster with many nodes.

Please update your test scenario according to real use cases.

Status changed to Invalid for all releases.
