fuel overestimates ceph default value for pool placement groups number

Bug #1464656 reported by Mykola Golub
This bug affects 6 people
Affects                    Status        Importance  Assigned to      Milestone
Fuel for OpenStack         Fix Released  Medium      Kyrylo Galanov
Fuel for OpenStack 7.0.x   Won't Fix     Medium      MOS Ceph
StackLight                 Confirmed     Medium      Unassigned
StackLight 0.10            Confirmed     Medium      Unassigned
StackLight 0.9             Won't Fix     Medium      Unassigned

Bug Description

Fuel sets the following placement group parameters in ceph.conf

  osd_pool_default_pg_num
  osd_pool_default_pgp_num

using the formula:

  Nearest power of 2 (num_of_OSDs * 100 / num_of_replica)

This formula is correct only for clusters with a single pool. For clusters with many pools it overestimates the number of placement groups by a factor equal to the number of pools (the goal is to have about 100 placement groups per OSD, or slightly more).

Too large a number of placement groups requires significantly more resources and time for peering (which happens, e.g., after every crushmap change; while placement groups are peering, clients accessing them get stuck). It may also cause issues during deployment (bug #1462451).

Considering that it is much easier to increase pg_num for existing pools (PG splitting is supported) than to decrease it (PG merging is currently not supported, so the only option is to recreate the pool and copy the data), it is much better to underestimate these parameters than to overestimate them.

In many cases we can expect a cluster to have at least 10 pools (images, volumes, compute, plus 7 pools for radosgw). So the formula can be changed to:

 max(128, Nearest power of 2 (num_of_OSDs * 100 / num_of_replica / 10))
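
For illustration, both formulas can be sketched in Python (the `nearest_power_of_2` helper and the function names are assumptions for this sketch, not the actual nailgun code):

```python
import math

def nearest_power_of_2(x):
    # Round x to the nearest power of two (in log space).
    return 2 ** round(math.log2(x)) if x > 0 else 1

def pg_num_current(num_osds, num_replica):
    # Current Fuel formula: targets ~100 PGs per OSD, correct for one pool.
    return nearest_power_of_2(num_osds * 100 / num_replica)

def pg_num_proposed(num_osds, num_replica, expected_pools=10):
    # Proposed formula: divide the per-OSD budget across the expected
    # number of pools, with a floor of 128 PGs per pool.
    return max(128, nearest_power_of_2(num_osds * 100 / num_replica / expected_pools))
```

For example, with 100 OSDs and 3 replicas the current formula gives 4096 PGs per pool (about 1229 PG copies per OSD across 10 pools), while the proposed one gives 256 (about 77 per OSD).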

Another option is to allow users to set this parameter manually via the GUI, similarly to what we already do with the Ceph replication factor.

Revision history for this message
Mykola Golub (mgolub) wrote :
Changed in fuel:
milestone: none → 7.0
assignee: nobody → Fuel Library Team (fuel-library)
status: New → Confirmed
Changed in fuel:
importance: Undecided → Medium
Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

The fix to bug #1462451 has introduced a regression and has to be reverted; fixing the pg_num calculation is going to be a less disruptive fix. Priority raised to High to match the other bug, 6.1 milestone added.

Revision history for this message
Aleksandr Shaposhnikov (alashai8) wrote :

Guys, fixing pg_num after the Ceph cluster is deployed means rebalancing the data, and changing placement groups is a significant risk. This feature has not yet received official status and can be used only at your own risk.

Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

Here's the proposed approach to the solution:
1) in fuel-library, set pg_num to hardcoded small value (either 128 or pg_num/16 as calculated by nailgun)
2) in astute, increase pg_num to the value calculated by nailgun as a post-deployment action (after CirrOS VM image is uploaded to Glance to avoid yet another timeout)

Mykola, please discuss your pg_num formula with Alex Shaposhnikov; he has some good arguments on why the number of pools shouldn't be factored into it.

Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

No consensus on resolution plan yet, and the theory of https://review.openstack.org/190189 being root cause for the regression hasn't been confirmed, so moving out of 6.1 to 6.1-updates. If 6.1 Ceph test at scale succeeds, this bug should be downgraded back to Medium and marked Won't Fix for 6.1.x.

Changed in fuel:
assignee: MOS Ceph (mos-ceph) → Kostiantyn Danylov (kdanylov)
status: Confirmed → In Progress
Changed in fuel:
assignee: Kostiantyn Danylov (kdanylov) → Andrew Woodward (xarses)
Changed in fuel:
assignee: Andrew Woodward (xarses) → Kostiantyn Danylov (kdanylov)
tags: added: known-issue
Changed in fuel:
assignee: Kostiantyn Danylov (kdanylov) → Sebastian Kalinowski (prmtl)
Changed in fuel:
assignee: Sebastian Kalinowski (prmtl) → Kostiantyn Danylov (kdanylov)
Changed in fuel:
assignee: Kostiantyn Danylov (kdanylov) → Vladimir Sharshov (vsharshov)
Changed in fuel:
assignee: Vladimir Sharshov (vsharshov) → Kostiantyn Danylov (kdanylov)
tags: added: ceph
Revision history for this message
Aleksandr Didenko (adidenko) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/204814
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=95c8e4f7b677979b24e77f8e0afbc6301ba6a07d
Submitter: Jenkins
Branch: master

commit 95c8e4f7b677979b24e77f8e0afbc6301ba6a07d
Author: koder aka kdanilov <email address hidden>
Date: Thu Jul 23 05:37:21 2015 +0300

    Ceph: fix PG count number

    Fix PG groups overestimation. Pre 7.0 FUEL use
    equal pg_count for all pools, which leads to large
    PG per OSD. This change would split pools on two
    groups - small and large. All pools would get
    at least small fixed amount of PG max(64, OSD_NUM/pool_sz).
    Large pools would share the rest PG groups
    accordingly to their weight. Only really used pools
    get PG.

    Closes-Bug: #1464656
    Partial-blueprint: ceph-pool-pg-num
    Change-Id: I79d917d01fb5593f132643d99cdeb9218b1ec625
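
Read literally, the policy described in that commit message could be sketched like this (an illustrative reading with assumed names and an assumed budget formula, not the actual fuel-web code):

```python
import math

def nearest_power_of_2(x):
    # Round x to the nearest power of two (in log space).
    return 2 ** round(math.log2(x)) if x > 0 else 1

def allocate_pgs(osd_num, pool_sz, pool_weights, pgs_per_osd=200):
    # Every pool gets at least a small fixed floor of PGs.
    floor = max(64, osd_num // pool_sz)
    pgs = {name: floor for name in pool_weights}
    # The remaining budget is shared by the "large" (weighted) pools,
    # proportionally to their weight; zero-weight pools keep the floor.
    budget = osd_num * pgs_per_osd // pool_sz
    rest = max(0, budget - floor * len(pool_weights))
    total_w = sum(w for w in pool_weights.values() if w > 0)
    for name, w in pool_weights.items():
        if w > 0 and total_w > 0:
            pgs[name] = max(floor, nearest_power_of_2(rest * w / total_w))
    return pgs
```

With this policy only the pools that actually hold data get large PG counts, while rarely used pools (e.g. the radosgw service pools) stay at the floor.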

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Sergii Golovatiuk (sgolovatiuk) wrote :

IMO, this bug is medium for many reasons.

Historical: we have had this implementation since the first implementation of Ceph, and we had only a few clients where we needed to adjust the values.

Technical: https://review.openstack.org/204814 has comments that should be addressed.

Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/6.1.x
no longer affects: fuel/7.0.x
Changed in fuel:
milestone: 7.0 → 8.0
status: Fix Committed → Confirmed
assignee: Kostiantyn Danylov (kdanylov) → MOS Ceph (mos-ceph)
Revision history for this message
Andrey Grebennikov (agrebennikov) wrote :

I don't know how you guys can argue using the word "historically" when this is just wrong. In the case of a production cluster with a large number of OSDs you'll always get warnings like "the pool X has too few placement groups" exactly because of this issue: for example, RGW pools should have very few PGs while the main pools like compute and volumes should be much bigger. Keep in mind there is no way to reduce the number of PGs on the fly.
And yes, one of our big customers was really disappointed by this weak PG calculation method.

Revision history for this message
Kostiantyn Danylov (kdanylov) wrote :

> IMO, this bug is medium for many reasons
We found that this also leads to a ~30% performance loss.

Dmitry Pyzhov (dpyzhov)
tags: added: customer-found
Revision history for this message
Kostiantyn Danylov (kdanylov) wrote :

We also found that the default FUEL PG count leads to 20-40% lower IO performance than with an optimal PG count.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (master)

Change abandoned by Fuel DevOps Robot (<email address hidden>) on branch: master
Review: https://review.openstack.org/204811
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Changed in fuel:
assignee: MOS Ceph (mos-ceph) → Kostiantyn Danylov (kdanylov)
status: Confirmed → In Progress
Revision history for this message
Vitaly Sedelnik (vsedelnik) wrote :

Added 70mu1-confirmed tag per email update from Kostiantyn

tags: added: 70mu1-confirmed
Changed in fuel:
assignee: Kostiantyn Danylov (kdanylov) → Dmitry Ilyin (idv1985)
Dmitry Pyzhov (dpyzhov)
tags: added: area-library
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Dmitry Ilyin (idv1985) → Kostiantyn Danylov (kdanylov)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/204811
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=81220692c3d39eda6370dbd054047fc6fa6ac3af
Submitter: Jenkins
Branch: master

commit 81220692c3d39eda6370dbd054047fc6fa6ac3af
Author: koder aka kdanilov <email address hidden>
Date: Thu Jul 23 05:28:55 2015 +0300

    Ceph: fix PG count number

    Fix PG groups overestimation

    Closes-Bug: #1464656
    Change-Id: I3ba64ee25101451555380dd812c67c4d344ec651

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Reopened as the commit was reverted in https://review.openstack.org/#/c/246495/

Changed in fuel:
status: Fix Committed → New
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/247441

Changed in fuel:
assignee: Kostiantyn Danylov (kdanylov) → Kyrylo Galanov (kgalanov)
status: New → In Progress
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 8.0 → 9.0
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/247441
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=ab3679782ca7af2134cec4696c0b89900ec30f79
Submitter: Jenkins
Branch: master

commit ab3679782ca7af2134cec4696c0b89900ec30f79
Author: koder aka kdanilov <email address hidden>
Date: Thu Jul 23 05:28:55 2015 +0300

    Ceph: fix PG count number

    Fix PG groups overestimation

    Change-Id: I8c401e9abe9798ded87f542c0d707198148d07d1
    Closes-Bug: #1464656

Changed in fuel:
status: In Progress → Fix Committed
Swann Croiset (swann-w)
Changed in lma-toolchain:
status: New → Confirmed
importance: Undecided → Medium
milestone: none → 1.0.0
Revision history for this message
Swann Croiset (swann-w) wrote :

lma-toolchain 0.9 is impacted since it reports the Ceph cluster in WARNING state (MOS 8.0: deployment with 3 controllers, 3 OSDs, replica factor=3):
ceph -s
    cluster ec475b45-5e0f-48a8-9f3a-3b3c5eda47ea
     health HEALTH_WARN
            too many PGs per OSD (480 > max 300)
     monmap e3: 3 mons at {node-4=192.168.0.7:6789/0,node-7=192.168.0.11:6789/0,node-9=192.168.0.9:6789/0}
            election epoch 8, quorum 0,1,2 node-4,node-9,node-7
     osdmap e49: 4 osds: 4 up, 4 in
      pgmap v676: 640 pgs, 10 pools, 45745 kB data, 61 objects
            6450 MB used, 1369 GB / 1375 GB avail
                 640 active+clean
  client io 9540 B/s rd, 8085 B/s wr, 9 op/s

A workaround would be to suppress this warning by disabling the check in ceph.conf (or increasing the threshold): mon_pg_warn_max_per_osd = 0
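
Spelled out as a ceph.conf fragment, that workaround would look like this (note it silences the warning rather than fixing the PG counts):

```ini
[global]
# Workaround only: disable the "too many PGs per OSD" health warning.
# A value of 0 turns the check off; raising the threshold above the
# reported 480 would also suppress it.
mon_pg_warn_max_per_osd = 0
```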

But apparently it is up to Ceph admins to (re)organize the cluster after deployment for better performance on MOS 8.0.

Revision history for this message
Mykola Golub (mgolub) wrote :

Actually this case is a little different. This cluster has only 4 OSDs and 10 pools, i.e. 64 PGs per pool. Fuel uses 64 as the minimum per-pool value, and it is not practical to use smaller values. I would say only 4 OSDs is not optimal for any practical use, so suboptimal PG numbers are not so important for clusters like this.

We did not see this warning in previous MOS versions because it was added in hammer, and we started using hammer in 8.0.

I guess Ceph could be improved not to show this warning for such small clusters, though.
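
The numbers in the `ceph -s` output above are consistent with this explanation; a quick check:

```python
# 10 pools x 64 PGs each, replica factor 3, spread over 4 OSDs
# (the values from the pgmap and osdmap lines above).
pools, pgs_per_pool, replicas, osds = 10, 64, 3, 4
total_pgs = pools * pgs_per_pool                   # 640, as in "640 pgs"
pg_copies_per_osd = total_pgs * replicas // osds   # 480, as in the warning
```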

Revision history for this message
Alexander Vlasov (avlasov) wrote :

Hi guys,
I see that the discussion about the correct pg_num is still going on; I just want to let you know that this issue affects one more deployment.

I see that pg_num is calculated for each pool:

"per_pool_pg_nums"=>
  {"compute"=>2048,
   "default_pg_num"=>64,
   "volumes"=>4096,
   "images"=>256,
   "backups"=>1024,
   ".rgw"=>1024},

But in the manifests the default value is used.

In this situation the deploy fails because of "too few PGs per OSD"; there are 54 of them.

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "7.0"
  openstack_version: "2015.1.0-7.0"
  api: "1.0"
  build_number: "301"
  build_id: "301"
  nailgun_sha: "4162b0c15adb425b37608c787944d1983f543aa8"
  python-fuelclient_sha: "486bde57cda1badb68f915f66c61b544108606f3"
  fuel-agent_sha: "50e90af6e3d560e9085ff71d2950cfbcca91af67"
  fuel-nailgun-agent_sha: "d7027952870a35db8dc52f185bb1158cdd3d1ebd"
  astute_sha: "6c5b73f93e24cc781c809db9159927655ced5012"
  fuel-library_sha: "5d50055aeca1dd0dc53b43825dc4c8f7780be9dd"
  fuel-ostf_sha: "2cd967dccd66cfc3a0abd6af9f31e5b4d150a11c"
  fuelmain_sha: "a65d453215edb0284a2e4761be7a156bb5627677"

Changed in lma-toolchain:
milestone: 1.0.0 → 0.10.0
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-octane (master)

Reviewed: https://review.openstack.org/315109
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=ff7a897aa40453d58cc7e4b397b3d3f65db2b1ce
Submitter: Jenkins
Branch: master

commit ff7a897aa40453d58cc7e4b397b3d3f65db2b1ce
Author: Sergey Abramov <email address hidden>
Date: Wed May 11 17:46:47 2016 +0300

    Unhealthy ceph don't break upgrade

    Bad health ceph status shouldn't stop node upgrade.
    The reason is fuel overestimates ceph default value for pool
    placement groups number.

    Change-Id: Iff301ffeedd26a6f532dbbd480a6395cd774181a
    Related-Bug: 1464656

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-octane (stable/8.0)

Related fix proposed to branch: stable/8.0
Review: https://review.openstack.org/320455

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-octane (stable/8.0)

Reviewed: https://review.openstack.org/320455
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=90719da0b5789342c1ef18abe3d5573e8cccbc2f
Submitter: Jenkins
Branch: stable/8.0

commit 90719da0b5789342c1ef18abe3d5573e8cccbc2f
Author: Sergey Abramov <email address hidden>
Date: Wed May 11 17:46:47 2016 +0300

    Unhealthy ceph don't break upgrade

    Bad health ceph status shouldn't stop node upgrade.
    The reason is fuel overestimates ceph default value for pool
    placement groups number.

    Change-Id: Iff301ffeedd26a6f532dbbd480a6395cd774181a
    Related-Bug: 1464656
    (cherry picked from commit ff7a897aa40453d58cc7e4b397b3d3f65db2b1ce)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-octane (stable/7.0)

Related fix proposed to branch: stable/7.0
Review: https://review.openstack.org/320837

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-octane (stable/7.0)

Reviewed: https://review.openstack.org/320837
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=39b75e2bf9e7cc9899b99abc748b79a7b13bf339
Submitter: Jenkins
Branch: stable/7.0

commit 39b75e2bf9e7cc9899b99abc748b79a7b13bf339
Author: Sergey Abramov <email address hidden>
Date: Wed May 11 17:46:47 2016 +0300

    Unhealthy ceph don't break upgrade

    Bad health ceph status shouldn't stop node upgrade.
    The reason is fuel overestimates ceph default value for pool
    placement groups number.

    Change-Id: Iff301ffeedd26a6f532dbbd480a6395cd774181a
    Related-Bug: 1464656
    (cherry picked from commit ff7a897aa40453d58cc7e4b397b3d3f65db2b1ce)

Changed in lma-toolchain:
milestone: 0.10.0 → none
tags: added: area-ceph
removed: area-library ceph
tags: added: on-verification
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-octane (stable/mitaka)

Related fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/332931

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-octane (stable/mitaka)

Reviewed: https://review.openstack.org/332931
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=24f3c396612eb0c69fbf06bab3cebbb9ca829426
Submitter: Jenkins
Branch: stable/mitaka

commit b34d749f1c281dafbdbf155bd86830dc8f2a9aa2
Author: Ilya Kharin <email address hidden>
Date: Wed Jun 22 21:10:56 2016 +0300

    Support mock==1.8.0 in unit tests

    Use assert_called_once_with instead of assert_called_once that was
    introduced in 2.0.0.

    Change-Id: Ifb7699d4b552d148984961727355f0e23b487c7c

commit d60f1905143bb7576ffd670595de1c4aeafd7b34
Author: Ilya Kharin <email address hidden>
Date: Wed Jun 22 02:01:52 2016 +0300

    Allow to authorize by the predefined admin_token

    The admin_token_auth middleware is added to keystone pipelines to
    allow authorization by admin_token.

    Change-Id: Ic03150305a669fad1446436a68051fb9aa25b892

commit cc7fab59f44ffef60285f8732d798f52469b2530
Author: Ilya Kharin <email address hidden>
Date: Wed Jun 22 01:47:01 2016 +0300

    Reset default_domain_id before the keystone task

    The default_domain_id should be removed from keystone.conf after restore
    of DB and before to apply the keystone puppet task to avoid of using the
    configured domain as a default.

    Change-Id: I05a6c48532e8042496b3d8ccef53d65bf8c44653

commit e3f82399d567dbcfc1ae9a1ecbddba7bf5028fc8
Author: Ilya Kharin <email address hidden>
Date: Wed Jun 22 01:32:47 2016 +0300

    Add helper function to iterate over parameters

    The helpers.iterate_parameters function allows to iterate over lines of
    INI-like files along with a context of information, such section,
    parameter and value.

    Change-Id: I55b179118116fd5dacf100754057ea6589782dc2

commit 5ed370a4fbb3369d0f75873d205ac7f0c655f93a
Author: Ilya Kharin <email address hidden>
Date: Wed Jun 22 01:18:05 2016 +0300

    Add update_file context manager for local files

    The subprocess.update_file function provides an ability to update
    content of a local file by iterating over lines of an original file and
    forming a result content in a temporary file to replace the original
    file in the end. This function is very useful to change configuration
    files.

    Change-Id: I433a5da67887b231400dd63131799019f45c277c

commit 58f31e6c5f408630c42565ee53de6b59457bc84c
Author: Oleg Gelbukh <email address hidden>
Date: Wed Jun 22 14:34:03 2016 +0000

    Escape passwords passed to openstack client

    If special symbols are used in password for 'admin' user in
    OpenStack, octane passes them to command line client as is
    and it breaks shell.

    Properly escape the password before passing it to subprocess.

    Change-Id: Iad635aec6d5b5cc32975937e00205b7e89dc99d9
    Closes-bug: 1585960

commit 896aba1191eeb59cf85cc8be6a2ae67e08b76070
Author: Yuriy Taraday <email address hidden>
Date: Wed Jun 22 15:28:45 2016 +0300

    Add absolute_import to util/docker.py to avoid local tempfile module

    Closes-Bug: 1595156
    Change-Id: I9484efce6fa7aec1b41cf592f9e9768d85931fa7

commit 46586a62df962b2ce00b5d8a63f6fd34c920a1f4
Author: Oleg Gelbukh <email address hidden>
Date: Thu M...

tags: added: in-stable-mitaka
Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

OK, verified on MOS 9.0 #495:

root@node-1:~# cat /etc/ceph/ceph.conf | grep osd_pool_default_pg
osd_pool_default_pg_num = 64
osd_pool_default_pgp_num = 64
root@node-1:~#
root@node-1:~#
root@node-1:~# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.90999 root default
-2 0.90999 host node-2
 0 0.90999 osd.0 up 1.00000 1.00000

Changed in fuel:
status: Fix Committed → Confirmed
status: Confirmed → Fix Released
Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

We no longer support MOS 5.1, MOS 6.0 and MOS 6.1.
We deliver only Critical/Security fixes to MOS 7.0 and MOS 8.0.
We deliver only High/Critical/Security fixes to MOS 9.2.
