ceph deployment fails with not enough pages

Bug #1763356 reported by Vladislav Belogrudov
This bug affects 2 people
Affects: kolla-ansible
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

TASK [cinder : Creating ceph pool] *********************************************
fatal: [10.1.2.3]: FAILED! => {"_ansible_parsed": true, "stderr_lines": ["Error ERANGE: pg_num 128 size 3 would mean 768 total pgs, which exceeds max 600 (mon_max_pg_per_osd 200 * num_in_osds 3)"], "changed": false, "end": "2018-04-12 12:42:29.120658", "_ansible_no_log": false, "_ansible_delegated_vars": {"ansible_host": "10.196.244.201"}, "cmd": ["docker", "exec", "ceph_mon", "ceph", "osd", "pool", "create", "volumes", "128", "128", "replicated", "disks"], "stdout": "", "start": "2018-04-12 12:42:28.490659", "delta": "0:00:00.629999", "stderr": "Error ERANGE: pg_num 128 size 3 would mean 768 total pgs, which exceeds max 600 (mon_max_pg_per_osd 200 * num_in_osds 3)", "rc": 34, "invocation": {"module_args": {"creates": null, "executable": null, "_uses_shell": false, "_raw_params": "docker exec ceph_mon ceph osd pool create volumes 128 128 replicated disks", "removes": null, "warn": true, "chdir": null}}, "stdout_lines": [], "failed": true}

I have 3 controllers where monitors run and 3 osd nodes with 1 ceph disk per node.

Refer to https://docs.openstack.org/kolla-ansible/latest/reference/ceph-guide.html
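
As far as I can tell, the check counts pg_num * replica size summed over every pool (including the ones created earlier in the deploy) against mon_max_pg_per_osd * num_in_osds:

  this pool alone: 128 * 3 = 384 PGs
  allowed total:   200 * 3 = 600 PGs

so together with the pools that already exist, the create lands at the 768 vs 600 failure above.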

Revision history for this message
Paul Bourke (pauldbourke) wrote :

This will happen once the Ceph packages installed in the images move to 12.2.1 (currently 12.2.0).

From https://ceph.com/releases/v12-2-1-luminous-released :

The maximum number of PGs per OSD before the monitor issues a
warning has been reduced from 300 to 200 PGs. 200 is still twice
the generally recommended target of 100 PGs per OSD. This limit can
be adjusted via the mon_max_pg_per_osd option on the
monitors. The older mon_pg_warn_max_per_osd option has been removed.

Need to decide how best to address this in Kolla.

Changed in kolla-ansible:
status: New → Confirmed
importance: Undecided → Critical
Revision history for this message
Jeffrey Zhang (jeffrey4l) wrote :

Paul is correct, and this easily happens in a small environment. There are two options when you hit this issue:

1. Set a smaller number for the default Ceph pg num:

  #globals.yml
  ceph_pool_pg_num: 64
  ceph_pool_pgp_num: 64

2. Or set a higher number for mon_max_pg_per_osd:

  #/etc/kolla/config/ceph.conf

  [global]
  mon max pg per osd = 3000

For Kolla, I think we should make option 1) the default in the group_vars/all.yml file.
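
Whichever option is chosen, the effective values can be checked against the running cluster afterwards; a rough sketch, reusing the ceph_mon container and the volumes pool from the failing task above:

  docker exec ceph_mon ceph osd pool get volumes pg_num
  docker exec ceph_mon ceph osd dump | grep pool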

Revision history for this message
Vladislav Belogrudov (vlad-belogrudov) wrote :

do you think 64 is small enough?

we have 7 pools currently, and with 3 replicas that gives
64 * 3 * 7 = 1344 PGs, while the limit is 200 per OSD * 3 OSDs = 600

I think it is a safe move to have pg_num = 8

Revision history for this message
Vladislav Belogrudov (vlad-belogrudov) wrote :

As soon as the operator has played enough with a toy cluster and decides to add more nodes, the parameter can easily be grown to a more appropriate value.
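
A sketch of what growing an existing pool later could look like (same ceph_mon container and volumes pool as above; on Luminous pg_num can only be increased, and pgp_num should be raised to match):

  docker exec ceph_mon ceph osd pool set volumes pg_num 128
  docker exec ceph_mon ceph osd pool set volumes pgp_num 128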

Revision history for this message
Jeffrey Zhang (jeffrey4l) wrote :

Adjusting the pg_num in a production environment is bad; it will result in lots of traffic across the whole Ceph cluster. So I think the better way is to warn the user before deployment: they should calculate the proper pg num for each pool based on their environment.
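
For this environment, the usual rule of thumb (target of roughly 100 PGs per OSD from the release note quoted above, divided across replicas and pools, rounded to a power of two) would give something like:

  (100 * 3 OSDs) / (3 replicas * 7 pools) ≈ 14 → 16 per pool

which keeps the total at 16 * 3 * 7 = 336 PGs, well under the 600 cap.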

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master)

Reviewed: https://review.openstack.org/564169
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=36f33f089bbda9bcc7e451b69413907cce8e3bb6
Submitter: Zuul
Branch: master

commit 36f33f089bbda9bcc7e451b69413907cce8e3bb6
Author: Paul Bourke <email address hidden>
Date: Tue Apr 24 14:07:25 2018 +0100

    Reduce the default values for Ceph pgs

    Required to keep Ceph working once we move to Luminous 12.2.1

    Change-Id: I8d3e56f2053c939ea313c60cc04c0ff79dd27d25
    Closes-Bug: 1763356

Changed in kolla-ansible:
status: Confirmed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 7.0.0.0b2

This issue was fixed in the openstack/kolla-ansible 7.0.0.0b2 development milestone.
