ceph deployment fails with not enough pages

Bug #1763356 reported by Vladislav Belogrudov
This bug affects 2 people
Affects: kolla-ansible
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

TASK [cinder : Creating ceph pool] *********************************************
fatal: [10.1.2.3]: FAILED! => {"_ansible_parsed": true, "stderr_lines": ["Error ERANGE: pg_num 128 size 3 would mean 768 total pgs, which exceeds max 600 (mon_max_pg_per_osd 200 * num_in_osds 3)"], "changed": false, "end": "2018-04-12 12:42:29.120658", "_ansible_no_log": false, "_ansible_delegated_vars": {"ansible_host": "10.196.244.201"}, "cmd": ["docker", "exec", "ceph_mon", "ceph", "osd", "pool", "create", "volumes", "128", "128", "replicated", "disks"], "stdout": "", "start": "2018-04-12 12:42:28.490659", "delta": "0:00:00.629999", "stderr": "Error ERANGE: pg_num 128 size 3 would mean 768 total pgs, which exceeds max 600 (mon_max_pg_per_osd 200 * num_in_osds 3)", "rc": 34, "invocation": {"module_args": {"creates": null, "executable": null, "_uses_shell": false, "_raw_params": "docker exec ceph_mon ceph osd pool create volumes 128 128 replicated disks", "removes": null, "warn": true, "chdir": null}}, "stdout_lines": [], "failed": true}

I have 3 controllers where monitors run and 3 osd nodes with 1 ceph disk per node.

Refer to https://docs.openstack.org/kolla-ansible/latest/reference/ceph-guide.html
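
As far as I can tell, the check counts pg_num * replica size summed over every pool (including the ones created earlier in the deploy) against mon_max_pg_per_osd * num_in_osds:

  this pool alone: 128 * 3 = 384 PGs
  allowed total:   200 * 3 = 600 PGs

so together with the pools that already exist, the create lands at the 768 vs 600 failure above.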

Revision history for this message
Paul Bourke (pauldbourke) wrote :

This will happen once the Ceph packages installed in the images move to 12.2.1 (currently 12.2.0).

From https://ceph.com/releases/v12-2-1-luminous-released :

The maximum number of PGs per OSD before the monitor issues a
warning has been reduced from 300 to 200 PGs. 200 is still twice
the generally recommended target of 100 PGs per OSD. This limit can
be adjusted via the mon_max_pg_per_osd option on the
monitors. The older mon_pg_warn_max_per_osd option has been removed.

Need to decide how best to address this in Kolla.

Changed in kolla-ansible:
status: New → Confirmed
importance: Undecided → Critical
Revision history for this message
Jeffrey Zhang (jeffrey4l) wrote :

Paul is correct, and this easily happens in a small environment. There are two options when you hit this issue:

1. Set a smaller number for the default Ceph pg num:

  #globals.yml
  ceph_pool_pg_num: 64
  ceph_pool_pgp_num: 64

2. Or set a higher number for mon_max_pg_per_osd:

  #/etc/kolla/config/ceph.conf

  [global]
  mon max pg per osd = 3000

For Kolla, I think we should make option 1) the default in the group_vars/all.yml file.
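
Whichever option is chosen, the effective values can be checked against the running cluster afterwards; a rough sketch, reusing the ceph_mon container and the volumes pool from the failing task above:

  docker exec ceph_mon ceph osd pool get volumes pg_num
  docker exec ceph_mon ceph osd dump | grep pool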

Revision history for this message
Vladislav Belogrudov (vlad-belogrudov) wrote :

do you think 64 is small enough?

we have 7 pools currently, and with 3 replicas that gives
64 * 3 * 7 = 1344 PGs, while the limit is 200 per OSD * 3 OSDs = 600

I think it is a safe move to have pg_num = 8

Revision history for this message
Vladislav Belogrudov (vlad-belogrudov) wrote :

As soon as the operator has played enough with a toy cluster and decides to add more nodes, the parameter can easily be grown to a more appropriate value.
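
A sketch of what growing an existing pool later could look like (same ceph_mon container and volumes pool as above; on Luminous pg_num can only be increased, and pgp_num should be raised to match):

  docker exec ceph_mon ceph osd pool set volumes pg_num 128
  docker exec ceph_mon ceph osd pool set volumes pgp_num 128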

Revision history for this message
Jeffrey Zhang (jeffrey4l) wrote :

Adjusting the pg_num in a production environment is bad; it will result in lots of traffic across the whole Ceph cluster. So I think the better way is to warn the user before deployment: they should calculate the proper pg num for each pool based on their environment.
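
For this environment, the usual rule of thumb (target of roughly 100 PGs per OSD from the release note quoted above, divided across replicas and pools, rounded to a power of two) would give something like:

  (100 * 3 OSDs) / (3 replicas * 7 pools) ≈ 14 → 16 per pool

which keeps the total at 16 * 3 * 7 = 336 PGs, well under the 600 cap.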

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master)

Reviewed: https://review.openstack.org/564169
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=36f33f089bbda9bcc7e451b69413907cce8e3bb6
Submitter: Zuul
Branch: master

commit 36f33f089bbda9bcc7e451b69413907cce8e3bb6
Author: Paul Bourke <email address hidden>
Date: Tue Apr 24 14:07:25 2018 +0100

    Reduce the default values for Ceph pgs

    Required to keep Ceph working once we move to Luminous 12.2.1

    Change-Id: I8d3e56f2053c939ea313c60cc04c0ff79dd27d25
    Closes-Bug: 1763356

Changed in kolla-ansible:
status: Confirmed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 7.0.0.0b2

This issue was fixed in the openstack/kolla-ansible 7.0.0.0b2 development milestone.
