Fuel for OpenStack

Need to set production oriented Ceph configuration parameters to increase the Ceph performance

Bug #1374969 reported by Timur Nurlygayanov on 2014-09-28

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Fuel for OpenStack	Fix Released	High	Stanislav Makar	Fuel for OpenStack 6.1
	6.0.x	Won't Fix	Undecided	Stanislaw Bogatkin	Fuel for OpenStack 6.0-updates

Bug Description

To increase Ceph performance when we install OpenStack with Ceph storage, we should set production-oriented parameters.
All parameters were tested in a lab with 4 Ceph nodes with SSD disks and 2x10Gb interfaces.

On all Ceph nodes, in /etc/ceph/ceph.conf (we enabled Ceph caching and set optimal parameters for OSD node sync):
[global]
...
osd_pool_default_pg_num = 1024
osd_pool_default_pgp_num = 1024
osd_pool_default_flag_hashpspool = true
...

[osd]
osd recovery max active = 1
osd max backfills = 1
filestore max sync interval = 30
filestore min sync interval = 29
filestore flusher = false
filestore queue max ops = 10000
filestore op threads = <CPU count>
osd op threads = <CPU count>
...

[client]
rbd cache = true
rbd cache writethrough until flush = true
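
One way to check that a node's ceph.conf carries the values listed above is sketched below. It is a minimal illustration and not part of Fuel: the helper names `normalize` and `find_mismatches`, the `EXPECTED` table, and the `SAMPLE` text are all hypothetical. Ceph accepts option names written with either spaces or underscores, so the sketch normalizes both spellings before comparing.

```python
import configparser

# Hypothetical subset of the values proposed in this report.
EXPECTED = {
    ("osd", "osd_recovery_max_active"): "1",
    ("osd", "osd_max_backfills"): "1",
    ("client", "rbd_cache"): "true",
    ("client", "rbd_cache_writethrough_until_flush"): "true",
}


def normalize(name):
    """Treat 'osd max backfills' and 'osd_max_backfills' as the same option."""
    return "_".join(name.strip().lower().replace("_", " ").split())


def find_mismatches(conf_text, expected=EXPECTED):
    """Return {(section, option): actual value or None} for every deviation."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    mismatches = {}
    for (section, option), wanted in expected.items():
        actual = None
        if parser.has_section(section):
            for key, value in parser.items(section):
                if normalize(key) == option:
                    actual = value.strip()
        if actual != wanted:
            mismatches[(section, option)] = actual
    return mismatches


# Illustrative input only; a real check would read /etc/ceph/ceph.conf.
SAMPLE = """
[osd]
osd recovery max active = 1
osd_max_backfills = 5

[client]
rbd cache = true
"""

if __name__ == "__main__":
    for (section, option), actual in find_mismatches(SAMPLE).items():
        print(f"[{section}] {option}: found {actual!r}")
```

With the sample text, the sketch reports the wrong `osd_max_backfills` value and the missing `rbd_cache_writethrough_until_flush` option.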

If we use Ceph for Glance volumes:
On controller nodes, in /etc/cinder/cinder.conf (this will increase performance because, in case of any issues with the Ceph disk, Cinder will write data to the temp dir and sync it afterwards):
[DEFAULT]
...
volume_tmp_dir=/tmp ### or path to free SSD disk?

Timur Nurlygayanov (tnurlygayanov) on 2014-09-28

tags:

added: customer-found

Mike Scherbakov (mihgen) on 2014-09-28

Changed in fuel:
milestone:	none → 6.0

Nastya Urlapova (aurlapova) on 2014-09-29

Changed in fuel:
assignee:	nobody → Fuel Library Team (fuel-library)
status:	New → Opinion

Dmitry Borodaenko (angdraug) on 2014-09-30

Changed in fuel:
status:	Opinion → Confirmed
importance:	High → Low

Dmitry Borodaenko (angdraug) wrote on 2014-09-30:

Providing more optimized settings for Ceph deployments is a valid request, so I changed status from Opinion to Confirmed.

Since the problem affects only specific configurations and doesn't render the whole environment unusable, the priority is Medium.

Timur Nurlygayanov (tnurlygayanov) on 2014-09-30

Changed in fuel:
importance:	Low → Medium

Gregory Elkinbard (gelkinbard) wrote on 2014-10-01:

How are these parameters affected by using HDDs instead of SSDs?
Most production clusters are primarily HDD-based,
with just a few SSDs for write logging.

Vladimir Kuklin (vkuklin) on 2014-10-09

Changed in fuel:
assignee:	Fuel Library Team (fuel-library) → Dmitry Borodaenko (dborodaenko)

Dmitry Borodaenko (angdraug) on 2014-10-13

Changed in fuel:
assignee:	Dmitry Borodaenko (dborodaenko) → Fuel Library Team (fuel-library)

Vladimir Kuklin (vkuklin) on 2014-11-26

Changed in fuel:
milestone:	6.0 → 6.1

Stanislaw Bogatkin (sbogatkin) on 2015-02-09

Changed in fuel:
assignee:	Fuel Library Team (fuel-library) → Stanislaw Bogatkin (sbogatkin)

Mykola Golub (mgolub) wrote on 2015-02-24:

For ceph.conf the suggested configuration:

  [client]
     rbd cache = true
     rbd cache writethrough until flush = true

looks good to me. The first parameter enables writeback caching. The second one is for safety: it tells the cache to start out in writethrough mode and switch to writeback after the first flush request is received. That is, if the client behaves properly (sends flush requests), it will be switched to writeback mode. If it does not, writeback is dangerous for it.
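
The mode switch described above can be pictured with a toy model. It illustrates only the writethrough-to-writeback transition and is not librbd code; the class `ToyRbdCache` and its attributes are hypothetical.

```python
class ToyRbdCache:
    """Toy model: start in writethrough, switch to writeback after first flush."""

    def __init__(self, writethrough_until_flush=True):
        # With the safety option off, the cache is writeback from the start.
        self.writeback = not writethrough_until_flush
        self.dirty = []      # writes acknowledged but not yet on the cluster
        self.persisted = []  # writes stored on the cluster

    def write(self, data):
        if self.writeback:
            self.dirty.append(data)      # acknowledged before it is persisted
        else:
            self.persisted.append(data)  # persisted before it is acknowledged

    def flush(self):
        # A flush shows that the client is cache-aware, so writeback is safe.
        self.persisted.extend(self.dirty)
        self.dirty.clear()
        self.writeback = True


cache = ToyRbdCache()
cache.write("a")  # writethrough: nothing is left dirty
cache.flush()     # first flush enables writeback
cache.write("b")  # now held in the cache until the next flush
print(cache.persisted, cache.dirty)
```

A client that never calls `flush()` stays in writethrough mode in this model, so it never has unpersisted writes to lose.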

It looks like these parameters have already been applied, though (#1361391).

For other ceph.conf options suggested:

osd pool default flag hashpspool = true

This is the default in recent Ceph versions. If it is not the default in the version we have in MOS, then setting it looks good.

The following may also be good:

osd pool default pg num = 1024
osd pool default pgp num = 1024

Actually, it depends on how many pools and OSD nodes the cluster has. It is usually recommended to have about 100 PGs per OSD, so this parameter should be calculated as

100 * number_of_osds / number_of_pools

It looks like we have only 3 pools in use (images, volumes, compute) and at least 3 OSDs (but I suppose usually more), so 1024 looks reasonable to me. Currently it seems we have 256, which might be acceptable too. But it might make more sense not to rely on defaults and instead to specify pg_num and pgp_num explicitly when pools are created, taking into account the current number of OSD nodes.
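
The calculation above can be sketched as follows. The function name `suggested_pg_num` is hypothetical, and rounding the result up to a power of two is an assumption added here (it follows common Ceph guidance but is not stated in this comment). Like the formula in the comment, the sketch does not account for the replica count.

```python
def suggested_pg_num(num_osds, num_pools, pgs_per_osd=100):
    """Apply 100 * number_of_osds / number_of_pools, rounded up to a power of two."""
    if num_osds < 1 or num_pools < 1:
        raise ValueError("need at least one OSD and one pool")
    target = pgs_per_osd * num_osds / num_pools
    pg_num = 1
    while pg_num < target:  # assumption: round up to the next power of two
        pg_num *= 2
    return pg_num


# Three pools (images, volumes, compute) with different cluster sizes.
for osds in (3, 8, 30):
    print(osds, suggested_pg_num(osds, num_pools=3))
```

Under these assumptions, 1024 corresponds to a cluster of roughly 30 OSDs with three pools, while a minimal 3-OSD cluster would get a much smaller value.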

These parameters also look good in general; as far as I can see, they are frequently recommended to decrease OSD load during recovery:

osd recovery max active = 1
osd max backfills = 1

Right now I can't tell how good the other (filestore) options are in the general case. Actually, it would be nice to see in such reports not only the new suggested values but also their defaults, and some reasoning for why the change is good.

Timur Nurlygayanov (tnurlygayanov) wrote on 2015-02-24:

Mykola, thank you for the update!

We need to check the version of Ceph in the current and future releases of MOS and set all required parameters.

Vladimir Kuklin (vkuklin) on 2015-04-03

Changed in fuel:
assignee:	Stanislaw Bogatkin (sbogatkin) → Stanislav Makar (smakar)

Vladimir Kuklin (vkuklin) on 2015-04-06

Changed in fuel:
importance:	Medium → High

OpenStack Infra (hudson-openstack) wrote on 2015-04-09: Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/171973

Changed in fuel:
status:	Confirmed → In Progress

Stanislav Makar (smakar) wrote on 2015-04-09:

Implemented almost everything except the following:

Actually, it depends on how many pools and OSD nodes the cluster has. It is usually recommended to have about 100 PGs per OSD, so this parameter should be calculated as
100 * number_of_osds / number_of_pools

I will contact the nailgun team, because this value comes from nailgun.

OpenStack Infra (hudson-openstack) wrote on 2015-04-09: Related fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/163019
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=c52d4fc377efe1134e8be81a18560c0a6e0138c3
Submitter: Jenkins
Branch: master

commit c52d4fc377efe1134e8be81a18560c0a6e0138c3
Author: Stanislav Makar <email address hidden>
Date: Tue Mar 10 14:23:48 2015 +0000

Decrease I/O load when adding/removing OSD nodes

* Set options osd_max_backfills and osd_recovery_max_active
* Fix the options which use white space instead of underscore

    Closes-bug: #1430845
    Related-bug: #1374969
    Change-Id: I3cdcec6c5bd39e5cbc55ecf8a29b751c784851a0

Changed in fuel:
status:	In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2015-04-09: Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/171973
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=25e27d561bb49b2889055856e57c2af34843e1f7
Submitter: Jenkins
Branch: master

commit 25e27d561bb49b2889055856e57c2af34843e1f7
Author: Stanislav Makar <email address hidden>
Date: Thu Apr 9 08:59:02 2015 +0000

Set production oriented Ceph configuration parameters

    Set production oriented Ceph configuration parameters to increase the
    Ceph performance in proper place:
    * rbd cache = true
    * rbd cache writethrough until flush = true

Change-Id: Ic7cf72a4f390ce692fac9bf44bb481c412b370f8
Closes-bug: #1374969

OpenStack Infra (hudson-openstack) wrote on 2015-04-20: Related fix proposed to fuel-library (stable/6.0)

Related fix proposed to branch: stable/6.0
Review: https://review.openstack.org/175364

Mykola Golub (mgolub) on 2015-05-21

tags:

added: on-verification

Mykola Golub (mgolub) wrote on 2015-05-22:

Verified on

{
    "build_id": "2015-05-19_10-05-51",
    "build_number": "437",
    "auth_required": true,
    "fuel-ostf_sha": "9ce1800749081780b8b2a4a7eab6586583ffaf33",
    "fuel-library_sha": "2814c51668f487e97e1449b078bad1942421e6b9",
    "nailgun_sha": "593c99f2b46cf52b2be6c7c6e182b6ba9f2232cd",
    "openstack_version": "2014.2.2-6.1",
    "production": "docker",
    "api": "1.0",
    "python-fuelclient_sha": "e19f1b65792f84c4a18b5a9473f85ef3ba172fce",
    "astute_sha": "96801c5bccb14aa3f2a0d7f27f4a4b6dd2b4a548",
    "fuelmain_sha": "68796aeaa7b669e68bc0976ffd616709c937187a",
    "feature_groups": [
        "mirantis"
    ],
    "release": "6.1",
    "release_versions": {
        "2014.2.2-6.1": {
            "VERSION": {
                "build_id": "2015-05-19_10-05-51",
                "build_number": "437",
                "fuel-library_sha": "2814c51668f487e97e1449b078bad1942421e6b9",
                "nailgun_sha": "593c99f2b46cf52b2be6c7c6e182b6ba9f2232cd",
                "fuel-ostf_sha": "9ce1800749081780b8b2a4a7eab6586583ffaf33",
                "production": "docker",
                "api": "1.0",
                "python-fuelclient_sha": "e19f1b65792f84c4a18b5a9473f85ef3ba172fce",
                "astute_sha": "96801c5bccb14aa3f2a0d7f27f4a4b6dd2b4a548",
                "fuelmain_sha": "68796aeaa7b669e68bc0976ffd616709c937187a",
                "feature_groups": [
                    "mirantis"
                ],
                "release": "6.1",
                "openstack_version": "2014.2.2-6.1"
            }
        }
    }
}

using Juno on Ubuntu 14.04.1.

The settings are set correctly.

Mykola Golub (mgolub) on 2015-05-26

tags:

removed: on-verification

Mykola Golub (mgolub) on 2015-05-27

Changed in fuel:
status:	Fix Committed → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2015-06-15: Related fix merged to fuel-library (stable/6.0)

Reviewed: https://review.openstack.org/175364
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=cfa8b455e3801c417e5bea63d1e5bdc890f57df0
Submitter: Jenkins
Branch: stable/6.0

commit cfa8b455e3801c417e5bea63d1e5bdc890f57df0
Author: Stanislav Makar <email address hidden>
Date: Tue Mar 10 14:23:48 2015 +0000

Decrease I/O load when adding/removing OSD nodes

* Set options osd_max_backfills and osd_recovery_max_active
* Fix the options which use white space instead of underscore

    Closes-bug: #1430845
    Related-bug: #1374969
    Change-Id: I3cdcec6c5bd39e5cbc55ecf8a29b751c784851a0

Related blueprints

100 nodes support (fuel only)