Bug #1415954 “Excessive Ceph rebalancing can cause deployment fa...” : Bugs : Fuel for OpenStack

Aleksey Kasatkin (alekseyk-ru) on 2015-01-29

Changed in mos:
milestone:	none → 6.1

ruhe (ruhe) on 2015-01-29

Changed in mos:
importance:	Undecided → Critical

Revision history for this message

Leontii Istomin (listomin) wrote on 2015-01-30:

#1

reproduced.
snapshot is here https://drive.google.com/a/mirantis.com/file/d/0Bx4ptZV1Jt7hcFByR0pNbUtMTDQ/view?usp=sharing

Revision history for this message

Igor Yozhikov (iyozhikov) wrote on 2015-01-30:

#2

What was found:
from ceph.log from one of the controller nodes:
...
2015-01-29 23:20:51.670859 mon.0 192.168.0.3:6789/0 1 : [INF] mon.node-1@0 won leader election with quorum 0
...
2015-01-29 23:44:03.273977 mon.0 192.168.0.3:6789/0 31 : [INF] mon.node-1@0 won leader election with quorum 0,1,2
2015-01-29 23:44:03.276078 mon.0 192.168.0.3:6789/0 32 : [INF] monmap e3: 3 mons at {node-1=192.168.0.3:6789/0,node-44=192.168.0.46:6789/0,node-49=192.168.0.51:6789/0}
...
2015-01-29 23:53:01.390172 mon.0 192.168.0.3:6789/0 124 : [INF] osd.42 192.168.0.41:6800/13854 boot
2015-01-29 23:53:01.390511 mon.0 192.168.0.3:6789/0 125 : [INF] osdmap e22: 47 osds: 47 up, 47 in
...
2015-01-29 23:54:59.033482 osd.26 192.168.0.6:6800/14137 1 : [WRN] 3 slow requests, 3 included below; oldest blocked for > 30.781338 secs
...
2015-01-29 23:59:10.623681 osd.0 192.168.0.29:6800/13524 3 : [WRN] 6 slow requests, 6 included below; oldest blocked for > 384.902800 secs
...
2015-01-30 00:00:09.645041 mon.0 192.168.0.3:6789/0 212 : [INF] pgmap v109: 8384 pgs: 8384 active+clean; 694 bytes data, 98193 MB used, 43169 GB / 43265 GB avail

I see that ceph cluster initialization took about 30 minutes from 2015-01-29 23:20:51 to 2015-01-29 23:53:01.
Last log record mentioned above tells up about ceph health state, we could see that health is ok.

Conlusion:
I believe that image was imported into glance too early before ceph cluster became operational.

Revision history for this message

Victor Ryzhenkin (vryzhenkin) wrote on 2015-01-30:

#3

Can't reproduce on ISO:
{"build_id": "2015-01-28_22-55-01", "ostf_sha": "c9100263140008abfcc2704732e98fbdfd644068", "build_number": "84", "release_versions": {"2014.2-6.1": {"VERSION": {"build_id": "2015-01-28_22-55-01", "ostf_sha": "c9100263140008abfcc2704732e98fbdfd644068", "build_number": "84", "api": "1.0", "nailgun_sha": "92a85025f0ef9b3c5b42a1ba172573aa0ac54e33", "production": "docker", "python-fuelclient_sha": "cb8928ce34f5ca88c0d6cecc6331488db75362ac", "astute_sha": "ed5270bf9c6c1234797e00bd7d4dd3213253a413", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "", "fuellib_sha": "b2fbaa9ffb74fafe1f5c2c480944a78424e1ae28"}}}, "auth_required": true, "api": "1.0", "nailgun_sha": "92a85025f0ef9b3c5b42a1ba172573aa0ac54e33", "production": "docker", "python-fuelclient_sha": "cb8928ce34f5ca88c0d6cecc6331488db75362ac", "astute_sha": "ed5270bf9c6c1234797e00bd7d4dd3213253a413", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "", "fuellib_sha": "b2fbaa9ffb74fafe1f5c2c480944a78424e1ae28"}

CentOS, HA, Neutron GRE, Murano, Ceph-for-All. (3 virtual controllers, 1 SuperMicro compute)

It seems that the bug reprodusible only with bare-metal configurations.

tags:

added: bare-metal

Dmitry Mescheryakov (dmitrymex) on 2015-01-30

Changed in mos:
status:	New → Confirmed

Stanislav Makar (smakar) on 2015-02-03

Changed in mos:
assignee:	MOS Glance (mos-glance) → Stanislav Makar (smakar)

Stanislav Makar (smakar) on 2015-02-04

Changed in mos:
status:	Confirmed → In Progress

Revision history for this message

Stanislav Makar (smakar) wrote on 2015-02-10:

#4

proposed the patch https://review.openstack.org/#/c/153338/
waiting for test results on scale lab

Alex Ermolov (aermolov) on 2015-02-12

Changed in mos:
milestone:	6.1 → 5.1.2
milestone:	5.1.2 → 6.1

Alex Ermolov (aermolov) on 2015-02-12

no longer affects:

mos/6.0.x

Revision history for this message

Dina Belova (dbelova) wrote on 2015-02-19:

#5

Stanislav, I know you've tried it on the scale lab. Any updates?

Revision history for this message

Stanislav Makar (smakar) wrote on 2015-02-25:

#6

Yes, I tried, it fixes but there is a propose to redo it (move this check to another step)

Revision history for this message

Ryan Moe (rmoe) wrote on 2015-02-25:

#7

One of the issues I see with the proposed change is that it will cause problems when deploying additional OSDs. If you have a large Ceph cluster with lots of data in it and you want to add more OSDs later via Fuel it will fail with this change. Each time an OSD is added Ceph will rebalance the cluster. With lots of data this could easily take more than 30 minutes and Fuel will mark the deployment as failed even though the OSDs deployed (and are working) correctly. While Ceph is rebalancing the status will be HEATLH_WARN but the cluster is still usable.

I think moving the cluster health check to a post-deployment task is a better way to do it. Also, rather than checking the status we should write data to the cluster and read it back to verify that it's working correctly. If we want to verify that each individual OSD has deployed correctly we should verify that the OSD we just deployed is marked up and in by Ceph.

Revision history for this message

Dmitry Borodaenko (angdraug) wrote on 2015-02-28:

#8

I think the solution to this bug should also take into account bug #1424060.

Vladimir Kuklin (vkuklin) on 2015-03-03

Changed in fuel:
status:	New → Triaged
importance:	Undecided → Critical
assignee:	nobody → Stanislav Makar (smakar)
milestone:	none → 6.1
status:	Triaged → In Progress
no longer affects:	mos

Vladimir Kuklin (vkuklin) on 2015-03-03

no longer affects:

mos/6.1.x

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2015-03-03:

#12

@Ryan, we should consider setting "ceph osd set noout" (thank you, Bruce Matthew, for the idea) at the deployment stage, and trigger rebalance later, by the user's decision. A decision to make rebalance after the deploy is finished should obviously as well disable the upload cirros task, if ceph is used for glance images as well. The another point was suggested by Chris Clason: "We need to go one step further and gradually ramp up the weight on the OSD's. Instead of dumping the drives in at their full weights"

Vladimir Kuklin (vkuklin) on 2015-03-03

no longer affects:

mos/6.0.x

Ryan Moe (rmoe) on 2015-03-04

tags:

added: customer-found

Ryan Moe (rmoe) on 2015-03-04

summary:

- TestVM Image has been added in Glance with "killed" status
+ Excessive Ceph rebalancing can cause deployment failures

Revision history for this message

Mykola Golub (mgolub) wrote on 2015-03-05:

#13

The rebalancing load can be significantly decreased by limiting recovery activity in ceph.conf:

osd max backfills = 1
osd recovery max active = 1

This is a widely used practice to set these parameters to decrease I/O load when adding/removing nodes. The defaul values are 10 and 15, which look too high. Note, this will very likely increase the rebalancing time.

I think we should apply this solution before using more complicated.

"ceph osd set noout" is not enough to disable automatic rebalance on deployment stage. To "freez" the cluster one could

  ceph osd set noout
  ceph osd set noin
  ceph osd set noup
  ceph osd set nodown

then unset them when deployment finished and you are ready for rebalance. Gradually increasing weight technique could be used too but it might be not necessary with the ceph.conf settings above (need to check though on large deployments).

OpenStack Infra (hudson-openstack) on 2015-03-05

Changed in fuel:
assignee:	Stanislav Makar (smakar) → Sergii Golovatiuk (sgolovatiuk)

OpenStack Infra (hudson-openstack) on 2015-03-05

Changed in fuel:
assignee:	Sergii Golovatiuk (sgolovatiuk) → Stanislav Makar (smakar)

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-03-10: Fix proposed to fuel-library (master)

#14

Fix proposed to branch: master
Review: https://review.openstack.org/163019

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2015-03-10:

#15

@Mykola, is it possible what osd max backfills = 1, osd recovery max active = 1 lowered from 10 and 15 could cause the rebalance never ended under the heavy write load?

Stanislav Makar (smakar) on 2015-03-10

tags:

added: ceph

Revision history for this message

Mikolaj Golub (to-my-trociny) wrote on 2015-03-10:

#16

I have not seen reports about 'osd max backfills = 1' and 'osd recovery max active = 1' causing the rebalance never ended. Sure they may increase recovery time, still it looks like the problems caused by high load during recovery/rebalance much worse so these are commonly recommended settings. E.g. they were used in 47 disk servers/1128 OSDs ceph cluster at Cern:

http://www.slideshare.net/Inktank_Ceph/scaling-ceph-at-cern

Revision history for this message

Mykola Golub (mgolub) wrote on 2015-03-10:

#17

I sent the previous message (#16) from wrong account. Still it is me.

OpenStack Infra (hudson-openstack) on 2015-03-17

Changed in fuel:
assignee:	Stanislav Makar (smakar) → Bogdan Dobrelya (bogdando)

OpenStack Infra (hudson-openstack) on 2015-03-18

Changed in fuel:
assignee:	Bogdan Dobrelya (bogdando) → Stanislav Makar (smakar)

Revision history for this message

Stanislav Makar (smakar) wrote on 2015-03-18:

#18

SCALE LAB (50 ceph-osd) results:
patch fixes the problem and showed good itself even when was warn due to clock skew

root@node-101:~# ceph -s
    cluster e9df6ff0-ccb8-4a51-9d56-a6f92c85c921
     health HEALTH_WARN clock skew detected on mon.node-102, mon.node-103
     monmap e3: 3 mons at {node-101=192.168.0.4:6789/0,node-102=192.168.0.5:6789/0,node-103=192.168.0.6:6789/0}, election epoch 6, quorum 0,1,2 node-101,node-102,node-103
     osdmap e47: 47 osds: 47 up, 47 in
      pgmap v979: 16576 pgs, 11 pools, 13696 kB data, 51 objects
            98513 MB used, 43401 GB / 43498 GB avail
               16576 active+clean

task logs:

2015-03-13T18:23:26 info: [424] Run hook ---
priority: 300
fail_on_error: true
type: shell
uids:
- '101'
parameters:
cmd: ruby /etc/puppet/modules/osnailyfacter/modular/astute/ceph_ready_check.rb
timeout: 1800

2015-03-13T18:26:18 debug: [424] 954bd275-f135-44bf-8cb8-d8f5224d69eb: MC agent 'execute_shell_command', method 'execute', results: {:sender=>"
101", :statuscode=>0, :statusmsg=>"OK", :data=>{:stdout=>"try 0\nThere are PGs which are not in active state!\ntry 1\nThere are PGs which are not in active state!\ntry 2\nThere are PGs which are not in active state!\ntry 3\nThere are PGs which are not in active state!\ntry 4\nThere are PGs which are not in active state!\ntry 5\nThere are PGs which are not in active state!\ntry 6\nThere are PGs which are not in active state!\ntry 7\nThere are PGs which are not in active state!\ntry 8\nThere are PGs which are not in active state!\ntry 9\nThere are PGs which are not in active state!\ntry 10\nThere are PGs which are not in active state!\ntry 11\nThere are PGs which are not in active state!\ntry 12\nThere are PGs which are not in active state!\ntry 13\nThere are PGs which are not in active state!\ntry 14\nThere are PGs which are not in active state!\ntry 15\n", :stderr=>"ok\nok\nok\nok\nok\nok\nok\nok\nok\nok\nok\nok\nok\nok\nok\n", :exit_code=>0}}
2015-03-13T18:26:18 debug: [424] 954bd275-f135-44bf-8cb8-d8f5224d69eb: cmd: cd / && ruby /etc/puppet/modules/osnailyfacter/modular/astute/ceph_ready_check.rb
cwd: /
stdout: try 0
There are PGs which are not in active state!
try 1
There are PGs which are not in active state!
try 2
There are PGs which are not in active state!
try 3
There are PGs which are not in active state!
try 4
There are PGs which are not in active state!
try 5
There are PGs which are not in active state!
try 6
There are PGs which are not in active state!
try 7
There are PGs which are not in active state!
try 8
There are PGs which are not in active state!
try 9
There are PGs which are not in active state!
try 10
There are PGs which are not in active state!
try 11
There are PGs which are not in active state!
try 12
There are PGs which are not in active state!
try 13
There are PGs which are not in active state!
try 14
There are PGs which are not in active state!
try 15

stderr: ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok

exit code: 0
2015-03-13T18:26:18 info: [424] Run hook ---
priority: 400

SCALE LAB (50 ceph-osd) results:
patch  fixes the problem and showed good itself even when was warn due to clock skew

root@node-101:~# ceph -s
    cluster e9df6ff0-ccb8-4a51-9d56-a6f92c85c921
     health HEALTH_WARN clock skew detected on mon.node-102, mon.node-103
     monmap e3: 3 mons at {node-101=192.168.0.4:6789/0,node-102=192.168.0.5:6789/0,node-103=192.168.0.6:6789/0}, election epoch 6, quorum 0,1,2 node-101,node-102,node-103
     osdmap e47: 47 osds: 47 up, 47 in
      pgmap v979: 16576 pgs, 11 pools, 13696 kB data, 51 objects
            98513 MB used, 43401 GB / 43498 GB avail
               16576 active+clean

task logs:

2015-03-13T18:23:26 info: [424] Run hook ---
priority: 300
fail_on_error: true
type: shell
uids:
- '101'
parameters:
  cmd: ruby /etc/puppet/modules/osnailyfacter/modular/astute/ceph_ready_check.rb
  timeout: 1800

2015-03-13T18:26:18 debug: [424] 954bd275-f135-44bf-8cb8-d8f5224d69eb: MC agent 'execute_shell_command', method 'execute', results: {:sender=>"
101", :statuscode=>0, :statusmsg=>"OK", :data=>{:stdout=>"try 0\nThere are PGs which are not in active state!\ntry 1\nThere are PGs which are not in active state!\ntry 2\nThere are PGs which are not in active state!\ntry 3\nThere are PGs which are not in active state!\ntry 4\nThere are PGs which are not in active state!\ntry 5\nThere are PGs which are not in active state!\ntry 6\nThere are PGs which are not in active state!\ntry 7\nThere are PGs which are not in active state!\ntry 8\nThere are PGs which are not in active state!\ntry 9\nThere are PGs which are not in active state!\ntry 10\nThere are PGs which are not in active state!\ntry 11\nThere are PGs which are not in active state!\ntry 12\nThere are PGs which are not in active state!\ntry 13\nThere are PGs which are not in active state!\ntry 14\nThere are PGs which are not in active state!\ntry 15\n", :stderr=>"ok\nok\nok\nok\nok\nok\nok\nok\nok\nok\nok\nok\nok\nok\nok\n", :exit_code=>0}}
2015-03-13T18:26:18 debug: [424] 954bd275-f135-44bf-8cb8-d8f5224d69eb: cmd: cd / && ruby /etc/puppet/modules/osnailyfacter/modular/astute/ceph_ready_check.rb
cwd: /
stdout: try 0
There are PGs which are not in active state!
try 1
There are PGs which are not in active state!
try 2
There are PGs which are not in active state!
try 3
There are PGs which are not in active state!
try 4
There are PGs which are not in active state!
try 5
There are PGs which are not in active state!
try 6
There are PGs which are not in active state!
try 7
There are PGs which are not in active state!
try 8
There are PGs which are not in active state!
try 9
There are PGs which are not in active state!
try 10
There are PGs which are not in active state!
try 11
There are PGs which are not in active state!
try 12
There are PGs which are not in active state!
try 13
There are PGs which are not in active state!
try 14
There are PGs which are not in active state!
try 15

stderr: ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok

exit code: 0
2015-03-13T18:26:18 info: [424] Run hook ---
priority: 400

OpenStack Infra (hudson-openstack) on 2015-03-18

Changed in fuel:
assignee:	Stanislav Makar (smakar) → Bogdan Dobrelya (bogdando)

OpenStack Infra (hudson-openstack) on 2015-03-19

Changed in fuel:
assignee:	Bogdan Dobrelya (bogdando) → Stanislav Makar (smakar)

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-03-20: Fix merged to fuel-library (master)

#19

Reviewed: https://review.openstack.org/153338
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=d7be830168abea5ba14b0e36aaedf7e69998b568
Submitter: Jenkins
Branch: master

commit d7be830168abea5ba14b0e36aaedf7e69998b568
Author: Stanislav Makar <email address hidden>
Date: Wed Mar 4 13:59:30 2015 +0000

Add the Ceph ready checking

When we deploy environment with big number of Ceph OSDs it takes time
to get ceph ready to use.

    Docimpact: operations documentation http://ceph.com/docs/master/rados/operations/monitoring/
    Closes-bug: #1415954
    Change-Id: If7ae4948978047e1dd533e300eb1bfdc61b51035

Changed in fuel:
status:	In Progress → Fix Committed

Revision history for this message

Dina Belova (dbelova) wrote on 2015-05-07:

#20

Fix verified for 6.1 on 233 ISO and later.

Changed in fuel:
status:	Fix Committed → Fix Released

Revision history for this message

Vitaly Sedelnik (vsedelnik) wrote on 2015-10-26:

#21

Won't Fix for 6.0-updates and 5.1.1-updates as we don't expect new 5.1.1 and 6.0 deployments

Fuel for OpenStack

Excessive Ceph rebalancing can cause deployment failures

Bug Description

Duplicates of this bug

Other bug subscribers

Related blueprints

Remote bug watches

	Status	Importance	Assigned to	Milestone
Fuel for OpenStack	Fix Released	Critical	Stanislav Makar	Fuel for OpenStack 6.1
5.1.x	Won't Fix	Critical	MOS Maintenance	Fuel for OpenStack 5.1.1-updates
6.0.x	Won't Fix	Critical	Stanislav Makar	Fuel for OpenStack 6.0-updates