Comment 9 for bug 1462451

Mykola Golub (mgolub) wrote :

The cluster hung with many placement groups stuck in the creating state, with the following errors in the logs:

2015-06-07 22:57:48.238453 7fd1124fe700 0 log [WRN] : slow request 2571.770231 seconds old, received at 2015-06-07 22:14:56.468107: osd_pg_create(pg0.1c,.... pg6.1ed0,5; ) v2 currently wait for new map

The problem was not reproduced when we hardcoded osd_pool_default_pg_num and osd_pool_default_pgp_num to 128 instead of allowing Fuel to calculate them based on the number of OSDs (8192 for a cluster of this size).

Although the root cause of the hang has not been found (it might be a limit or timeout that we hit in Ceph or the OS when a large number of placement groups are being created), there are some changes to Fuel that should improve the situation when deploying large clusters:

1) The formula for calculating the pg number should be changed to give values 10 times lower than the current ones for large clusters. Apart from this issue, an overestimated number of pgs causes other problems, and the pg number cannot be decreased after the pool is created:

  https://blueprints.launchpad.net/fuel/+spec/ceph-osd-pool-default-pg-num

2) Pools are created after the controller nodes are deployed but before the OSDs are deployed. As a result, if the pools have a large pg num, a huge number of PGs sit in the creating state; then, as OSD nodes start to be added, a large number of PGs are created on the nodes that are deployed first, and each time a new OSD appears PGs have to be moved. This process is not optimal. It is much less stressful for the cluster to create PGs after all OSDs are deployed and in the IN and UP state, so that no rebalancing occurs and the "early" OSDs are not overloaded with placement groups.
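
To illustrate both suggestions, here is a rough Python sketch. It is not Fuel's actual code. The first helper assumes the commonly cited Ceph rule of thumb (a target number of PGs per OSD, divided by the replica count, rounded up to a power of two); the names suggested_pg_num, pgs_per_osd and wait_for_osds are hypothetical, and get_counts stands for whatever the deployment would use to query the number of OSDs that are up and in. The 200 OSDs in the usage note is an illustrative figure, not the size of the cluster from this bug.

```python
import time
from typing import Callable, Tuple


def next_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n (1 for n <= 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def suggested_pg_num(num_osds: int, replicas: int = 3, pgs_per_osd: int = 100) -> int:
    """Rule-of-thumb pg_num: OSDs * target PGs per OSD / replicas, rounded up.

    pgs_per_osd is a hypothetical tuning knob; lowering it by a factor of 10
    gives roughly 10 times fewer PGs, as proposed in item 1.
    """
    if num_osds < 1 or replicas < 1 or pgs_per_osd < 1:
        raise ValueError("all arguments must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    raw = (num_osds * pgs_per_osd + replicas - 1) // replicas
    return next_power_of_two(raw)


def wait_for_osds(
    get_counts: Callable[[], Tuple[int, int]],
    expected: int,
    timeout: float = 600.0,
    interval: float = 5.0,
) -> bool:
    """Poll until at least `expected` OSDs are both up and in (item 2).

    get_counts is a caller-supplied function returning (num_up, num_in).
    Returns True once the cluster is ready, False if the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        num_up, num_in = get_counts()
        if num_up >= expected and num_in >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With these assumptions, suggested_pg_num(200) yields 8192, while suggested_pg_num(200, pgs_per_osd=10) yields 1024. Pool creation would then be attempted only after wait_for_osds(...) returns True.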