Time drift can prevent a Ceph cluster from starting

Bug #1519151 reported by Gregory Elkinbard on 2015-11-24
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
MOS Ceph
MOS Ceph
MOS Ceph

Bug Description

A clock skew of 0.05 seconds can prevent a Ceph cluster from starting for 20-30 minutes.
While we do use NTP, it is configured to adjust the time gradually, so even a small clock drift takes a while to resolve.
The hardware in question was shut down for 1 week for transfer from one facility to another and accumulated a relatively trivial amount of clock drift, which prevented the Ceph cluster from starting up. The drift was so small that it was not visible using the date command.

We need to be more aggressive about syncing up the clocks on node start.
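
To make the failure condition above concrete, here is a minimal sketch of a skew check against the 0.05 s tolerance quoted in this report. The node names and offsets are hypothetical, and comparing the largest pairwise difference is a simplification for illustration, not a description of how Ceph itself measures skew.

```python
from typing import Dict

# Tolerance quoted in this report, in seconds; treated here as a plain parameter.
MAX_SKEW_SEC = 0.05


def max_pairwise_skew(offsets: Dict[str, float]) -> float:
    """Return the largest clock difference between any two nodes, in seconds."""
    values = list(offsets.values())
    return max(values) - min(values) if values else 0.0


def skew_ok(offsets: Dict[str, float], tolerance: float = MAX_SKEW_SEC) -> bool:
    """True if every pair of nodes agrees to within the tolerance."""
    return max_pairwise_skew(offsets) <= tolerance


# Hypothetical per-node offsets from a common reference, in seconds.
offsets = {"node-1": 0.000, "node-2": 0.031, "node-3": -0.027}

print(skew_ok(offsets))        # spread is about 0.058 s, above the 0.05 s tolerance
print(skew_ok(offsets, 0.15))  # the same offsets pass a looser 0.15 s tolerance
```

Offsets this small are well under one second, which is consistent with the drift not being visible in the output of the date command.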

Stanislaw Bogatkin (sbogatkin) wrote :

This can be done by adding a new option to the puppet-ntp module. Alongside 'stepout', there is an option named 'step' that sets the clock step threshold. Setting its value below 0.05 would bring the cluster to the desired state much faster.
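
The effect of the 'step' threshold can be sketched roughly as follows. This is a simplified model, not ntpd's actual algorithm: it assumes ntpd's documented defaults of a 0.128 s step threshold and a 500 ppm maximum slew rate, and it ignores the stepout interval and the clock discipline loop, so the slew time it returns is only a lower bound. The function names are made up for this illustration.

```python
NTPD_DEFAULT_STEP_SEC = 0.128  # assumed ntpd default step threshold, in seconds
NTPD_MAX_SLEW_PPM = 500.0      # assumed ntpd maximum slew rate, parts per million


def correction_mode(offset_sec: float,
                    step_threshold_sec: float = NTPD_DEFAULT_STEP_SEC) -> str:
    """Return "step" if the offset exceeds the threshold, else "slew"."""
    return "step" if abs(offset_sec) > step_threshold_sec else "slew"


def min_slew_seconds(offset_sec: float,
                     slew_ppm: float = NTPD_MAX_SLEW_PPM) -> float:
    """Lower bound on the time needed to slew away an offset, in seconds."""
    return abs(offset_sec) / (slew_ppm * 1e-6)


offset = 0.05  # the clock skew from this report, in seconds

print(correction_mode(offset))        # below the default threshold: gradual slew
print(correction_mode(offset, 0.04))  # threshold lowered below 0.05: immediate step
print(min_slew_seconds(offset))       # roughly 100 seconds at the maximum slew rate
```

In practice convergence is slower than this bound, which fits the 20-30 minutes reported above; lowering the threshold replaces the gradual correction with a single step.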

Changed in fuel:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → MOS Puppet Team (mos-puppet)
milestone: none → 8.0
Maciej Relewicz (rlu) on 2015-11-25
tags: added: area-mos
Changed in fuel:
assignee: MOS Puppet Team (mos-puppet) → Stanislaw Bogatkin (sbogatkin)
Stanislaw Bogatkin (sbogatkin) wrote :

Also, I have a question for the Ceph guys: would it be better to fix Ceph's tolerance to time variance? We have plenty of other software (including cluster software such as corosync) with much better tolerance. Why does Ceph need 50 ms? Could this be larger, for example 150 ms?

Changed in fuel:
assignee: Stanislaw Bogatkin (sbogatkin) → Herman Narkaytis (hnarkaytis)
Changed in fuel:
assignee: Herman Narkaytis (hnarkaytis) → MOS Ceph (mos-ceph)
Roman Podoliaka (rpodolyaka) wrote :

We no longer fix Medium bugs in 8.0; closing as Won't Fix.
