Time drift can prevent a Ceph cluster from starting

Bug #1519151 reported by Gregory Elkinbard
This bug affects 1 person
Affects             Status     Importance  Assigned to  Milestone
Fuel for OpenStack  Triaged    Medium      MOS Ceph
8.0.x               Won't Fix  Medium      MOS Ceph
Mitaka              Triaged    Medium      MOS Ceph

Bug Description

A clock skew of 0.05 sec can prevent a Ceph cluster from starting for 20-30 minutes.
While we do use NTP, it is unfortunately configured to adjust time gradually, so even a small clock drift takes a while to resolve.
The hardware in question was shut down for 1 week to facilitate its transfer from one facility to another and accumulated a relatively trivial amount of clock drift, which prevented the Ceph cluster from starting up. The drift was so small that it was not visible using the date command.

We need to be more aggressive about syncing up the clocks on node start.

Tags: area-mos
Stanislaw Bogatkin (sbogatkin) wrote :

This can be done by adding a new option to the puppet-ntp module. Like the 'stepout' option, there is an option named 'step' that controls the clock step offset. By setting its value to <0.05, the desired cluster state will be reached much faster.
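
To illustrate why lowering the step threshold helps, here is a minimal sketch of the slew-versus-step decision. It is a simplified model and not ntpd's actual code. It assumes ntpd's documented defaults of a 0.128 s step threshold and a 500 ppm maximum slew rate, and it ignores the stepout timer entirely. The function names and the sample offset are hypothetical.

```python
# Simplified model of how an NTP daemon chooses between slewing and stepping.
# Assumptions (not from this report): a default step threshold of 0.128 s,
# a maximum slew rate of 500 ppm, and no stepout timer.

MAX_SLEW_RATE = 500e-6  # seconds of correction per second of wall time


def correction_mode(offset_s, step_threshold_s=0.128):
    """Return 'step' if the offset exceeds the threshold, else 'slew'."""
    if step_threshold_s == 0:
        # In ntpd a threshold of zero disables stepping altogether.
        return "slew"
    return "step" if abs(offset_s) > step_threshold_s else "slew"


def min_slew_time(offset_s):
    """Lower bound, in seconds, on the time needed to slew away an offset."""
    return abs(offset_s) / MAX_SLEW_RATE


offset = 0.06  # hypothetical drift, just above the 0.05 s limit
print(correction_mode(offset))        # with the default threshold
print(correction_mode(offset, 0.04))  # with a hypothetical lower threshold
print(min_slew_time(offset))          # best case; real convergence is slower
```

With the default threshold the sample offset is slewed, which is the slow path described in this bug. With a threshold below 0.05 the same offset is corrected in a single step.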

Changed in fuel:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → MOS Puppet Team (mos-puppet)
milestone: none → 8.0
Maciej Relewicz (rlu)
tags: added: area-mos
Changed in fuel:
assignee: MOS Puppet Team (mos-puppet) → Stanislaw Bogatkin (sbogatkin)
Stanislaw Bogatkin (sbogatkin) wrote :

Also, I have a question for the Ceph guys: might it be better to fix Ceph's tolerance to time variance? We have tons of other software (including cluster software such as corosync) with much better tolerance. Why does Ceph need 50 ms? Could this limit be larger, for example 150 ms?
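
As a rough illustration of the question, the sketch below compares the 50 ms limit with the proposed 150 ms limit against a set of made-up monitor clock offsets. The monitor names, offsets, and the helper function are hypothetical. In Ceph this tolerance is, as far as I know, controlled by the monitor option `mon_clock_drift_allowed`, but the code does not talk to Ceph at all.

```python
# Which monitors would exceed a given clock-skew tolerance?
# The offsets are hypothetical; 0.05 s is the limit discussed in this
# report and 0.15 s is the larger value proposed in the comment above.

def skewed_monitors(offsets_s, allowed_s):
    """Return names of monitors whose absolute offset exceeds allowed_s."""
    return sorted(
        name for name, offset in offsets_s.items() if abs(offset) > allowed_s
    )


offsets = {"mon-a": 0.0, "mon-b": 0.062, "mon-c": -0.021}  # hypothetical

print(skewed_monitors(offsets, 0.05))  # ['mon-b']
print(skewed_monitors(offsets, 0.15))  # []
```

A larger tolerance hides small drift such as the one reported here, at the cost of allowing the monitors' clocks to disagree more.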

Changed in fuel:
assignee: Stanislaw Bogatkin (sbogatkin) → Herman Narkaytis (hnarkaytis)
Changed in fuel:
assignee: Herman Narkaytis (hnarkaytis) → MOS Ceph (mos-ceph)
Roman Podoliaka (rpodolyaka) wrote :

We no longer fix Medium bugs in 8.0, so closing as Won't Fix.
