Need to synchronize time on nodes with master-node during deployment

Bug #1297293 reported by Anastasia Palkina
This bug affects 4 people
Affects: Fuel for OpenStack
Status: Fix Committed
Importance: High
Assigned to: Vladimir Sharshov

Bug Description

"build_id": "2014-03-25_01-01-50",
"mirantis": "yes",
"build_number": "40",
"nailgun_sha": "3044c2054904525601c921387322a2978e821677",
"ostf_sha": "013c13ab033a6829ca4eeaa2476c30837e814902",
"fuelmain_sha": "f7ee8bcaa3d993395669f2bcae893176ff2b3bbe",
"astute_sha": "d7c6c4d00ffd6e2fa74da442f573e6f39049961e",
"release": "5.0",
"fuellib_sha": "3445ab7550486074ec8e47fdaed869c697991364"

1. Create a new environment (Ubuntu, simple node)
2. Select both Ceph options
3. Add 1 controller and 3 compute+ceph nodes
4. Start deployment. It completes successfully
5. However, nova-network is in the XXX state on all compute nodes because their date differs from the master node's

root@node-13:~# nova-manage service list
Binary            Host     Zone      Status   State  Updated_At
nova-consoleauth  node-13  internal  enabled  :-)    2014-03-25 13:24:10
nova-cert         node-13  internal  enabled  :-)    2014-03-25 13:24:13
nova-scheduler    node-13  internal  enabled  :-)    2014-03-25 13:24:12
nova-conductor    node-13  internal  enabled  :-)    2014-03-25 13:24:07
nova-network      node-14  internal  enabled  XXX    2014-03-25 13:18:10
nova-compute      node-14  nova      enabled  :-)    2014-03-25 13:24:12
nova-network      node-15  internal  enabled  XXX    2014-03-25 13:25:18
nova-network      node-16  internal  enabled  XXX    2014-03-25 13:18:10
nova-compute      node-15  nova      enabled  :-)    2014-03-25 13:23:36
nova-compute      node-16  nova      enabled  :-)    2014-03-25 13:24:13

Need to synchronize time on nodes with master-node during deployment
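
To make the symptom easier to spot, here is a minimal Python sketch that parses a "nova-manage service list" listing and reports heartbeats that sit too far from the controller's. It is only an illustration, not part of Fuel or nova: the function names are hypothetical, the 60-second threshold assumes nova's service_down_time is left at its default, and the newest heartbeat from the reference host is used as a stand-in for "now".

```python
from datetime import datetime, timedelta

# Assumption: nova's service_down_time is left at its default of 60 seconds.
SERVICE_DOWN_TIME = timedelta(seconds=60)


def parse_service_list(text):
    """Parse `nova-manage service list` output into (binary, host, updated_at)."""
    rows = []
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) < 7 or parts[0] == "Binary":
            continue  # skip the header and anything that is not a service row
        updated_at = datetime.strptime(
            parts[5] + " " + parts[6], "%Y-%m-%d %H:%M:%S"
        )
        rows.append((parts[0], parts[1], updated_at))
    return rows


def skewed_services(rows, reference_host):
    """Return (binary, host, offset_seconds) for heartbeats too far from the reference.

    Assumption: the newest heartbeat written by `reference_host` approximates
    the current time on the controller.
    """
    reference = max(ts for _, host, ts in rows if host == reference_host)
    limit = SERVICE_DOWN_TIME.total_seconds()
    result = []
    for binary, host, ts in rows:
        offset = (ts - reference).total_seconds()
        if abs(offset) > limit:
            result.append((binary, host, offset))
    return result
```

Fed the listing above with node-13 as the reference host, this flags exactly the three nova-network rows: node-14 and node-16 are about six minutes behind, and node-15 is about a minute ahead.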

Anastasia Palkina (apalkina) wrote :
Dmitry Borodaenko (angdraug) wrote :

I often see this kind of problem when reverting a virtual Fuel master from a snapshot. If the master updates its clock after some slave nodes have already booted into bootstrap, those nodes' clocks will be out of sync with the rest of the cluster. I found that forcing a time update on the Fuel master node with "service ntpd restart" before booting slave nodes into bootstrap helps. After about a minute, the clock on the Fuel master VM is in sync with the host, and all nodes booted after that remain in sync.
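
The workaround above could be scripted roughly as follows. This is a hedged sketch, not something shipped with Fuel: the function name and its parameters are hypothetical, and the 60-second wait simply mirrors the "about a minute" observation rather than checking that the clock has actually converged.

```python
import subprocess
import time


def resync_master_clock(run=subprocess.check_call, sleep=time.sleep,
                        settle_seconds=60):
    """Restart ntpd on the Fuel master, then wait for its clock to settle.

    `run` and `sleep` are injectable so the sketch can be exercised without
    touching a real system; by default the command is executed locally.
    """
    # The command quoted in the comment above.
    run(["service", "ntpd", "restart"])
    # Assumption: one minute is enough for the clock to catch up with the host.
    sleep(settle_seconds)
```

Run this on the master before powering on the slave nodes; nodes that are already in bootstrap would still need to be rebooted or synced separately.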

Changed in fuel:
status: New → Triaged
tags: added: low-hanging-fruit
Changed in fuel:
importance: Undecided → Medium
Tatyanka (tatyana-leontovich) wrote :

I faced the same problem on 4.1.1 in a deployment without reverting from snapshots: CentOS + Neutron GRE without VLANs (each network on a separate eth), 1 controller + 1 compute + 2 ceph + 1 cinder node. Deployment was successful, but nova-manage service list reports the XXX state for services. By the way, restarting the service helps :) and then everything works like a charm.

Tatyanka (tatyana-leontovich) wrote :
Changed in fuel:
milestone: 5.0 → 5.1
Changed in fuel:
importance: Medium → High
Bogdan Dobrelya (bogdando) wrote :

We should sync time while orchestrating the deployment process.
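
As an illustration of what such an orchestration step might look like, here is a minimal Python sketch of a pre-deployment hook. It does not claim to reproduce what the orchestrator actually does: sync_nodes_with_master, run_on_node, and run_over_ssh are hypothetical names, and using "ntpdate -u" against the master is an assumption about how a one-shot sync could be done.

```python
import subprocess


def sync_nodes_with_master(nodes, master, run_on_node):
    """Sync each node's clock to the master; return the nodes where it failed.

    `run_on_node(node, command)` is a hypothetical transport callback that
    executes `command` on `node` and returns its exit code.
    """
    failed = []
    for node in nodes:
        # `ntpdate -u HOST` does a one-shot sync using an unprivileged port.
        exit_code = run_on_node(node, ["ntpdate", "-u", master])
        if exit_code != 0:
            failed.append(node)
    return failed


def run_over_ssh(node, command):
    """One possible transport (an assumption): run the command over ssh."""
    return subprocess.call(["ssh", node] + command)
```

A deployment would call sync_nodes_with_master(nodes, master, run_over_ssh) before starting services and abort, or at least warn, if the returned list is not empty.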

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Vladimir Sharshov (vsharshov)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (master)

Fix proposed to branch: master
Review: https://review.openstack.org/105138

Changed in fuel:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/105138
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=2c2c8da42f679b07f56138717e8b099a9dfa812d
Submitter: Jenkins
Branch: master

commit 2c2c8da42f679b07f56138717e8b099a9dfa812d
Author: Vladimir Sharshov <email address hidden>
Date: Mon Jul 7 16:03:21 2014 +0400

    Sync time between cluster nodes before deployment

    This fix prevents situations like this: nova-network
    in XXX state on all compute nodes because their date
    differed from the master node

    Change-Id: I980d10d56addf32d56ac49a600ab9fc7ea5831aa
    Closes-Bug: #1297293

Changed in fuel:
status: In Progress → Fix Committed
Joshua Dotson (tns9) wrote :

Can this be backported to 5.0/stable, because Ceph is incredibly time-skew sensitive? With Ceph, data retention relies on minimal time skew.
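
To give a sense of how tight the tolerance is, here is a small Python sketch that checks measured clock offsets against a skew limit. The 0.05-second limit assumes Ceph's default for "mon clock drift allowed"; the function names and the idea of feeding in per-node offsets relative to the master are hypothetical.

```python
# Assumption: 0.05 s is Ceph's default for `mon clock drift allowed`.
MAX_SKEW_SECONDS = 0.05


def max_pairwise_skew(offsets):
    """Largest clock difference between any two nodes.

    `offsets` maps a node name to its clock offset from the master in
    seconds (hypothetical measurements, positive means ahead).
    """
    values = list(offsets.values())
    return max(values) - min(values)


def skew_ok(offsets, limit=MAX_SKEW_SECONDS):
    """True if every pair of nodes is within `limit` seconds of each other."""
    return max_pairwise_skew(offsets) <= limit
```

By this measure, the offsets of several minutes seen in the bug description are thousands of times over the limit.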
