sync_time fails on non-first deployment of controllers

Bug #1452912 reported by Andrew Woodward
This bug affects 1 person
Affects              Status         Importance  Assigned to      Milestone
Fuel for OpenStack   Fix Committed  High        Andrew Woodward
6.0.x                Invalid        Undecided   Unassigned

Bug Description

If a deployment fails, or the controllers otherwise have to be deployed a second time, the sync_time task fails. Because astute reports errors poorly for tasks that run on multiple nodes at once, it's hard to troubleshoot. I finally sat down and found the root cause.

The command

ntpdate -u $(egrep '^server' /etc/ntp.conf | egrep -v '127\.127\.[0-9]+\.[0-9]+' | sed '/^#/d' | awk '{print $2}')

returns more than one line of results on the controllers:

0.pool.ntp.org
1.pool.ntp.org
2.pool.ntp.org

while non-controllers have a single server entry in /etc/ntp.conf (the controller VIP):

server 192.168.0.2 iburst
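
The difference between the two node types can be reproduced without touching a real node. The following is a minimal sketch, assuming two made-up sample files in place of /etc/ntp.conf; the function name list_servers and the file contents are illustrative, and grep -E is used as the equivalent of egrep.

```shell
# Hypothetical sample configs standing in for /etc/ntp.conf on each node type.
tmpdir=$(mktemp -d)
cat > "$tmpdir/controller.conf" <<'CONF'
# upstream servers
server 0.pool.ntp.org
server 1.pool.ntp.org
server 2.pool.ntp.org
server 127.127.1.0
CONF
cat > "$tmpdir/compute.conf" <<'CONF'
server 192.168.0.2 iburst
CONF

# Same filter chain as the sync_time command, parameterised on the file name.
list_servers() {
    grep -E '^server' "$1" \
        | grep -E -v '127\.127\.[0-9]+\.[0-9]+' \
        | sed '/^#/d' \
        | awk '{print $2}'
}

controller_servers=$(list_servers "$tmpdir/controller.conf")
compute_servers=$(list_servers "$tmpdir/compute.conf")
rm -rf "$tmpdir"

# The controller list spans several lines; the compute list is a single line.
printf 'controller:\n%s\n' "$controller_servers"
printf 'compute:\n%s\n' "$compute_servers"
```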

So I reproduced the operation in bash:

[root@fuel ~]# for each in {13..17} ; do ssh node-${each} -C <<EOF || break
ntpdate -u $(egrep '^server' /etc/ntp.conf | egrep -v '127\.127\.[0-9]+\.[0-9]+' | sed '/^#/d' | awk '{print $2}')
>
> EOF
> done
Pseudo-terminal will not be allocated because stdin is not a terminal.
Warning: Permanently added 'node-13' (RSA) to the list of known hosts.
Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-52-generic x86_64)

 * Documentation: https://help.ubuntu.com/
You have new mail.
stdin: is not a tty
 7 May 20:38:13 ntpdate[31567]: adjust time server 204.9.54.119 offset -0.003217 sec
-bash: line 2: 1.pool.ntp.org: command not found
-bash: line 3: 2.pool.ntp.org: command not found
[root@fuel ~]#

We see that the second and third lines of the server list are interpreted by bash as commands on the controller. They likely return a non-zero exit code to astute, hence its failure.
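
The failure mode in the transcript can be shown in isolation. This is a sketch under stated assumptions: a local sh stands in for the ssh session, echo sync stands in for ntpdate -u, and the servers function is a made-up stand-in for the egrep/sed/awk pipeline. With an unquoted here-document delimiter, the command substitution is expanded before the text reaches the child shell, so each extra server name arrives as a line of its own.

```shell
# Stand-in for the multi-line server list produced by the pipeline.
servers() { printf '0.pool.ntp.org\n1.pool.ntp.org\n2.pool.ntp.org\n'; }

log=$(mktemp)

# Unquoted delimiter: $(servers) is expanded by the calling shell, newlines
# included, before the child shell (sh here, ssh in the report) reads it.
sh > "$log" 2>&1 <<EOF
echo sync $(servers)
EOF
status=$?

output=$(cat "$log")
rm -f "$log"

# The child ran "echo sync 0.pool.ntp.org", then tried to execute the two
# remaining names as commands; its exit status is that of the last one.
echo "$output"
echo "exit status: $status"
```

The non-zero status comes from the last line the child shell tried to run, which matches the "command not found" lines in the transcript.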

If we change the command to

for each in {13..17} ; do ssh node-${each} -C <<EOF || break
ntpdate -u $(egrep '^server' /etc/ntp.conf | egrep -v '127\.127\.[0-9]+\.[0-9]+' | sed '/^#/d' | awk '{print $2}'| head -1)

EOF
done

then it runs on each node.
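
The effect of trimming the list to one entry can be checked the same way. Again a sketch, not the committed change: sh stands in for ssh, echo sync for ntpdate -u, and servers is a made-up stand-in for the pipeline. head -n 1 is the POSIX spelling of the head -1 used in the command.

```shell
# Stand-in for the multi-line server list produced by the pipeline.
servers() { printf '0.pool.ntp.org\n1.pool.ntp.org\n2.pool.ntp.org\n'; }

log=$(mktemp)

# Only the first server survives the substitution, so the here-document
# body handed to the child shell is a single command line.
sh > "$log" 2>&1 <<EOF
echo sync $(servers | head -n 1)
EOF
fixed_status=$?

fixed_output=$(cat "$log")
rm -f "$log"

echo "$fixed_output"
echo "exit status: $fixed_status"
```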

Full transcript and notes: http://paste.openstack.org/show/216490/

Andrew Woodward (xarses)
summary: - ntp_sync fails on non-first deployment of controllers
+ sync_time fails on non-first deployment of controllers
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/181154

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Andrew Woodward (xarses)
status: Triaged → In Progress
Bogdan Dobrelya (bogdando) wrote :

I set the 6.0.x milestone to Invalid, as many of the ntpd changes were made only in the 6.1 release cycle.

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/181154
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=5f287aa25c05a487553cd480da941aba7d08f50e
Submitter: Jenkins
Branch: master

commit 5f287aa25c05a487553cd480da941aba7d08f50e
Author: Andrew Woodward <email address hidden>
Date: Thu May 7 14:25:26 2015 -0700

    Limit sync_time to only one node

    As described in # 1452912, If the ntp servers is more than one node
    on a host then it will cause a non-zero exit code to astute which causes
    the task to fail.

    Change-Id: I865d1ee9aebfaff3bca3827d3fadd394d2a624e1
    Closes-bug: #1452912

Changed in fuel:
status: In Progress → Fix Committed