sync_time fails on non-first deployment of controllers
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Fix Committed | High | Andrew Woodward |
6.0.x | Invalid | Undecided | Unassigned |
Bug Description
If a deployment fails, or the controllers otherwise have to be deployed a second time, the ntp_sync task fails. Because astute reports errors poorly for tasks that run on multiple nodes at once, it's hard to troubleshoot. I finally sat down and found the root cause.
The command
ntpdate -u $(egrep '^server' /etc/ntp.conf | egrep -v '127\.127\
will find more than one line of results on the controllers
0.pool.ntp.org
1.pool.ntp.org
2.pool.ntp.org
while non-controllers have (controller vip)
server 192.168.0.2 iburst
So I reproduced the operation in bash
[root@fuel ~]# for each in {13..17} ; do ssh node-${each} -C <<EOF || break
ntpdate -u $(egrep '^server' /etc/ntp.conf | egrep -v '127\.127\
>
> EOF
> done
Pseudo-terminal will not be allocated because stdin is not a terminal.
Warning: Permanently added 'node-13' (RSA) to the list of known hosts.
Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-52-generic x86_64)
* Documentation: https:/
You have new mail.
stdin: is not a tty
7 May 20:38:13 ntpdate[31567]: adjust time server 204.9.54.119 offset -0.003217 sec
-bash: line 2: 1.pool.ntp.org: command not found
-bash: line 3: 2.pool.ntp.org: command not found
[root@fuel ~]#
We see that the second and third lines on the controller are interpreted by bash as commands, and they are likely returning a non-zero exit code to astute, hence its failure.
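The failure mode can be reproduced without ssh or ntpdate. The following is a minimal sketch under stated assumptions: the temporary file stands in for a controller's /etc/ntp.conf, `echo ntpdate` stands in for the real ntpdate call, `sh` stands in for the remote shell, and the `awk '{print $2}'` step and the full `127\.127\.` pattern are guesses, since the original command is truncated in this report. Joining the hostnames onto one line with `tr` is only one possible way to avoid the problem and is not necessarily the change made in the proposed fix.

```shell
# Hypothetical stand-in for a controller's /etc/ntp.conf (not the real file).
conf=$(mktemp)
printf 'server 0.pool.ntp.org\nserver 1.pool.ntp.org\nserver 2.pool.ntp.org\nserver 127.127.1.0\n' > "$conf"

# Print the configured server names, one per line, skipping the local clock.
# The awk step and the exact grep pattern are assumptions.
servers() {
    grep -E '^server' "$conf" | grep -Ev '127\.127\.' | awk '{print $2}'
}

# Unquoted heredoc: the substitution is expanded into the script text, so
# every hostname after the first lands on its own line and is run as a command.
broken_status=0
broken=$(sh 2>&1 <<EOF
echo ntpdate -u $(servers)
EOF
) || broken_status=$?

# Same script, but the newlines are collapsed to spaces first, so all
# hostnames stay on the ntpdate line as arguments.
fixed_status=0
fixed=$(sh 2>&1 <<EOF
echo ntpdate -u $(servers | tr '\n' ' ')
EOF
) || fixed_status=$?

echo "broken (exit $broken_status):"
echo "$broken"
echo "fixed (exit $fixed_status):"
echo "$fixed"

rm -f "$conf"
```

In the first run the shell reports "not found" for the second and third hostnames and exits non-zero, which matches the transcript above. In the second run all three hostnames are passed as arguments and the exit status is 0.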
if we change the command to
for each in {13..17} ; do ssh node-${each} -C <<EOF || break
ntpdate -u $(egrep '^server' /etc/ntp.conf | egrep -v '127\.127\
EOF
done
then it runs on each node
full transcript and notes http://
summary:
- ntp_sync fails on non-first deployment of controllers
+ sync_time fails on non-first deployment of controllers
Fix proposed to branch: master
Review: https://review.openstack.org/181154