Deployment task failed: `ntpdate: no server suitable for synchronization found`

Bug #1430482 reported by Dennis Dmitriev
36
This bug affects 6 people
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Fix Released
High
Stanislaw Bogatkin
5.1.x
Won't Fix
High
Fuel Library (Deprecated)

Bug Description

Executing of pre-deployment hook for time synchronization is failed on different environments (faced on CentOS with nova-flatdhcp and CentOS with Neutron/Vlan on different hosts):

http://paste.openstack.org/show/191385/

Execute of the same command later is successful:

[root@node-1 ~]# cd / && ntpdate -u $(egrep '^server' /etc/ntp.conf | egrep -v '127\.127\.[0-9]+\.[0-9]+' | sed '/^#/d' | awk '{print $2}')
10 Mar 17:41:03 ntpdate[4133]: adjust time server 10.109.0.2 offset -0.000585 sec

The cause of the issue is that the Fuel admin node is not ready to provide time right after reverting.
Time synchronization methods should be updated.

Tags: system-tests
Changed in fuel:
importance: Undecided → High
assignee: Fuel QA Team (fuel-qa) → Dennis Dmitriev (ddmitriev)
status: New → In Progress
Changed in fuel:
assignee: Dennis Dmitriev (ddmitriev) → nobody
status: In Progress → Incomplete
Revision history for this message
Dennis Dmitriev (ddmitriev) wrote :

The issue is that the 'ntpd' service is not opening ports for listening, if it is started by pacemaker script.

There are open ports if ntpd started by /etc/init.d/ntp : http://paste.openstack.org/show/191589/

There are no ports open if ntpd started by pacemaker : http://paste.openstack.org/show/191590/

Changed in fuel:
status: Incomplete → Confirmed
Revision history for this message
Sylwester Brzeczkowski (sbrzeczkowski) wrote :

[root@fuel ~]# fuel --fuel-version
DEPRECATION WARNING: file /etc/fuel/client/config.yaml is found and will be used as a source for settings. However, it deprecated and will not be used by default in the ongoing version of python-fuelclient.
api: '1.0'
astute_sha: 93de472789d9fc351d915e401892c9f792c14ca2
auth_required: true
build_id: 2015-03-11_10-07-21
build_number: '183'
feature_groups:
- mirantis
fuellib_sha: acd7dfb5f93ee0719464d07faf5883ee804a7205
fuelmain_sha: 0f588ec9125cc1f4dd24a07d3bc6903c97b84d27
nailgun_sha: e335b03bcf279daa8606f423b5bf344e9b32c434
ostf_sha: 8df5f2fcdae3bc9ea7d700ffd64db820baf51914
production: docker
python-fuelclient_sha: 8a292dbdfc3afc1994fd8a81a28903f9a5cca351
release: '6.1'
release_versions:
  2014.2-6.1:
    VERSION:
      api: '1.0'
      astute_sha: 93de472789d9fc351d915e401892c9f792c14ca2
      build_id: 2015-03-11_10-07-21
      build_number: '183'
      feature_groups:
      - mirantis
      fuellib_sha: acd7dfb5f93ee0719464d07faf5883ee804a7205
      fuelmain_sha: 0f588ec9125cc1f4dd24a07d3bc6903c97b84d27
      nailgun_sha: e335b03bcf279daa8606f423b5bf344e9b32c434
      ostf_sha: 8df5f2fcdae3bc9ea7d700ffd64db820baf51914
      production: docker
      python-fuelclient_sha: 8a292dbdfc3afc1994fd8a81a28903f9a5cca351
      release: '6.1'

Changed in fuel:
assignee: nobody → Stanislaw Bogatkin (sbogatkin)
status: Confirmed → Triaged
Changed in fuel:
assignee: Stanislaw Bogatkin (sbogatkin) → Oleksiy Molchanov (omolchanov)
Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :

NTP is being started under vrouter namespace. The issue is not related to it. We have already a patch that extends pre-deploy time sync.

Changed in fuel:
status: Triaged → Invalid
Changed in fuel:
status: Invalid → Confirmed
Revision history for this message
Dennis Dmitriev (ddmitriev) wrote :

The issue reproduced on ISO build fuel-6.1-260.

After provisioning, but before any puppet task, the following hook failed:

2015-04-01T09:05:44 info: [703] Casting message to Nailgun: {"method"=>"deploy_resp", "args"=>{"task_uuid"=>"c63f14d0-68f4-4768-8088-b340488c3a4e", "status"=>"error", "error"=>"Method granular_deploy. Failed to execute hook . --- priority: 300 fail_on_error: true type: shell uids: - '1' - '3' - '2' - '5' - '4' - '6' parameters: retries: 10 cmd: ntpdate -u $(egrep '^server' /etc/ntp.conf | egrep -v '127\\.127\\.[0-9]+\\.[0-9]+' | sed '/^#/d' | awk '{print $2}') timeout: 180 interval: 30 . Inspect Astute logs for the details"}}

The cause of issue in the following:

************ Master node NTPD peers, using the 212.111.203.242:
[root@nailgun astute]# ntpq -p -n
     remote refid st t when poll reach delay offset jitter
==============================================================================
+195.140.171.70 62.149.0.30 2 u 41 64 377 10.498 2.788 4.382
+91.236.251.30 193.190.230.65 2 u 43 64 377 18.912 3.679 5.491
*212.111.203.242 .GPS. 1 u 34 64 377 11.034 1.401 2.327

************** Slave node-1 peers:
root@node-1:~# ntpq -p
     remote refid st t when poll reach delay offset jitter
==============================================================================
 nailgun.test.do 212.111.203.242 2 u 30 64 0 0.000 0.000 0.000

************** Slave node-1 associations:
root@node-1:~# ntpq -c assoc

ind assid status conf reach auth condition last_event cnt
===========================================================
  1 18163 8011 yes no none reject mobilize 1

************ Runnung the command ntpdate:
root@node-1:/# cd / && ntpdate -u $(egrep '^server' /etc/ntp.conf | egrep -v '127\.127\.[0-9]+\.[0-9]+' | sed '/^#/d' | awk '{print $2}')
 1 Apr 09:24:06 ntpdate[6588]: no server suitable for synchronization found

After a few minutes (~20 min) the slave established connection to the peer and the command was executed without errors:
root@node-1:~# ntpq -p
     remote refid st t when poll reach delay offset jitter
==============================================================================
*nailgun.test.do 212.111.203.242 2 u 55 64 1 0.765 1.461 0.967

root@node-1:~# ntpq -c assoc

ind assid status conf reach auth condition last_event cnt
===========================================================
  1 18163 965a yes yes none sys.peer sys_peer 5

root@node-1:~# cd / && ntpdate -u $(egrep '^server' /etc/ntp.conf | egrep -v '127\.127\.[0-9]+\.[0-9]+' | sed '/^#/d' | awk '{print $2}')
 1 Apr 12:21:41 ntpdate[9073]: adjust time server 10.109.0.2 offset 0.002095 sec

Revision history for this message
Dennis Dmitriev (ddmitriev) wrote :
Changed in fuel:
assignee: Oleksiy Molchanov (omolchanov) → Stanislaw Bogatkin (sbogatkin)
Revision history for this message
Kyrylo Romanenko (kromanenko) wrote :

Reproduced this bug on
fuel-6.1-260-2015-03-31_22-54-44.iso
https://oscc-jenkins-cc.vm.mirantis.net/job/deploy_cluster/155/console

Env:
1 controller + ceph
1 compute + ceph
1 ceph
Settings:
QEMU, Neutron VLAN, Ceph RDB for Cinder and Glance, no additional components.

Version:
{"build_id": "2015-03-31_22-54-44", "ostf_sha": "1c7f799cb0e159737d1f910b13e4073b16d1eb8c", "build_number": "260", "release_versions": {"2014.2-6.1": {"VERSION": {"build_id": "2015-03-31_22-54-44", "ostf_sha": "1c7f799cb0e159737d1f910b13e4073b16d1eb8c", "build_number": "260", "api": "1.0", "nailgun_sha": "51ddbe0a2e379e82af531b1c6d45fd386d2f894e", "production": "docker", "python-fuelclient_sha": "9b0b6ab2f09ff75445550b7210104e21df5d8cd6", "astute_sha": "7292fc2a673cb1c32a688a46fd4836ca0500a957", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "178812b1971a900c49a8afc1688afd7475a6ffbb", "fuellib_sha": "27459a89cdac078ec4046a442a11d750c5eccc2e"}}}, "auth_required": true, "api": "1.0", "nailgun_sha": "51ddbe0a2e379e82af531b1c6d45fd386d2f894e", "production": "docker", "python-fuelclient_sha": "9b0b6ab2f09ff75445550b7210104e21df5d8cd6", "astute_sha": "7292fc2a673cb1c32a688a46fd4836ca0500a957", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "178812b1971a900c49a8afc1688afd7475a6ffbb", "fuellib_sha": "27459a89cdac078ec4046a442a11d750c5eccc2e"}

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/171709

Revision history for this message
Sergey Kolekonov (skolekonov) wrote :

This issue was reproduced on 6.1 staging job.
http://jenkins-product.srt.mirantis.net:8080/job/6.1.staging.ubuntu.bvt_2/204/

From astute log:
2015-04-09T02:07:46 debug: [695] d1441a36-156c-41e1-93de-cf9be42115cc: cmd: cd / && ntpdate -u $(egrep '^server' /etc/ntp.conf | egrep -v '127\.127\.[0-9]+\.[0-9]+' | sed '/^#/d' | awk '{print $2}')
cwd: /
stdout:
stderr: 9 Apr 02:07:46 ntpdate[8491]: no server suitable for synchronization found

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/172011

Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/171709
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=09ba80fd931497fca2e4b82eec3b12116341b428
Submitter: Jenkins
Branch: master

commit 09ba80fd931497fca2e4b82eec3b12116341b428
Author: Stanislaw Bogatkin <email address hidden>
Date: Wed Apr 8 18:55:54 2015 +0300

    Change NTP module

    Change options:
    * panic - panic is not boolean, it is integer, actually

    Add options:
    * tinker - disable or enable tinker
    * minpoll - to change ntpd send packages frequency
    * maxpoll - to change ntpd send packages frequency
    * stepout - to change stepout interval

    Change-Id: I8da3a129227061bd436e5343dfcceb8eb7a3aca0
    Link to upstream fix: https://tickets.puppetlabs.com/browse/MODULES-1925
    Related-bug: #1430482
    Related-bug: #1436373

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/172011
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=6652d4bb2d17d13d362797ff3ab64109ca6e9752
Submitter: Jenkins
Branch: master

commit 6652d4bb2d17d13d362797ff3ab64109ca6e9752
Author: Stanislaw Bogatkin <email address hidden>
Date: Thu Apr 9 14:14:40 2015 +0300

    Change default options for ntpd

    Start using:
    * tinker true
    * stepout 5
    * minpoll 3
    * maxpoll 9

    Closes-Bug: #1430482
    Change-Id: I8aae9a746a139198ce0b7b577683734f527bce45

Revision history for this message
Dennis Dmitriev (ddmitriev) wrote :

Released.

ISO version: {"build_id": "2015-04-29_07-55-19", "build_number": "361", "release_versions": {"2014.2.2-6.1": {"VERSION": {"build_id": "2015-04-29_07-55-19", "build_number": "361", "api": "1.0", "fuel-library_sha": "0e5b82d24853304befb22145ac4aaf3545d295e1", "nailgun_sha": "e660b1c09d7d4d07bdd48d424ce9aed3b6facd6e", "feature_groups": ["mirantis"], "openstack_version": "2014.2.2-6.1", "production": "docker", "python-fuelclient_sha": "8cd6cf575d3c101dee1032abb6877dfa8487e077", "astute_sha": "04ebab96d57b0e8acbf2d7f3ba05e4fbf31b741e", "fuel-ostf_sha": "b38602c841deaa03ddffc95c02f319360462cbe3", "release": "6.1", "fuelmain_sha": "ee112acfdd0f9017ef40be53e8e51bb5c429e97c"}}}, "auth_required": true, "api": "1.0", "fuel-library_sha": "0e5b82d24853304befb22145ac4aaf3545d295e1", "nailgun_sha": "e660b1c09d7d4d07bdd48d424ce9aed3b6facd6e", "feature_groups": ["mirantis"], "openstack_version": "2014.2.2-6.1", "production": "docker", "python-fuelclient_sha": "8cd6cf575d3c101dee1032abb6877dfa8487e077", "astute_sha": "04ebab96d57b0e8acbf2d7f3ba05e4fbf31b741e", "fuel-ostf_sha": "b38602c841deaa03ddffc95c02f319360462cbe3", "release": "6.1", "fuelmain_sha": "ee112acfdd0f9017ef40be53e8e51bb5c429e97c"}

Changed in fuel:
status: Fix Committed → Fix Released
Revision history for this message
Sergey Kolekonov (skolekonov) wrote :
Download full text (4.3 KiB)

This bug was reproduced on 5.1.2 environment (staging job)
http://jenkins-product.srt.mirantis.net:8080/job/5.1.2.staging.centos.bvt_1/137/console

AssertionError: Failed to execute "NTPD=$(find /etc/init.d/ -regex '/etc/init.d/\(ntp.?\|ntp-dev\)');$NTPD stop && ntpd -dqg && $NTPD start" on remote host:
.....................
ntpd: no servers found\n', 'ntpd: no servers found\n']}

I was also able to reproduce the issue manually on all nodes of the reverted environment

Warning: Permanently added '10.108.5.131' (RSA) to the list of known hosts.
Shutting down ntpd: [ OK ]
ntpd 4.2.6p5@1.2349-o Sat Nov 23 18:21:48 UTC 2013 (1)
11 May 08:26:50 ntpd[4245]: proto: precision = 0.300 usec
11 May 08:26:50 ntpd[4245]: 0.0.0.0 c01d 0d kern kernel time sync enabled
event at 0 0.0.0.0 c01d 0d kern kernel time sync enabled
Finished Parsing!!
11 May 08:26:50 ntpd[4245]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
11 May 08:26:50 ntpd[4245]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
11 May 08:26:50 ntpd[4245]: Listen and drop on 1 v6wildcard :: UDP 123
11 May 08:26:50 ntpd[4245]: Listen normally on 2 lo 127.0.0.1 UDP 123
restrict: op 1 addr 127.0.0.1 mask 255.255.255.255 mflags 00003000 flags 00000001
11 May 08:26:50 ntpd[4245]: Listen normally on 3 eth0 10.108.5.131 UDP 123
restrict: op 1 addr 10.108.5.131 mask 255.255.255.255 mflags 00003000 flags 00000001
11 May 08:26:50 ntpd[4245]: Listen normally on 4 lo ::1 UDP 123
restrict: op 1 addr ::1 mask ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff mflags 00003000 flags 00000001
11 May 08:26:50 ntpd[4245]: Listen normally on 5 eth4 fe80::660b:a4ff:feb6:5fb5 UDP 123
restrict: op 1 addr fe80::660b:a4ff:feb6:5fb5 mask ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff mflags 00003000 flags 00000001
11 May 08:26:50 ntpd[4245]: Listen normally on 6 eth3 fe80::66c7:bff:fecf:29cb UDP 123
restrict: op 1 addr fe80::66c7:bff:fecf:29cb mask ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff mflags 00003000 flags 00000001
11 May 08:26:50 ntpd[4245]: Listen normally on 7 eth1 fe80::660c:31ff:feda:fb07 UDP 123
restrict: op 1 addr fe80::660c:31ff:feda:fb07 mask ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff mflags 00003000 flags 00000001
11 May 08:26:50 ntpd[4245]: Listen normally on 8 eth0 fe80::664d:1cff:fecb:1879 UDP 123
restrict: op 1 addr fe80::664d:1cff:fecb:1879 mask ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff mflags 00003000 flags 00000001
11 May 08:26:50 ntpd[4245]: Listen normally on 9 eth2 fe80::66a3:aff:fef0:f77d UDP 123
restrict: op 1 addr fe80::66a3:aff:fef0:f77d mask ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff mflags 00003000 flags 00000001
11 May 08:26:50 ntpd[4245]: peers refreshed
11 May 08:26:50 ntpd[4245]: Listening on routing socket on fd #26 for interface updates
restrict: op 1 addr 0.0.0.0 mask 0.0.0.0 mflags 00000000 flags 000005d0
restrict: op 1 addr :: mask 0.0.0.0 mflags 00000000 flags 000005d0
restrict: op 1 addr :: mask :: mflags 00000000 flags 000005d0
restrict: op 1 addr 127.0.0.1 mask 255.255.255.255 mflags 00000000 flags 00000000
restrict: op 1 addr ::1 mask ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff mflags 00000000 flags 00000000
key_expire: at 0 associd 11242
peer_clear: at 0 next 1 associd 11242 refid IN...

Read more...

Revision history for this message
Vitaly Sedelnik (vsedelnik) wrote :

Won't Fix for 5.1.1-updates as this is deployment time issue and no new 5.1.1 deployments are expected.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.