deployment failed due to ntpd server can't reach higher stratum

Bug #1435335 reported by Nikita Koshikov
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Fix Released
Importance: High
Assigned to: Stanislaw Bogatkin

Bug Description

Currently some providers restrict access to port 123 on the root NTP servers.
If the master node is placed inside such an environment, deployment fails with the error:

2015-03-23T10:45:20 err: [517] Error running RPC method granular_deploy: Failed to execute hook .

---
priority: 300
fail_on_error: true
type: shell
uids:
- '3'
- '4'
parameters:
  retries: 10
  cmd: ntpdate -u $(egrep '^server' /etc/ntp.conf | egrep -v '127\.127\.[0-9]+\.[0-9]+'
    | sed '/^#/d' | awk '{print $2}')
  timeout: 180
  interval: 1
, trace:
["/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:54:in `block in process'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:26:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:26:in `process'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/deployment_engine/granular_deployment.rb:201:in `pre_deployment_actions'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/deployment_engine.rb:32:in `deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/orchestrator.rb:131:in `deploy_cluster'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/orchestrator.rb:56:in `granular_deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/dispatcher.rb:111:in `granular_deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:142:in `dispatch_message'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:103:in `block in dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:64:in `call'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:64:in `block in each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:101:in `each_with_index'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:101:in `dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:85:in `block in perform_main_job'"]

2015-03-23T10:45:20 info: [517] Casting message to Nailgun: {"method"=>"deploy_resp", "args"=>{"task_uuid"=>"b6c2fcee-1b56-4f75-a90c-87158d681da7", "status"=>"error", "error"=>"Method granular_deploy. Failed to execute hook .\n\n---\npriority: 300\nfail_on_error: true\ntype: shell\nuids:\n- '3'\n- '4'\nparameters:\n retries: 10\n cmd: ntpdate -u $(egrep '^server' /etc/ntp.conf | egrep -v '127\\.127\\.[0-9]+\\.[0-9]+'\n | sed '/^#/d' | awk '{print $2}')\n timeout: 180\n interval: 1\n.\nInspect Astute logs for the details"}}
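
The cmd in the failing hook hands ntpdate -u every "server" entry from /etc/ntp.conf except local reference clocks (127.127.x.x). A minimal Python sketch of that selection, for illustration only; the function name upstream_servers and the sample configuration are assumptions and are not part of Astute:

```python
import re

# Local reference clocks use pseudo-addresses of the form 127.127.t.u.
LOCAL_REFCLOCK = re.compile(r"127\.127\.[0-9]+\.[0-9]+")


def upstream_servers(ntp_conf_text):
    """Return the hosts the hook's shell pipeline would pass to ntpdate."""
    hosts = []
    for line in ntp_conf_text.splitlines():
        # egrep '^server': keep only lines that start with "server".
        if not line.startswith("server"):
            continue
        # egrep -v '127\.127\...': drop local reference clocks.
        if LOCAL_REFCLOCK.search(line):
            continue
        fields = line.split()
        # awk '{print $2}': take the second whitespace-separated field.
        if len(fields) >= 2:
            hosts.append(fields[1])
    return hosts


if __name__ == "__main__":
    # Hypothetical configuration used only for this example.
    sample = (
        "server 0.pool.ntp.org iburst\n"
        "server 127.127.1.0\n"
        "fudge 127.127.1.0 stratum 10\n"
    )
    print(upstream_servers(sample))  # ['0.pool.ntp.org']
```

If none of the returned hosts answers with a synchronized time, ntpdate has nothing to synchronize against and the hook fails once its retries are used up.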

As you can see, the master node can't reach the root servers:
ntpq> peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 kahuna.ruselabs .INIT.          16 u    -  256    0    0.000    0.000   0.000
 422224.s.dediku .INIT.          16 u    -  256    0    0.000    0.000   0.000
 ponderosa.piney .INIT.          16 u    -  256    0    0.000    0.000   0.000

ntpq> as
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 15020  8011   yes    no  none    reject    mobilize  1
  2 15021  8011   yes    no  none    reject    mobilize  1
  3 15022  8011   yes    no  none    reject    mobilize  1

and if the command ntpdate -u 'fuel_ip' is executed, it produces the error:
23 Mar 12:32:52 ntpdate[17115]: no server suitable for synchronization found
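
In the peers output above, every upstream sits at stratum 16 with a reach of 0, which is how ntpq shows a peer that has not answered and is not synchronized. A rough Python sketch of that check, assuming the standard ten-column peers billboard; parse_peers and has_usable_peer are hypothetical helper names:

```python
def parse_peers(billboard):
    """Parse `ntpq -p` style output into a list of dicts."""
    peers = []
    for line in billboard.splitlines():
        fields = line.split()
        # Skip the header, the ==== separator and anything malformed.
        if len(fields) != 10 or fields[0] == "remote":
            continue
        peers.append({
            # May carry a leading tally character such as '*' or '+'.
            "remote": fields[0],
            "stratum": int(fields[2]),
            # "reach" is an 8-bit shift register printed in octal.
            "reach": int(fields[6], 8),
        })
    return peers


def has_usable_peer(peers):
    """True if some peer has answered recently and is synchronized."""
    return any(p["reach"] != 0 and p["stratum"] < 16 for p in peers)


if __name__ == "__main__":
    # Rows copied from the report; all of them are unreachable.
    sample = (
        "     remote refid st t when poll reach delay offset jitter\n"
        "==========================================================\n"
        " kahuna.ruselabs .INIT. 16 u - 256 0 0.000 0.000 0.000\n"
        " 422224.s.dediku .INIT. 16 u - 256 0 0.000 0.000 0.000\n"
    )
    print(has_usable_peer(parse_peers(sample)))  # False
```

With no usable peer the master's ntpd itself stays unsynchronized, so a client pointed at it has nothing to synchronize to, which matches the ntpdate error shown above.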

This can be fixed by adding settings to the Fuel master node that instruct ntpd to advertise itself as a synchronized server, using its local clock:
--- ntp.conf.orig 2015-03-23 13:29:32.847968972 +0000
+++ ntp.conf 2015-03-23 13:13:33.706984063 +0000
@@ -16,6 +16,8 @@
 server 0.pool.ntp.org iburst
 server 1.pool.ntp.org iburst
 server 2.pool.ntp.org iburst
+server 127.127.1.0
+fudge 127.127.1.0 stratum 10

 # Driftfile.
 driftfile /var/lib/ntp/drift

After these settings are added, the previous command works fine:
ntpdate -u 10.20.0.2
23 Mar 13:14:01 ntpdate[21021]: adjust time server 10.20.0.2 offset -0.000010 sec

And deployment continues...
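
The two directives from the diff above can also be added programmatically. The sketch below is one hypothetical way to do it in Python, assuming the same local clock address and the same stratum of 10; add_local_clock is an illustrative name, and the function only transforms text: it does not write /etc/ntp.conf or restart ntpd.

```python
LOCAL_CLOCK = "127.127.1.0"  # address of the undisciplined local clock driver


def add_local_clock(conf_text, stratum=10):
    """Insert the local-clock fallback after the last `server` line."""
    lines = conf_text.splitlines()
    # Do nothing if the fallback is already configured.
    if any(l.split()[:2] == ["server", LOCAL_CLOCK] for l in lines):
        return conf_text
    extra = [
        "server %s" % LOCAL_CLOCK,
        "fudge %s stratum %d" % (LOCAL_CLOCK, stratum),
    ]
    # Index just past the last `server` directive, or the end of the
    # file when there is none.
    last = max(
        (i + 1 for i, l in enumerate(lines) if l.split()[:1] == ["server"]),
        default=len(lines),
    )
    return "\n".join(lines[:last] + extra + lines[last:]) + "\n"


if __name__ == "__main__":
    original = (
        "server 0.pool.ntp.org iburst\n"
        "server 1.pool.ntp.org iburst\n"
        "\n"
        "# Driftfile.\n"
        "driftfile /var/lib/ntp/drift\n"
    )
    print(add_local_clock(original), end="")
```

A high stratum such as 10 is meant to keep the local clock a last resort: clients that can also see a real upstream at a lower stratum should prefer that one.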

cat /etc/fuel/version.yaml
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  api: "1.0"
  build_number: "216"
  build_id: "2015-03-22_22-54-44"
  nailgun_sha: "51974b50c3961be3ed0fdc7859570db2eeb83e9c"
  python-fuelclient_sha: "b223dcaf5fdad2f714cd245958fefe03995d6207"
  astute_sha: "4a117a1ca6bdcc34fe4d086959ace1a6d18eeca9"
  fuellib_sha: "a636c680e3c7d8cc66ed3e03645f38250beb8970"
  ostf_sha: "b4d284e9364e30bf5162975c2ba0be6ca0f14ebd"
  fuelmain_sha: "f52e4442df55a2b62637a2cf4038a24ba6f37b6f"

Ryan Moe (rmoe)
Changed in fuel:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Fuel Library Team (fuel-library)
milestone: none → 6.1
Stanislaw Bogatkin (sbogatkin) wrote :

There is actually nothing that we can do. If we add a local undisciplined clock to our master node, it will lead to unpredictable errors too (and it did: as you can see, we had this code just a month or two ago). Requiring the upstream servers to be available and not adding a local clock to the master node is simply the lesser evil.

Stanislaw Bogatkin (sbogatkin) wrote :

As I see it, we can switch from ntpd to something else on the master node, but we don't have enough time to test this thoroughly in 6.1. So I suggest postponing this to 7.0.

Changed in fuel:
milestone: 6.1 → 7.0
Bogdan Dobrelya (bogdando) wrote :

@Stanislaw, we have to address this issue, at least with a workaround described in the documentation. Also, the given failure message is not informative; it should be more specific and provide more details. And if possible, we should make the deployment not fail on this issue.

tags: added: low-hanging-fruit
Changed in fuel:
milestone: 7.0 → 6.1
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Stanislaw Bogatkin (sbogatkin)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/167149

Changed in fuel:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/167149
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=4b622b8d7f0570ceb008e6eccdbb3ab26c4cf87c
Submitter: Jenkins
Branch: master

commit 4b622b8d7f0570ceb008e6eccdbb3ab26c4cf87c
Author: Stanislaw Bogatkin <email address hidden>
Date: Tue Mar 24 12:46:12 2015 +0300

    Add check of upstream ntp server

    If we can't reach upstream server, tell our ntpd to give time from
    local clock to recipients.

    Change-Id: Ia9130259a9112017f0b9362ff4505425dfc82008
    Closes-Bug: #1435335
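
The commit message describes a check of the upstream NTP server, but the patch itself is not quoted in this report. As an illustration only, and not the merged fuel-library code, such a check could be a single SNTP client query in Python. The helper names, the 2-second timeout and the IPv4-only socket are all assumptions of this sketch:

```python
import socket

NTP_PORT = 123


def build_request():
    """48-byte SNTP client request: LI=0, VN=3, Mode=3 (first byte 0x1B)."""
    return b"\x1b" + b"\x00" * 47


def reply_is_usable(reply):
    """True if the reply is a server-mode packet from a synchronized server."""
    if len(reply) < 48:
        return False
    mode = reply[0] & 0x07   # low three bits of the first byte
    stratum = reply[1]       # 0 = unspecified, 16 = unsynchronized
    return mode == 4 and 1 <= stratum <= 15


def upstream_reachable(host, port=NTP_PORT, timeout=2.0):
    """Send one query over UDP; any socket error counts as unreachable."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(build_request(), (host, port))
            reply, _addr = sock.recvfrom(512)
    except OSError:  # includes timeouts and name resolution failures
        return False
    return reply_is_usable(reply)
```

A caller could run upstream_reachable() against each configured server and enable the local clock fallback only when every call returns False, which is the behaviour the commit message describes.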

Changed in fuel:
status: In Progress → Fix Committed
Kyrylo Romanenko (kromanenko) wrote :

Attempted to deploy Juno on CentOS 6.5.
Cluster: 1 compute, 1 controller, 1 cinder.
Neutron VLAN networking, QEMU, LVM.

This error still persists in the MOS 6.1 build.

ERR
[632] Error running RPC method granular_deploy: Failed to execute hook .
---
priority: 300
fail_on_error: true
type: shell
uids:
- '1'
- '3'
- '2'
parameters:
  retries: 10
  cmd: ntpdate -u $(egrep '^server' /etc/ntp.conf | egrep -v '127\.127\.[0-9]+\.[0-9]+'
    | sed '/^#/d' | awk '{print $2}')
  timeout: 180
  interval: 30
, trace:
["/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:54:in `block in process'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:26:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:26:in `process'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/deployment_engine/granular_deployment.rb:201:in `pre_deployment_actions'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/deployment_engine.rb:32:in `deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/orchestrator.rb:133:in `deploy_cluster'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/orchestrator.rb:56:in `granular_deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/dispatcher.rb:111:in `granular_deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:142:in `dispatch_message'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:103:in `block in dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:64:in `call'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:64:in `block in each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:101:in `each_with_index'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:101:in `dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:85:in `block in perform_main_job'"]

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  api: "1.0"
  build_number: "248"
  build_id: "2015-03-30_03-08-59"
  nailgun_sha: "a3c259a4875787274fa01f0eba6514cc01b34308"
  python-fuelclient_sha: "05ec53f94206decdce19bb9373523022e5616b83"
  astute_sha: "f595715750a2c4820722a96e0236f5c89ca6521c"
  fuellib_sha: "3c85c9f16541c6ef461eb93816db51f798aba90c"
  ostf_sha: "e59c905566ed701117d7c643b435b13e6b5f8c3b"
  fuelmain_sha: "320b5f46fc1b2798f9e86ed7df51d3bda1686c10"

Changed in fuel:
status: Fix Committed → Confirmed
Stanislaw Bogatkin (sbogatkin) wrote :

Kyrylo, without the ntpq -p output we cannot tell whether this is the problem from the description or something else. A posthook failure is just too general. But I'll create one more patch that should fix that problem.

Stanislaw Bogatkin (sbogatkin) wrote :

Kyrylo, it seems that we already have a bug about your problem [0], so I am closing this one.

[0] https://bugs.launchpad.net/fuel/+bug/1430482

Changed in fuel:
status: Confirmed → Fix Committed
Dennis Dmitriev (ddmitriev) wrote :

Fix released: http://paste.openstack.org/show/230334/
But the issue can still be reproduced during cluster deployment if there is no connectivity to the upstream NTP servers through the public network; in this case the cluster should be configured to use the Fuel admin node as its NTP server.

[root@nailgun ~]# fuel --fuel-version
DEPRECATION WARNING: /etc/fuel/client/config.yaml exists and will be used as the source for settings. This behavior is deprecated. Please specify the path to your custom settings file in the FUELCLIENT_CUSTOM_SETTINGS environment variable.
api: '1.0'
astute_sha: b09729c64b695b2e6fcc88c31843321759ec45d5
auth_required: true
build_id: 2015-05-16_21-44-48
build_number: '426'
feature_groups:
- mirantis
fuel-library_sha: cc8ca7035be9b01d61f6fc6167d7e5d82a4fe1bc
fuel-ostf_sha: 9ce1800749081780b8b2a4a7eab6586583ffaf33
fuelmain_sha: 0e970647a83d9a7d336c4cc253606d4dd0d59a60
nailgun_sha: 076566b5df37f681c3fd5b139c966d680d81e0a5
openstack_version: 2014.2.2-6.1
production: docker
python-fuelclient_sha: 38765563e1a7f14f45201fd47cf507393ff5d673
release: '6.1'
release_versions:
  2014.2.2-6.1:
    VERSION:
      api: '1.0'
      astute_sha: b09729c64b695b2e6fcc88c31843321759ec45d5
      build_id: 2015-05-16_21-44-48
      build_number: '426'
      feature_groups:
      - mirantis
      fuel-library_sha: cc8ca7035be9b01d61f6fc6167d7e5d82a4fe1bc
      fuel-ostf_sha: 9ce1800749081780b8b2a4a7eab6586583ffaf33
      fuelmain_sha: 0e970647a83d9a7d336c4cc253606d4dd0d59a60
      nailgun_sha: 076566b5df37f681c3fd5b139c966d680d81e0a5
      openstack_version: 2014.2.2-6.1
      production: docker
      python-fuelclient_sha: 38765563e1a7f14f45201fd47cf507393ff5d673
      release: '6.1'

Changed in fuel:
status: Fix Committed → Fix Released