Fuel for OpenStack

deployment failed due to ntpd server can't reach higher stratum

Bug #1435335 reported by Nikita Koshikov on 2015-03-23

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Fuel for OpenStack	Fix Released	High	Stanislaw Bogatkin	Fuel for OpenStack 6.1

Bug Description

Currently, some providers restrict access to port 123 on root NTP servers.
If the master node is placed inside such an environment, deployment fails with this error:

2015-03-23T10:45:20 err: [517] Error running RPC method granular_deploy: Failed to execute hook .

---
priority: 300
fail_on_error: true
type: shell
uids:
- '3'
- '4'
parameters:
  retries: 10
  cmd: ntpdate -u $(egrep '^server' /etc/ntp.conf | egrep -v '127\.127\.[0-9]+\.[0-9]+'
    | sed '/^#/d' | awk '{print $2}')
  timeout: 180
  interval: 1
, trace:
["/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:54:in `block in process'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:26:in `each'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:26:in `process'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/deployment_engine/granular_deployment.rb:201:in `pre_deployment_actions'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/deployment_engine.rb:32:in `deploy'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/orchestrator.rb:131:in `deploy_cluster'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/orchestrator.rb:56:in `granular_deploy'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/dispatcher.rb:111:in `granular_deploy'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:142:in `dispatch_message'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:103:in `block in dispatch'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:64:in `call'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:64:in `block in each'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:56:in `each'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:56:in `each'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:101:in `each_with_index'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:101:in `dispatch'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:85:in `block in perform_main_job'"]

2015-03-23T10:45:20 info: [517] Casting message to Nailgun: {"method"=>"deploy_resp", "args"=>{"task_uuid"=>"b6c2fcee-1b56-4f75-a90c-87158d681da7", "status"=>"error", "error"=>"Method granular_deploy. Failed to execute hook .\n\n---\npriority: 300\nfail_on_error: true\ntype: shell\nuids:\n- '3'\n- '4'\nparameters:\n retries: 10\n cmd: ntpdate -u $(egrep '^server' /etc/ntp.conf | egrep -v '127\\.127\\.[0-9]+\\.[0-9]+'\n | sed '/^#/d' | awk '{print $2}')\n timeout: 180\n interval: 1\n.\nInspect Astute logs for the details"}}
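The hook message is generic, so it can help to see which hosts the cmd field actually hands to ntpdate. The sketch below reuses the same filter chain from the log (keep lines starting with "server", drop 127.127.x.x reference clocks, delete comment lines, print the second field). The function name list_ntp_servers and the sample file contents are hypothetical, and grep -E stands in for egrep, which newer GNU grep releases report as obsolescent.

```shell
#!/bin/sh
# Print the server addresses that the hook's pipeline would pass to ntpdate.
# Defaults to /etc/ntp.conf, like the hook; a path can be given for testing.
list_ntp_servers() {
    conf="${1:-/etc/ntp.conf}"
    grep -E '^server' "$conf" \
        | grep -E -v '127\.127\.[0-9]+\.[0-9]+' \
        | sed '/^#/d' \
        | awk '{print $2}'
}

# Hypothetical sample config, for illustration only.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
# upstream
server 10.20.0.2 iburst
server 127.127.1.0
fudge 127.127.1.0 stratum 10
EOF
list_ntp_servers "$sample"   # prints only 10.20.0.2
rm -f "$sample"
```

Running the function on a failing node shows which time sources ntpdate was asked to query, which narrows down where reachability has to be checked.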

As you can see, the master node can't reach the root servers:
ntpq> peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 kahuna.ruselabs .INIT.          16 u    -  256    0    0.000    0.000   0.000
 422224.s.dediku .INIT.          16 u    -  256    0    0.000    0.000   0.000
 ponderosa.piney .INIT.          16 u    -  256    0    0.000    0.000   0.000

ntpq> as
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 15020  8011   yes    no  none    reject    mobilize  1
  2 15021  8011   yes    no  none    reject    mobilize  1
  3 15022  8011   yes    no  none    reject    mobilize  1

If the command ntpdate -u 'fuel_ip' is executed, it produces an error:
23 Mar 12:32:52 ntpdate[17115]: no server suitable for synchronization found
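In the peers listing above, every upstream has reach 0 and stratum 16, meaning no poll has been answered and ntpd considers itself unsynchronized, which is why ntpdate rejects it as a source. A small sketch for spotting this condition from ntpq -p style output follows. The function name unreachable_peers and the sample rows in the demonstration are hypothetical; it assumes the standard ten-column peers layout with reach in the seventh column.

```shell
#!/bin/sh
# Read "ntpq -p" style output on stdin and print every peer whose reach
# register is 0 (no successful poll among the last eight attempts).
# Header and separator lines are skipped.
unreachable_peers() {
    awk 'NF >= 10 && $1 != "remote" && $7 == 0 { print $1 }'
}

# Demonstration on canned text (made-up rows), so no running ntpd is needed.
unreachable_peers <<'EOF'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 blocked.example .INIT.          16 u    -  256    0    0.000    0.000   0.000
*working.example 192.0.2.1        2 u   10   64  377    1.200    0.100   0.200
EOF
```

On a live master node the same function could be fed from ntpq -pn. If it lists every configured upstream, the node has no usable time source.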

This can be fixed by adding settings to the Fuel master node that instruct ntpd to advertise itself as a synchronized server:
--- ntp.conf.orig 2015-03-23 13:29:32.847968972 +0000
+++ ntp.conf 2015-03-23 13:13:33.706984063 +0000
@@ -16,6 +16,8 @@
 server 0.pool.ntp.org iburst
 server 1.pool.ntp.org iburst
 server 2.pool.ntp.org iburst
+server 127.127.1.0
+fudge 127.127.1.0 stratum 10

 # Driftfile.
 driftfile /var/lib/ntp/drift

After these settings are added, the previous command starts working:
ntpdate -u 10.20.0.2
23 Mar 13:14:01 ntpdate[21021]: adjust time server 10.20.0.2 offset -0.000010 sec

And deployment continues...
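The manual workaround above can be scripted so that repeated runs do not duplicate the lines. This is a minimal sketch and not part of Fuel: the function name add_local_clock is hypothetical, and it appends the two lines at the end of the file instead of directly after the pool servers as the diff does.

```shell
#!/bin/sh
# Append the local-clock fallback to an ntp.conf-style file unless it is
# already there. 127.127.1.0 is ntpd's local clock driver; the fudge line
# makes it advertise stratum 10, as in the diff above.
add_local_clock() {
    conf="$1"
    if grep -q '^server[[:space:]]\{1,\}127\.127\.1\.0' "$conf"; then
        return 0    # already configured, nothing to do
    fi
    printf 'server 127.127.1.0\nfudge 127.127.1.0 stratum 10\n' >> "$conf"
}

# Demonstration on a throwaway file (hypothetical contents), not /etc/ntp.conf.
demo="$(mktemp)"
printf 'server 0.pool.ntp.org iburst\n' > "$demo"
add_local_clock "$demo"
add_local_clock "$demo"   # second call is a no-op
cat "$demo"
rm -f "$demo"
```

After changing the real configuration file, ntpd has to be restarted for the new lines to take effect. The exact service command depends on the distribution.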

cat /etc/fuel/version.yaml
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  api: "1.0"
  build_number: "216"
  build_id: "2015-03-22_22-54-44"
  nailgun_sha: "51974b50c3961be3ed0fdc7859570db2eeb83e9c"
  python-fuelclient_sha: "b223dcaf5fdad2f714cd245958fefe03995d6207"
  astute_sha: "4a117a1ca6bdcc34fe4d086959ace1a6d18eeca9"
  fuellib_sha: "a636c680e3c7d8cc66ed3e03645f38250beb8970"
  ostf_sha: "b4d284e9364e30bf5162975c2ba0be6ca0f14ebd"
  fuelmain_sha: "f52e4442df55a2b62637a2cf4038a24ba6f37b6f"

Ryan Moe (rmoe) on 2015-03-23

Changed in fuel:
status:	New → Triaged
importance:	Undecided → High
assignee:	nobody → Fuel Library Team (fuel-library)
milestone:	none → 6.1

Stanislaw Bogatkin (sbogatkin) wrote on 2015-03-24:

There is actually nothing we can do. If we add a local undisciplined clock to our master node, it will lead to unpredictable errors too (and it did: we had this code just a month or two ago). Requiring upstream servers to be available, and not adding a local clock to the master node, is simply the lesser evil.

Stanislaw Bogatkin (sbogatkin) wrote on 2015-03-24:

As I see it, we could switch from ntpd to something else on the master node, but we don't have enough time to test this thoroughly in 6.1. So I suggest postponing this to 7.0.

Changed in fuel:
milestone:	6.1 → 7.0

Bogdan Dobrelya (bogdando) wrote on 2015-03-24:

@Stanislaw, we have to address this issue, at least with a workaround described in the documentation. Also, the given failure message is not informative; it should provide more details and be more specific. And if possible, we should make the deployment not fail on this issue.

tags:	added: low-hanging-fruit
Changed in fuel:
milestone:	7.0 → 6.1

Stanislaw Bogatkin (sbogatkin) on 2015-03-24

Changed in fuel:
assignee:	Fuel Library Team (fuel-library) → Stanislaw Bogatkin (sbogatkin)

OpenStack Infra (hudson-openstack) wrote on 2015-03-24: Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/167149

Changed in fuel:
status:	Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote on 2015-03-25: Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/167149
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=4b622b8d7f0570ceb008e6eccdbb3ab26c4cf87c
Submitter: Jenkins
Branch: master

commit 4b622b8d7f0570ceb008e6eccdbb3ab26c4cf87c
Author: Stanislaw Bogatkin <email address hidden>
Date: Tue Mar 24 12:46:12 2015 +0300

Add check of upstream ntp server

If we can't reach upstream server, tell our ntpd to give time from
local clock to recipients.

Change-Id: Ia9130259a9112017f0b9362ff4505425dfc82008
Closes-Bug: #1435335
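
The commit message only states the idea; the merged change itself is at the review link above. As a rough illustration of such a reachability check (not the code that was merged), the sketch below tries each upstream in turn and reports whether any answered. The function name any_upstream_reachable and the NTP_PROBE override are hypothetical; the default probe assumes ntpdate's -q (query only, do not set the clock) and -u (unprivileged source port) options.

```shell
#!/bin/sh
# Return 0 if at least one of the given servers answers the probe command.
# NTP_PROBE can be overridden to exercise the logic without network access.
any_upstream_reachable() {
    probe="${NTP_PROBE:-ntpdate -u -q}"
    for server in "$@"; do
        # $probe is left unquoted on purpose so that its options are split.
        if $probe "$server" >/dev/null 2>&1; then
            return 0
        fi
    done
    return 1
}

# Simulate a network where port 123 is blocked by forcing the probe to fail.
NTP_PROBE=false
if any_upstream_reachable 0.pool.ntp.org 1.pool.ntp.org; then
    echo "upstream reachable: keep the configuration as is"
else
    echo "upstream unreachable: serve time from the local clock"
fi
```

With the forced failure the second branch is taken. Leaving NTP_PROBE unset makes the function query the real servers.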

Changed in fuel:
status:	In Progress → Fix Committed

Kyrylo Romanenko (kromanenko) wrote on 2015-03-30:

fuel-snapshot-2015-03-30_10-15-22.tar.xz (5.1 MiB, application/octet-stream)

Attempted to deploy Juno on CentOS 6.5.
Cluster: 1 compute, 1 controller, 1 cinder.
Neutron VLAN networking, QEMU, LVM.

This error still persists in the MOS 6.1 build.

ERR
[632] Error running RPC method granular_deploy: Failed to execute hook .
---
priority: 300
fail_on_error: true
type: shell
uids:
- '1'
- '3'
- '2'
parameters:
  retries: 10
  cmd: ntpdate -u $(egrep '^server' /etc/ntp.conf | egrep -v '127\.127\.[0-9]+\.[0-9]+'
    | sed '/^#/d' | awk '{print $2}')
  timeout: 180
  interval: 30
, trace:
["/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:54:in `block in process'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:26:in `each'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:26:in `process'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/deployment_engine/granular_deployment.rb:201:in `pre_deployment_actions'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/deployment_engine.rb:32:in `deploy'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/orchestrator.rb:133:in `deploy_cluster'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/orchestrator.rb:56:in `granular_deploy'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/dispatcher.rb:111:in `granular_deploy'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:142:in `dispatch_message'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:103:in `block in dispatch'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:64:in `call'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:64:in `block in each'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:56:in `each'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:56:in `each'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:101:in `each_with_index'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:101:in `dispatch'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:85:in `block in perform_main_job'"]

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  api: "1.0"
  build_number: "248"
  build_id: "2015-03-30_03-08-59"
  nailgun_sha: "a3c259a4875787274fa01f0eba6514cc01b34308"
  python-fuelclient_sha: "05ec53f94206decdce19bb9373523022e5616b83"
  astute_sha: "f595715750a2c4820722a96e0236f5c89ca6521c"
  fuellib_sha: "3c85c9f16541c6ef461eb93816db51f798aba90c"
  ostf_sha: "e59c905566ed701117d7c643b435b13e6b5f8c3b"
  fuelmain_sha: "320b5f46fc1b2798f9e86ed7df51d3bda1686c10"

Changed in fuel:
status:	Fix Committed → Confirmed

Stanislaw Bogatkin (sbogatkin) wrote on 2015-04-02:

Kyrylo, without ntpq -p output we cannot tell whether this is the problem from the description or something else. A posthook failure is far too general. But I'll create one more patch that should fix that problem.

Stanislaw Bogatkin (sbogatkin) wrote on 2015-04-02:

Kyrylo, it seems that we already have a bug about your problem [0], so I am closing this one.

[0] https://bugs.launchpad.net/fuel/+bug/1430482

Changed in fuel:
status:	Confirmed → Fix Committed

Dennis Dmitriev (ddmitriev) wrote on 2015-05-21:

Fix released: http://paste.openstack.org/show/230334/
However, the issue can still be reproduced during cluster deployment if there is no connectivity to upstream NTP servers through the public network. In this case, the cluster should be configured to use the Fuel admin node as the NTP server.

[root@nailgun ~]# fuel --fuel-version
DEPRECATION WARNING: /etc/fuel/client/config.yaml exists and will be used as the source for settings. This behavior is deprecated. Please specify the path to your custom settings file in the FUELCLIENT_CUSTOM_SETTINGS environment variable.
api: '1.0'
astute_sha: b09729c64b695b2e6fcc88c31843321759ec45d5
auth_required: true
build_id: 2015-05-16_21-44-48
build_number: '426'
feature_groups:
- mirantis
fuel-library_sha: cc8ca7035be9b01d61f6fc6167d7e5d82a4fe1bc
fuel-ostf_sha: 9ce1800749081780b8b2a4a7eab6586583ffaf33
fuelmain_sha: 0e970647a83d9a7d336c4cc253606d4dd0d59a60
nailgun_sha: 076566b5df37f681c3fd5b139c966d680d81e0a5
openstack_version: 2014.2.2-6.1
production: docker
python-fuelclient_sha: 38765563e1a7f14f45201fd47cf507393ff5d673
release: '6.1'
release_versions:
  2014.2.2-6.1:
    VERSION:
      api: '1.0'
      astute_sha: b09729c64b695b2e6fcc88c31843321759ec45d5
      build_id: 2015-05-16_21-44-48
      build_number: '426'
      feature_groups:
      - mirantis
      fuel-library_sha: cc8ca7035be9b01d61f6fc6167d7e5d82a4fe1bc
      fuel-ostf_sha: 9ce1800749081780b8b2a4a7eab6586583ffaf33
      fuelmain_sha: 0e970647a83d9a7d336c4cc253606d4dd0d59a60
      nailgun_sha: 076566b5df37f681c3fd5b139c966d680d81e0a5
      openstack_version: 2014.2.2-6.1
      production: docker
      python-fuelclient_sha: 38765563e1a7f14f45201fd47cf507393ff5d673
      release: '6.1'

Changed in fuel:
status:	Fix Committed → Fix Released
