Deployment failed due to inability to sync time in pre-hook task

Bug #1421965 reported by Yaroslav Lobankov
This bug affects 6 people
Affects                    Status         Importance   Assigned to          Milestone
Fuel for OpenStack         Fix Released   High         Stanislaw Bogatkin
6.0.x                      Invalid        High         Stanislaw Bogatkin

Bug Description

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  api: "1.0"
  build_number: "123"
  build_id: "2015-02-14_09-55-08"
  nailgun_sha: "1e3a40dd8a17abe1d38f42da1e0dc1a6d4572666"
  python-fuelclient_sha: "61431ed16fc00039a269424bdbaa410277eff609"
  astute_sha: "1f87a9b9a47de7498b4061d15a8c7fb9435709d5"
  fuellib_sha: "7f8d4382abfcd4338964182ebfea1d539f963e66"
  ostf_sha: "f9c37d0876141e1550eb4e703a8e500cd463282f"
  fuelmain_sha: "2054229e275d08898b5d079a6625ffcc79ae23b8"

ENVIRONMENT:
HA mode, CentOS, Neutron with GRE segmentation, Cinder LVM, Sahara and Ceilometer are enabled, 3 (controller + mongo), 1 (compute + cinder)

Deployment failed with the following error:
Deployment has failed. Method granular_deploy. undefined method `[]' for nil:NilClass.
Inspect Astute logs for the details

In the Astute logs I see these errors:
2015-02-14 14:58:34 ERR
[403] Error running RPC method granular_deploy: undefined method `[]' for nil:NilClass, trace:
["/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:159:in `calculate_multiroles_node_progress'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:96:in `node_validate'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:78:in `block in get_nodes_to_report'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:78:in `map'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:78:in `get_nodes_to_report'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:51:in `report_new_data'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:44:in `report'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/context.rb:35:in `report_and_update_status'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:49:in `block in process'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:26:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:26:in `process'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/deployment_engine/granular_deployment.rb:201:in `pre_deployment_actions'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/deployment_engine.rb:32:in `deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/orchestrator.rb:312:in `deploy_cluster'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/orchestrator.rb:63:in `granular_deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/dispatcher.rb:99:in `granular_deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:142:in `dispatch_message'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:103:in `block in dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:64:in `call'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:64:in `block in each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:101:in `each_with_index'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:101:in `dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:85:in `block in perform_main_job'"]
2015-02-14 14:58:34 ERR
[403] Unexpected error undefined method `[]' for nil:NilClass traceback
["/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:159:in `calculate_multiroles_node_progress'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:96:in `node_validate'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:78:in `block in get_nodes_to_report'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:78:in `map'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:78:in `get_nodes_to_report'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:51:in `report_new_data'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:44:in `report'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/context.rb:35:in `report_and_update_status'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:49:in `block in process'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:26:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:26:in `process'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/deployment_engine/granular_deployment.rb:201:in `pre_deployment_actions'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/deployment_engine.rb:32:in `deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/orchestrator.rb:312:in `deploy_cluster'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/orchestrator.rb:63:in `granular_deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/dispatcher.rb:99:in `granular_deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:142:in `dispatch_message'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:103:in `block in dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:64:in `call'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:64:in `block in each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:101:in `each_with_index'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:101:in `dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:85:in `block in perform_main_job'"]

Diagnostic snapshot is attached.

Revision history for this message
Yaroslav Lobankov (ylobankov) wrote :
summary: - Deployment fails with error "Method granular_deploy. undefined method
+ Deployment failed with error "Method granular_deploy. undefined method
`[]' for nil:NilClass."
description: updated
Revision history for this message
Oleksiy Molchanov (omolchanov) wrote : Re: Deployment failed with error "Method granular_deploy. undefined method `[]' for nil:NilClass."

Seems to be the same problem I had: there is no default route, so ntpd failed to sync time with the external server.
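
A quick hedged check for this condition on the master node (assuming a CentOS master; replace the pool hostname with whichever NTP server is configured in /etc/ntp.conf):

# verify that a default route exists at all
ip route show default
# query an external NTP server without setting the clock (-q = query only)
ntpdate -q 0.pool.ntp.org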

Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
importance: Undecided → Critical
status: New → Triaged
milestone: none → 6.1
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

This bug has no patch, solution, or workaround described, so I am setting its status to Confirmed.

Changed in fuel:
status: Triaged → Confirmed
Revision history for this message
Oleksiy Butenko (obutenko) wrote :

I reproduced this bug on:
CentOS, Neutron with GRE segmentation, 3 controllers + 1 compute + 1 cinder.
Snapshot attached.
However, when I deployed the configuration CentOS, Neutron with GRE segmentation, 1 controller + 1 compute + 1 cinder, the installation finished correctly.
With Ubuntu, every environment installs correctly.
Perhaps the bug is associated with HA mode and CentOS.

api: '1.0'
astute_sha: 1f87a9b9a47de7498b4061d15a8c7fb9435709d5
auth_required: true
build_id: 2015-02-15_22-54-44
build_number: '126'
feature_groups:
- mirantis
fuellib_sha: 7f8d4382abfcd4338964182ebfea1d539f963e66
fuelmain_sha: 2054229e275d08898b5d079a6625ffcc79ae23b8
nailgun_sha: 1e3a40dd8a17abe1d38f42da1e0dc1a6d4572666
ostf_sha: f9c37d0876141e1550eb4e703a8e500cd463282f
production: docker
python-fuelclient_sha: 61431ed16fc00039a269424bdbaa410277eff609
release: '6.1'
release_versions:
  2014.2-6.1:
    VERSION:
      api: '1.0'
      astute_sha: 1f87a9b9a47de7498b4061d15a8c7fb9435709d5
      build_id: 2015-02-15_22-54-44
      build_number: '126'
      feature_groups:
      - mirantis
      fuellib_sha: 7f8d4382abfcd4338964182ebfea1d539f963e66
      fuelmain_sha: 2054229e275d08898b5d079a6625ffcc79ae23b8
      nailgun_sha: 1e3a40dd8a17abe1d38f42da1e0dc1a6d4572666
      ostf_sha: f9c37d0876141e1550eb4e703a8e500cd463282f
      production: docker
      python-fuelclient_sha: 61431ed16fc00039a269424bdbaa410277eff609
      release: '6.1'

Revision history for this message
Oleksiy Butenko (obutenko) wrote :

Update: I used Neutron with VLAN, not with GRE.

Revision history for this message
Oleksiy Butenko (obutenko) wrote :

I reproduced this bug with Ubuntu.
ENVIRONMENT:
HA mode, Ubuntu, Nova network, Cinder LVM, Ceilometer enabled, 3 (controller + mongo), 2 (compute + cinder)

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Stanislaw Bogatkin (sbogatkin)
Changed in fuel:
status: Confirmed → In Progress
Igor Belikov (ibelikov)
Changed in fuel:
status: In Progress → Confirmed
Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :
Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
Kyrylo Romanenko (kromanenko) wrote :

Also reproduced this on:
{"build_id": "2015-02-12_10-39-11", "ostf_sha": "f9c37d0876141e1550eb4e703a8e500cd463282f", "build_number": "115", "release_versions": {"2014.2-6.1": {"VERSION": {"build_id": "2015-02-12_10-39-11", "ostf_sha": "f9c37d0876141e1550eb4e703a8e500cd463282f", "build_number": "115", "api": "1.0", "nailgun_sha": "78e1fa50e38efc1001ddfe0565a55e9e176ff5f6", "production": "docker", "python-fuelclient_sha": "61431ed16fc00039a269424bdbaa410277eff609", "astute_sha": "2159855ba7b82956ac0787a4e7be053105c4c1f1", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "892c80aa9adc9f53e9b3061d4754203953a84db7", "fuellib_sha": "592df3ba1dbfba6c6c84f90ff36b0c4c697934d3"}}}, "auth_required": true, "api": "1.0", "nailgun_sha": "78e1fa50e38efc1001ddfe0565a55e9e176ff5f6", "production": "docker", "python-fuelclient_sha": "61431ed16fc00039a269424bdbaa410277eff609", "astute_sha": "2159855ba7b82956ac0787a4e7be053105c4c1f1", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "892c80aa9adc9f53e9b3061d4754203953a84db7", "fuellib_sha": "592df3ba1dbfba6c6c84f90ff36b0c4c697934d3"}

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/156251
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=bd74e318fe83d3e522630f77ae3a65a4b5777ba6
Submitter: Jenkins
Branch: master

commit bd74e318fe83d3e522630f77ae3a65a4b5777ba6
Author: Stanislaw Bogatkin <email address hidden>
Date: Mon Feb 16 18:22:58 2015 +0300

    Enable iburst mode and udlc in ntp module

    Change-Id: I8b28b5d1c7a0a80ea5fa25648f40673011675864
    Closes-Bug: #1298360
    Closes-Bug: #1421289
    Closes-Bug: #1421965
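
For context, iburst and udlc (undisciplined local clock) roughly correspond to ntp.conf directives like the following; this is an illustrative sketch, not the exact content of the fuel-library change:

# iburst: send an initial burst of packets so the first synchronization happens faster
server 0.pool.ntp.org iburst
# udlc: fall back to the undisciplined local clock when no remote server is usable
server 127.127.1.0
fudge  127.127.1.0 stratum 10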

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Oleksiy Butenko (obutenko) wrote : Re: Deployment failed with error "Method granular_deploy. undefined method `[]' for nil:NilClass."

Also reproduced this on MOS 6.1 build 114.

However, it does not reproduce on MOS 6.1 build 105 or on 6.0.1 build 92:
{"build_id": "2015-02-07_22-55-01", "ostf_sha": "6c046b69d29021524906109f18092363505ee222", "build_number": "105", "release_versions": {"2014.2-6.1": {"VERSION": {"build_id": "2015-02-07_22-55-01", "ostf_sha": "6c046b69d29021524906109f18092363505ee222", "build_number": "105", "api": "1.0", "nailgun_sha": "6d1769b21819f8fb4195f1bd9c44c038721ae3d4", "production": "docker", "python-fuelclient_sha": "521c2491f7f04f31d8c85db68499cd193d4904e3", "astute_sha": "7e6e6f9188bd69c603853b10d4a55149363323cc", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "", "fuellib_sha": "769af7fe30225cd15638ea2e6dffaa286bc06da1"}}}, "auth_required": true, "api": "1.0", "nailgun_sha": "6d1769b21819f8fb4195f1bd9c44c038721ae3d4", "production": "docker", "python-fuelclient_sha": "521c2491f7f04f31d8c85db68499cd193d4904e3", "astute_sha": "7e6e6f9188bd69c603853b10d4a55149363323cc", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "", "fuellib_sha": "769af7fe30225cd15638ea2e6dffaa286bc06da1"}

{"build_id": "2015-02-15_20-49-44", "ostf_sha": "3b57985d4d2155510894a1f6d03b478b201f7780", "build_number": "92", "auth_required": true, "api": "1.0", "nailgun_sha": "6967b24adc4d74e36d69b59973ff79d6ab2389e5", "production": "docker", "fuelmain_sha": "c799e3a6d88289e58db764a6be7910aab7da3149", "astute_sha": "f7cda2171b0b677dfaeb59693d980a2d3ee4c3e0", "feature_groups": ["mirantis"], "release": "6.0.1", "release_versions": {"2014.2-6.0.1": {"VERSION": {"build_id": "2015-02-15_20-49-44", "ostf_sha": "3b57985d4d2155510894a1f6d03b478b201f7780", "build_number": "92", "api": "1.0", "nailgun_sha": "6967b24adc4d74e36d69b59973ff79d6ab2389e5", "production": "docker", "fuelmain_sha": "c799e3a6d88289e58db764a6be7910aab7da3149", "astute_sha": "f7cda2171b0b677dfaeb59693d980a2d3ee4c3e0", "feature_groups": ["mirantis"], "release": "6.0.1", "fuellib_sha": "c05541e54f2a127f9c38f7ddb6abf30584fc55e4"}}}, "fuellib_sha": "c05541e54f2a127f9c38f7ddb6abf30584fc55e4"}

Revision history for this message
Vasyl Saienko (vsaienko) wrote :

I reproduced this bug with ubuntu
ENVIRONMENT:
HA mode, UBUNTU, Neutron, Cinder LVM
3 Controllers
16 Compute + cinder

Revision history for this message
Vasyl Saienko (vsaienko) wrote :

api: '1.0'
astute_sha: 8af8e88c3cb17b66368e7a038f1899e5c7c13e98
auth_required: true
build_id: 2015-02-17_22-54-44
build_number: '129'
feature_groups:
- mirantis
fuellib_sha: 5956f0be54600ffad96ef50eb2ba91307d3664d9
fuelmain_sha: 1c249116ca4285be28ce78be27cb5a70f0bf2fb8
nailgun_sha: 9b5b8faac0cac75507adb0958c96ade9285525cb
ostf_sha: 66f58a43c30b98c5a4f7cf040712ca7a588ea761
production: docker
python-fuelclient_sha: c4f8f7cf81c7af55681f8b85c03e179975075e73
release: '6.1'
release_versions:
  2014.2-6.1:
    VERSION:
      api: '1.0'
      astute_sha: 8af8e88c3cb17b66368e7a038f1899e5c7c13e98
      build_id: 2015-02-17_22-54-44
      build_number: '129'
      feature_groups:
      - mirantis
      fuellib_sha: 5956f0be54600ffad96ef50eb2ba91307d3664d9
      fuelmain_sha: 1c249116ca4285be28ce78be27cb5a70f0bf2fb8
      nailgun_sha: 9b5b8faac0cac75507adb0958c96ade9285525cb
      ostf_sha: 66f58a43c30b98c5a4f7cf040712ca7a588ea761
      production: docker
      python-fuelclient_sha: c4f8f7cf81c7af55681f8b85c03e179975075e73
      release: '6.1'

Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

We don't have this bug in releases below 6.1, because the root cause lies in the new ntp package and the reconfiguration that follows from it. 6.0 does not have the settings that 6.1 has. Closed.

Revision history for this message
Kyrylo Romanenko (kromanenko) wrote :

Just caught this bug on the following build:

{"build_id": "2015-02-23_22-54-44", "ostf_sha": "1a0b2c6618fac098473c2ed5a9af11d3a886a3bb", "build_number": "140", "release_versions": {"2014.2-6.1": {"VERSION": {"build_id": "2015-02-23_22-54-44", "ostf_sha": "1a0b2c6618fac098473c2ed5a9af11d3a886a3bb", "build_number": "140", "api": "1.0", "nailgun_sha": "3616ae9df4ac3e088157bb94f73743a521f76f1a", "production": "docker", "python-fuelclient_sha": "5657dbf06fddb74adb61e9668eb579a1c57d8af8", "astute_sha": "d81ff53c2f467151ecde120d3a4d284e3b5b3dfc", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "b975019fabdb429c1869047df18dd792d2163ecc", "fuellib_sha": "8b79d47ef41bff293210d2a7b1bb02843f70948d"}}}, "auth_required": true, "api": "1.0", "nailgun_sha": "3616ae9df4ac3e088157bb94f73743a521f76f1a", "production": "docker", "python-fuelclient_sha": "5657dbf06fddb74adb61e9668eb579a1c57d8af8", "astute_sha": "d81ff53c2f467151ecde120d3a4d284e3b5b3dfc", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "b975019fabdb429c1869047df18dd792d2163ecc", "fuellib_sha": "8b79d47ef41bff293210d2a7b1bb02843f70948d"}

Nodes:
1) Controller + CephOSD
2) Compute +CephOSD
3) CephOSD

Juno on Ubuntu 12.04.4 (2014.2-6.1)
Multi-node with HA
Neutron with VLAN segmentation
QEMU

Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Kyrylo, can you attach a diagnostic snapshot, please? The root cause of the time sync failure was a bad ntpdate run; it can be seen in the Astute log on the master node. But I cannot reopen the bug until I see such a log from a newer ISO.
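
A hedged way to look for this in a snapshot or on the master node (the Astute log path below is typical for dockerized 6.x masters and may differ between releases):

# find the time-sync pre-deployment hook and its result in the Astute log
grep -i -B 2 -A 2 'ntpdate' /var/log/docker-logs/astute/astute.log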

Revision history for this message
Kyrylo Romanenko (kromanenko) wrote :
Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Reopen.

Changed in fuel:
status: Fix Committed → Confirmed
Revision history for this message
Victor Ryzhenkin (vryzhenkin) wrote :

Reproduced one more time on MOS 6.1 build #140, CentOS 6.5, with all Ceph, Murano, and Neutron GRE.

Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

So, a deeper investigation shows why this happens.
To reproduce it, you should do the following:

1. Deploy the master node with system tests (this provisions the master, deploys it, and starts ntpd on it). At the end of this step the virtual machine is stopped.
2. Wait a while (in my case about 20 minutes, maybe a little less).
3. Resume the master VM.

Now we have a situation where the remote servers give ntpd a new time (about 20 minutes ahead of what it has). The NTP protocol was not designed to handle such a jump (http://www.ntp.org/ntpfaq/NTP-s-algo.htm#Q-ALGO-BASIC-STEP-SLEW): it reports that jitter and offset are huge. For that case we have a local clock source, so the master starts serving local time to the bootstrap nodes.

4. Start the bootstrap nodes.
They get either the local time or nothing at all; this is not as critical as it may seem.

5. Assign roles and start the deployment. Provisioning and deployment begin.
By this time the master node has already realized that the remote servers are giving it the correct time, and the jitter subsides. Now we have good remote servers but a huge offset. The master switches from the local clock to the remote servers (but it no longer serves time to the other nodes because of the huge offset between the remote servers and the local time). How long do we have to wait for this situation to occur? In my case it was about 10 minutes.

6. The provisioning stage ends and pre-deployment starts. It contains a task to sync time between the nodes and the master. As you can guess, it fails because of the huge offset on the master node.

So this is not actually a bug; it is just how the NTP protocol works. We know about it and restart ntpd when reverting snapshots in system tests, which is why we don't see this error on CI. But we cannot restart ntpd when someone clicks 'unpause' in virt-manager or runs 'virsh resume <id>' in the console.
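
A minimal sketch for checking whether the master node is stuck in this state before a deployment starts (assuming the standard ntp tooling on the master; not part of any official procedure):

# show peers: a '*' marks the selected source; look for large offset/jitter values
ntpq -pn
# reports "unsynchronised" while ntpd refuses to serve time to clients
ntpstat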

I see the following options for handling this situation:
1. Do not forget to manually restart ntpdate and ntpd on the master node after running the 'setup' group in tests (maybe other groups too; I only use 'setup').
2. Stop pausing the master VM at the end of the deploy: either leave it running as it is, or shut it down after the test is done.

Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Since this bug happens only in custom deploys, only in VMs, only when you manually resume the VMs, and only in some cases even then, I am lowering the importance to High.

Changed in fuel:
importance: Critical → High
Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

After speaking with the QA team, I can say the following:

1. If you do a custom deploy, please resume your VMs with dos.py. Its 'resume' option already restarts ntpd automatically.
2. If you can't or don't want to resume with dos.py, please run the following steps after the master node resumes (and before the bootstrap nodes start); a verification sketch follows this list:
 -- stop ntpd (do /etc/init.d/ntpd stop)
 -- restart ntpdate (do /etc/init.d/ntpdate restart)
 -- start ntpd (do /etc/init.d/ntpd start)
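
To verify that the master is serving time again after these steps (a hedged check, not part of the procedure above):

# a remote peer should again be the selected source (marked with '*') and the offset should be small
ntpq -pn
# from a slave node, a one-shot query of the master should now succeed
# (10.20.0.2 is the default Fuel admin-network address; replace it with your master's IP)
ntpdate -q 10.20.0.2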

Since this is not a bug but just normal ntpd behaviour, I am closing it.

Changed in fuel:
status: Confirmed → Won't Fix
status: Won't Fix → Invalid
Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

This issue reproduced on my environment with VirtualBox: 1 controller, 1 compute, Ubuntu, Neutron VLAN.
I can see that Ubuntu was installed successfully and that the deployment failed after that.
We tried to redeploy the environment via the CLI, but it failed 3 times, and it was fixed only after additional manual manipulation (Sergey Kolekonov fixed this issue on my environment somehow).

So it looks like this issue can be reproduced on customer environments, and it will be a real pain if deployments fail because the ntp daemon doesn't work correctly. We need to fix this behaviour to avoid such failures on production/QA/demo environments.

I can help reproduce the issue if required.

Changed in fuel:
status: Invalid → Confirmed
Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Did you actually read what I wrote before? It is NOT a bug; it is just how NTP works. If you want a successful deploy, restart ntpd manually or resume the machines with dos.py. There is nothing more to do than restart ntpd on the master node. If anyone knows another method, I will be glad to hear it.

Changed in fuel:
status: Confirmed → Invalid
Revision history for this message
Chang-Yi Lee (cy-lee) wrote :

We encountered the same problem, and we are deploying to bare-metal machines:

Dell R620 X 3 (controller)
Dell R720xd X 3 (compute)

{"ostf_sha": "1a0b2c6618fac098473c2ed5a9af11d3a886a3bb", "release_versions": {"2014.2-6.1": {"VERSION": {"ostf_sha": "1a0b2c6618fac098473c2ed5a9af11d3a886a3bb", "api": "1.0", "nailgun_sha": "54b0302665b7c02009b89f693170c26437276e76", "production": "docker", "python-fuelclient_sha": "5657dbf06fddb74adb61e9668eb579a1c57d8af8", "astute_sha": "6d6ad68e0cde286d74ac7d52e21da4fc8dcbe9ab", "feature_groups": ["experimental"], "release": "6.1", "fuelmain_sha": "87223c15bb734bb668050850a6b43aa4291349b7", "fuellib_sha": "8384b8ca4db84794fb21e287202f05e31f78841c"}}}, "auth_required": true, "api": "1.0", "nailgun_sha": "54b0302665b7c02009b89f693170c26437276e76", "production": "docker", "python-fuelclient_sha": "5657dbf06fddb74adb61e9668eb579a1c57d8af8", "astute_sha": "6d6ad68e0cde286d74ac7d52e21da4fc8dcbe9ab", "feature_groups": ["experimental"], "release": "6.1", "fuelmain_sha": "87223c15bb734bb668050850a6b43aa4291349b7", "fuellib_sha": "8384b8ca4db84794fb21e287202f05e31f78841c"}

OS:
Ubuntu 12.04

Reference architecture:
HA

Network model:
Neutron+VLAN

Related Projects installed:
Ceilometer

Nodes:

Controller + mongodb * 3
Compute + Ceph-OSD * 3

Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Hi, Chang-Yi Lee.
Actually, your problem isn't related to this bug. You have the following lines in astute.yaml:

/etc/puppet/modules/osnailyfacter/modular/astute/upload_cirros.rb:65:in `wait_for_glance': Could not get a list of glance images! (RuntimeError)
 from /etc/puppet/modules/osnailyfacter/modular/astute/upload_cirros.rb:89

So your deployment broke due to another problem, not related to ntpd. I would appreciate it if you filed a new bug for it.

Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Sorry for the mistake. Those lines are in astute.log, not in a YAML file.

summary: - Deployment failed with error "Method granular_deploy. undefined method
- `[]' for nil:NilClass."
+ Deployment failed due to inability to sync time in pre-hook task
Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

Stanislaw, could we add a restart of the ntpd daemon before each deployment?

Revision history for this message
Ivan Kolodyazhny (e0ne) wrote :

The issue still reproduces on ISO #128.
Could not find data item controller_internal_addresses in any Hiera data file and no default supplied at /etc/puppet/modules/osnailyfacter/modular/cluster/cluster.pp:5 on node node-1.test.domain.local

Changed in fuel:
status: Invalid → Confirmed
Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Ivan, can you tell me how you reproduced this bug and what you did? Please also attach a diagnostic snapshot.
A reminder: the bug will be closed if you did not restart ntpd on the master node after a manual resume before the deployment.

Changed in fuel:
status: Confirmed → Incomplete
Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Timur, we could, but I don't think it would be right. You shouldn't suspend your VM and expect it to work fine after a resume. You should just use dos.py for the revert; that is the right way.

Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

Stanislaw, I don't use the fuel-devops scripts for OpenStack deployments. This is not a Fuel master node restored from a snapshot; it is a fresh installation, so it is not clear why a fresh installation doesn't work out of the box.

So, the steps to reproduce are the following:
1. Take a new MOS 6.1 ISO.
2. Start deploying an environment with VirtualBox or KVM.
3. Configure the OpenStack cloud and start deploying it.

After that we see this error. It looks like a 'bad feature': everything worked fine previously, and only after the ntpd refactoring did it start to fail... strange!

Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Timur, I cannot reproduce this bug if I do not pause the master node VM, and no one else in this thread could either. Am I right that you just installed a new master node from scratch, never paused or suspended it, then tried to deploy an environment and that environment failed? If so, please contact me so I can look at your env, because, as I said, I cannot reproduce the bug under those conditions.

Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

@Stanislaw, I've looked at Ivan's env; he has another issue not related to this report. @Ivan, could you confirm that?

Revision history for this message
Ivan Kolodyazhny (e0ne) wrote :

@Tatyanka, @Stanislaw, thanks for the help. My issue was with multinode mode, not with NTP. HA works for me.

Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

Reproduced on my environment with 3 controllers; the snapshot is attached.

Steps To Reproduce:
1. Deploy an env with Ubuntu, 1 controller and 1 compute. The deployment finishes successfully.
2. Create a cluster with CentOS, 3 controllers and 0 computes, and start the deployment.

The deployment fails.

Snapshot:
https://copy.com/ny9HOqXfexxxQpKH

Changed in fuel:
status: Incomplete → Confirmed
Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Hello again, Timur.
Here is what I see in the astute.log file from your last snapshot:

2015-03-03T09:33:37 info: [421] Run hook ---
priority: 100
fail_on_error: true
type: shell
uids:
- '3'
- '2'
- '6'
parameters:
  retries: 10
  cmd: ntpdate -u $(egrep '^server' /etc/ntp.conf | egrep -v '127\.127\.[0-9]+\.[0-9]+'
    | sed '/^#/d' | awk '{print $2}')
  timeout: 180
  interval: 1

2015-03-03T09:33:37 debug: [421] daf5191e-ccb2-4033-9216-dfe21de8b8ee: MC agent 'execute_shell_command', method 'execute', results: {:sender=>"2", :statuscode=>0, :statusmsg=>"OK", :data=>{:stdout=>" 3 Mar 09:33:37 ntpdate[1484]: adjust time server 10.20.0.2 offset -0.000332 sec\n", :exit_code=>0, :stderr=>""}}
2015-03-03T09:33:37 debug: [421] daf5191e-ccb2-4033-9216-dfe21de8b8ee: MC agent 'execute_shell_command', method 'execute', results: {:sender=>"3", :statuscode=>0, :statusmsg=>"OK", :data=>{:stdout=>" 3 Mar 09:33:37 ntpdate[1483]: adjust time server 10.20.0.2 offset -0.000564 sec\n", :exit_code=>0, :stderr=>""}}
2015-03-03T09:33:37 debug: [421] daf5191e-ccb2-4033-9216-dfe21de8b8ee: MC agent 'execute_shell_command', method 'execute', results: {:sender=>"6", :statuscode=>0, :statusmsg=>"OK", :data=>{:stdout=>" 3 Mar 09:33:37 ntpdate[1483]: adjust time server 10.20.0.2 offset -0.000554 sec\n", :exit_code=>0, :stderr=>""}}
2015-03-03T09:33:37 debug: [421] daf5191e-ccb2-4033-9216-dfe21de8b8ee: cmd: cd / && ntpdate -u $(egrep '^server' /etc/ntp.conf | egrep -v '127\.127\.[0-9]+\.[0-9]+' | sed '/^#/d' | awk '{print $2}')
cwd: /
stdout: 3 Mar 09:33:37 ntpdate[1484]: adjust time server 10.20.0.2 offset -0.000332 sec

stderr:
exit code: 0

This is true for all the time syncs in the Astute log.

And here is what really broke the deploy:

2015-03-03T10:22:32 info: [421] Run hook ---
priority: 300
type: upload_file
uids:
- '2'
- '3'
- '6'
parameters:
  path: "/etc/hiera/nodes.yaml"
  data: |
    nodes:
    - {fqdn: node-2.domain.tld, internal_address: 192.168.0.2, internal_netmask: 255.255.255.0,
      name: node-2, public_address: 172.16.0.3, public_netmask: 255.255.255.0, role: primary-controller,
      storage_address: 192.168.1.1, storage_netmask: 255.255.255.0, swift_zone: '2', uid: '2',
      user_node_name: 'Untitled (20:ce)'}
    - {fqdn: node-3.domain.tld, internal_address: 192.168.0.3, internal_netmask: 255.255.255.0,
      name: node-3, public_address: 172.16.0.4, public_netmask: 255.255.255.0, role: controller,
      storage_address: 192.168.1.2, storage_netmask: 255.255.255.0, swift_zone: '3', uid: '3',
      user_node_name: 'Untitled (e6:00)'}
    - {fqdn: node-6.domain.tld, internal_address: 192.168.0.4, internal_netmask: 255.255.255.0,
      name: node-6, public_address: 172.16.0.5, public_netmask: 255.255.255.0, role: controller,
      storage_address: 192.168.1.3, storage_netmask: 255.255.255.0, swift_zone: '6', uid: '6',
      user_node_name: 'Untitled (80:28)'}

2015-03-03T10:23:34 debug: [421] daf5191e-ccb2-4033-9216-dfe21de8b8ee: MC agent 'uploadfile', method 'upload', results: {:sender=>"6", :statuscode=>0, :statusmsg=>"OK", :data=>{:msg=>"File was uploaded!"}}
2015-03-03T10:23:34 debug: [421] daf5191e-ccb2-4033-9216-dfe21de8b8ee: MC agent 'uploadfile', method...


Changed in fuel:
status: Confirmed → Invalid
Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

It seems that we still have some problems with virtual environments related to udlc. Reopening.

Changed in fuel:
status: Invalid → Triaged
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/161189

Changed in fuel:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/161189
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=579c02a9f9717f96cf91b041a5e38144583174ee
Submitter: Jenkins
Branch: master

commit 579c02a9f9717f96cf91b041a5e38144583174ee
Author: Stanislaw Bogatkin <email address hidden>
Date: Wed Mar 4 15:23:25 2015 +0300

    Remove udlc statement from clocksync manifest

    Seems that udlc breaks virtual environments from time to time.

    Change-Id: Ia804acff717ba7c9d414235679d131028f405bef
    Closes-Bug: #1421965
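
In ntp.conf terms, dropping udlc means removing the local-clock fallback lines sketched earlier in this thread, so a virtual master either syncs with real upstream servers or serves no time at all (again an illustration, not the exact manifest diff):

# keep the real upstream servers with iburst
server 0.pool.ntp.org iburst
# removed: the undisciplined local clock fallback
# server 127.127.1.0
# fudge  127.127.1.0 stratum 10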

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Yaroslav Lobankov (ylobankov) wrote :

The issue does not reproduce anymore. Verified on:

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  api: "1.0"
  build_number: "200"
  build_id: "2015-03-15_22-54-44"
  nailgun_sha: "713e6684f9f54e29acfe6b8ebf641b9de2292628"
  python-fuelclient_sha: "cc64fff91fb0d78e5a02e7b93ceff224296f84fb"
  astute_sha: "93e427ac49109fa3fd8b0e1d0bb3d14092be2e8c"
  fuellib_sha: "553cb0cffa40a5f57313f962b6ec6a9bd89306ba"
  ostf_sha: "e86c961ceacfa5a8398b6cbda7b70a5f06afb476"
  fuelmain_sha: "c97fd8a789645bda48d06da224f994f8b52d82f5"

Changed in fuel:
status: Fix Committed → Fix Released
Revision history for this message
Kyrylo Romanenko (kromanenko) wrote :

I reproduced this bug on a new 6.1 build.

Deployment has failed. Method granular_deploy. Failed to execute hook .
---
priority: 400
fail_on_error: true
type: shell
uids:
- '1'
- '3'
- '2'
parameters:
  retries: 10
  cmd: ntpdate -u $(egrep '^server' /etc/ntp.conf | egrep -v '127\.127\.[0-9]+\.[0-9]+'
    | sed '/^#/d' | awk '{print $2}')
  timeout: 180
  interval: 1

Build info:
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  api: "1.0"
  build_number: "202"
  build_id: "2015-03-16_22-54-44"
  nailgun_sha: "874df0d06e32f14db77746cfeb2dd74d4a6e528c"
  python-fuelclient_sha: "2509c9b72cdcdbe46c141685a99b03cd934803be"
  astute_sha: "93e427ac49109fa3fd8b0e1d0bb3d14092be2e8c"
  fuellib_sha: "924d73ae4766646e1c3a44d7b59c4120985e45f0"
  ostf_sha: "e86c961ceacfa5a8398b6cbda7b70a5f06afb476"
  fuelmain_sha: "608b72a6f79a719cf01c35a19d0091fe20c8288a"

Revision history for this message
Kyrylo Romanenko (kromanenko) wrote :

I have also reproduced this bug on build 200 of MOS 6.1.

Revision history for this message
Kyrylo Romanenko (kromanenko) wrote :

Additional info:

[root@fuel ~]# ntpq -p
     remote           refid           st t  when poll reach   delay   offset  jitter
==============================================================================
+main24.anyplace  105.240.56.33     2 u    65   64   377   17.052   -7.649  21.562
+resolver1.campu  118.188.39.164    2 u    23   64   377   12.785   15.289  20.603
*btr.skif.com.ua  62.149.0.30       2 u    32   64   377    8.720   -1.095  15.148

[root@fuel ~]# ntpdate -vu $(egrep '^server' /etc/ntp.conf | egrep -v '127\.127\.[0-9]+\.[0-9]+' | sed '/^#/d' | awk '{print $2}')
17 Mar 12:23:40 ntpdate[900]: ntpdate 4.2.6p5@1.2349-o Sat Nov 23 18:21:48 UTC 2013 (1)
17 Mar 12:23:41 ntpdate[900]: 108.61.73.243 rate limit response from server.
17 Mar 12:23:43 ntpdate[900]: adjust time server 199.102.46.74 offset 0.016440 sec

Steps:

Deployed with the following minimal configuration:
1 controller
1 compute
1 ceph osd (replication factor = 1)

CentOS

Neutron with GRE segmentation, all network settings by default.
No additional components installed.
QEMU
Ceph RDB for volumes and images

# cat /etc/fuel/version.yaml
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  api: "1.0"
  build_number: "200"
  build_id: "2015-03-15_22-54-44"
  nailgun_sha: "713e6684f9f54e29acfe6b8ebf641b9de2292628"
  python-fuelclient_sha: "cc64fff91fb0d78e5a02e7b93ceff224296f84fb"
  astute_sha: "93e427ac49109fa3fd8b0e1d0bb3d14092be2e8c"
  fuellib_sha: "553cb0cffa40a5f57313f962b6ec6a9bd89306ba"
  ostf_sha: "e86c961ceacfa5a8398b6cbda7b70a5f06afb476"
  fuelmain_sha: "c97fd8a789645bda48d06da224f994f8b52d82f5"

Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Hi, Kyrylo. I'll try to reproduce this again and will reopen the bug if I succeed.

Revision history for this message
Kyrylo Romanenko (kromanenko) wrote :