Deployment failed due to inability to sync time in pre-hook task

Bug #1421965 reported by Yaroslav Lobankov
This bug affects 6 people
Affects                    Status         Importance   Assigned to          Milestone
Fuel for OpenStack         Fix Released   High         Stanislaw Bogatkin
6.0.x                      Invalid        High         Stanislaw Bogatkin

Bug Description

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  api: "1.0"
  build_number: "123"
  build_id: "2015-02-14_09-55-08"
  nailgun_sha: "1e3a40dd8a17abe1d38f42da1e0dc1a6d4572666"
  python-fuelclient_sha: "61431ed16fc00039a269424bdbaa410277eff609"
  astute_sha: "1f87a9b9a47de7498b4061d15a8c7fb9435709d5"
  fuellib_sha: "7f8d4382abfcd4338964182ebfea1d539f963e66"
  ostf_sha: "f9c37d0876141e1550eb4e703a8e500cd463282f"
  fuelmain_sha: "2054229e275d08898b5d079a6625ffcc79ae23b8"

ENVIRONMENT:
HA mode, CentOS, Neutron with GRE segmentation, Cinder LVM, Sahara and Ceilometer are enabled, 3 (controller + mongo), 1 (compute + cinder)

Deployment failed with the following error:
Deployment has failed. Method granular_deploy. undefined method `[]' for nil:NilClass.
Inspect Astute logs for the details

In the Astute logs I see these errors:
2015-02-14 14:58:34 ERR
[403] Error running RPC method granular_deploy: undefined method `[]' for nil:NilClass, trace:
["/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:159:in `calculate_multiroles_node_progress'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:96:in `node_validate'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:78:in `block in get_nodes_to_report'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:78:in `map'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:78:in `get_nodes_to_report'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:51:in `report_new_data'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:44:in `report'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/context.rb:35:in `report_and_update_status'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:49:in `block in process'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:26:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:26:in `process'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/deployment_engine/granular_deployment.rb:201:in `pre_deployment_actions'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/deployment_engine.rb:32:in `deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/orchestrator.rb:312:in `deploy_cluster'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/orchestrator.rb:63:in `granular_deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/dispatcher.rb:99:in `granular_deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:142:in `dispatch_message'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:103:in `block in dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:64:in `call'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:64:in `block in each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:101:in `each_with_index'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:101:in `dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:85:in `block in perform_main_job'"]
2015-02-14 14:58:34 ERR
[403] Unexpected error undefined method `[]' for nil:NilClass traceback
["/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:159:in `calculate_multiroles_node_progress'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:96:in `node_validate'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:78:in `block in get_nodes_to_report'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:78:in `map'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:78:in `get_nodes_to_report'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:51:in `report_new_data'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/reporter.rb:44:in `report'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/context.rb:35:in `report_and_update_status'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:49:in `block in process'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:26:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:26:in `process'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/deployment_engine/granular_deployment.rb:201:in `pre_deployment_actions'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/deployment_engine.rb:32:in `deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/orchestrator.rb:312:in `deploy_cluster'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/orchestrator.rb:63:in `granular_deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/dispatcher.rb:99:in `granular_deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:142:in `dispatch_message'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:103:in `block in dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:64:in `call'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:64:in `block in each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:101:in `each_with_index'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:101:in `dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:85:in `block in perform_main_job'"]

Diagnostic snapshot is attached.

Revision history for this message
Yaroslav Lobankov (ylobankov) wrote :
summary: - Deployment fails with error "Method granular_deploy. undefined method
+ Deployment failed with error "Method granular_deploy. undefined method
`[]' for nil:NilClass."
description: updated
Revision history for this message
Oleksiy Molchanov (omolchanov) wrote : Re: Deployment failed with error "Method granular_deploy. undefined method `[]' for nil:NilClass."

Seems to be the same problem I had: there is no default route, so ntpd failed to sync time with the external server.
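
A quick hedged check for this condition on the master node (assuming a CentOS master; replace the pool hostname with whichever NTP server is configured in /etc/ntp.conf):

# verify that a default route exists at all
ip route show default
# query an external NTP server without setting the clock (-q = query only)
ntpdate -q 0.pool.ntp.org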

Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
importance: Undecided → Critical
status: New → Triaged
milestone: none → 6.1
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

This bug has no patch, solution, or workaround described, so I am setting its status to Confirmed.

Changed in fuel:
status: Triaged → Confirmed
Revision history for this message
Oleksiy Butenko (obutenko) wrote :

I reproduced this bug on:
CentOS, Neutron with GRE segmentation, 3 controllers + 1 compute + 1 cinder.
Snapshot attached.
However, when I deployed the configuration CentOS, Neutron with GRE segmentation, 1 controller + 1 compute + 1 cinder, the installation finished correctly.
With Ubuntu, every environment installs correctly.
Perhaps the bug is associated with HA mode and CentOS.

api: '1.0'
astute_sha: 1f87a9b9a47de7498b4061d15a8c7fb9435709d5
auth_required: true
build_id: 2015-02-15_22-54-44
build_number: '126'
feature_groups:
- mirantis
fuellib_sha: 7f8d4382abfcd4338964182ebfea1d539f963e66
fuelmain_sha: 2054229e275d08898b5d079a6625ffcc79ae23b8
nailgun_sha: 1e3a40dd8a17abe1d38f42da1e0dc1a6d4572666
ostf_sha: f9c37d0876141e1550eb4e703a8e500cd463282f
production: docker
python-fuelclient_sha: 61431ed16fc00039a269424bdbaa410277eff609
release: '6.1'
release_versions:
  2014.2-6.1:
    VERSION:
      api: '1.0'
      astute_sha: 1f87a9b9a47de7498b4061d15a8c7fb9435709d5
      build_id: 2015-02-15_22-54-44
      build_number: '126'
      feature_groups:
      - mirantis
      fuellib_sha: 7f8d4382abfcd4338964182ebfea1d539f963e66
      fuelmain_sha: 2054229e275d08898b5d079a6625ffcc79ae23b8
      nailgun_sha: 1e3a40dd8a17abe1d38f42da1e0dc1a6d4572666
      ostf_sha: f9c37d0876141e1550eb4e703a8e500cd463282f
      production: docker
      python-fuelclient_sha: 61431ed16fc00039a269424bdbaa410277eff609
      release: '6.1'

Revision history for this message
Oleksiy Butenko (obutenko) wrote :

Update: I used Neutron with VLAN, not with GRE.

Revision history for this message
Oleksiy Butenko (obutenko) wrote :

I reproduced this bug with Ubuntu.
ENVIRONMENT:
HA mode, Ubuntu, Nova network, Cinder LVM, Ceilometer enabled, 3 (controller + mongo), 2 (compute + cinder)

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Stanislaw Bogatkin (sbogatkin)
Changed in fuel:
status: Confirmed → In Progress
Igor Belikov (ibelikov)
Changed in fuel:
status: In Progress → Confirmed
Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :
Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
Kyrylo Romanenko (kromanenko) wrote :

Also reproduced this on:
{"build_id": "2015-02-12_10-39-11", "ostf_sha": "f9c37d0876141e1550eb4e703a8e500cd463282f", "build_number": "115", "release_versions": {"2014.2-6.1": {"VERSION": {"build_id": "2015-02-12_10-39-11", "ostf_sha": "f9c37d0876141e1550eb4e703a8e500cd463282f", "build_number": "115", "api": "1.0", "nailgun_sha": "78e1fa50e38efc1001ddfe0565a55e9e176ff5f6", "production": "docker", "python-fuelclient_sha": "61431ed16fc00039a269424bdbaa410277eff609", "astute_sha": "2159855ba7b82956ac0787a4e7be053105c4c1f1", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "892c80aa9adc9f53e9b3061d4754203953a84db7", "fuellib_sha": "592df3ba1dbfba6c6c84f90ff36b0c4c697934d3"}}}, "auth_required": true, "api": "1.0", "nailgun_sha": "78e1fa50e38efc1001ddfe0565a55e9e176ff5f6", "production": "docker", "python-fuelclient_sha": "61431ed16fc00039a269424bdbaa410277eff609", "astute_sha": "2159855ba7b82956ac0787a4e7be053105c4c1f1", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "892c80aa9adc9f53e9b3061d4754203953a84db7", "fuellib_sha": "592df3ba1dbfba6c6c84f90ff36b0c4c697934d3"}

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/156251
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=bd74e318fe83d3e522630f77ae3a65a4b5777ba6
Submitter: Jenkins
Branch: master

commit bd74e318fe83d3e522630f77ae3a65a4b5777ba6
Author: Stanislaw Bogatkin <email address hidden>
Date: Mon Feb 16 18:22:58 2015 +0300

    Enable iburst mode and udlc in ntp module

    Change-Id: I8b28b5d1c7a0a80ea5fa25648f40673011675864
    Closes-Bug: #1298360
    Closes-Bug: #1421289
    Closes-Bug: #1421965
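
For context, iburst and udlc (undisciplined local clock) roughly correspond to ntp.conf directives like the following; this is an illustrative sketch, not the exact content of the fuel-library change:

# iburst: send an initial burst of packets so the first synchronization happens faster
server 0.pool.ntp.org iburst
# udlc: fall back to the undisciplined local clock when no remote server is usable
server 127.127.1.0
fudge  127.127.1.0 stratum 10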

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Oleksiy Butenko (obutenko) wrote : Re: Deployment failed with error "Method granular_deploy. undefined method `[]' for nil:NilClass."

Also reproduced this on MOS 6.1 build 114.

However, it does not reproduce on MOS 6.1 build 105 or on 6.0.1 build 92:
{"build_id": "2015-02-07_22-55-01", "ostf_sha": "6c046b69d29021524906109f18092363505ee222", "build_number": "105", "release_versions": {"2014.2-6.1": {"VERSION": {"build_id": "2015-02-07_22-55-01", "ostf_sha": "6c046b69d29021524906109f18092363505ee222", "build_number": "105", "api": "1.0", "nailgun_sha": "6d1769b21819f8fb4195f1bd9c44c038721ae3d4", "production": "docker", "python-fuelclient_sha": "521c2491f7f04f31d8c85db68499cd193d4904e3", "astute_sha": "7e6e6f9188bd69c603853b10d4a55149363323cc", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "", "fuellib_sha": "769af7fe30225cd15638ea2e6dffaa286bc06da1"}}}, "auth_required": true, "api": "1.0", "nailgun_sha": "6d1769b21819f8fb4195f1bd9c44c038721ae3d4", "production": "docker", "python-fuelclient_sha": "521c2491f7f04f31d8c85db68499cd193d4904e3", "astute_sha": "7e6e6f9188bd69c603853b10d4a55149363323cc", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "", "fuellib_sha": "769af7fe30225cd15638ea2e6dffaa286bc06da1"}

{"build_id": "2015-02-15_20-49-44", "ostf_sha": "3b57985d4d2155510894a1f6d03b478b201f7780", "build_number": "92", "auth_required": true, "api": "1.0", "nailgun_sha": "6967b24adc4d74e36d69b59973ff79d6ab2389e5", "production": "docker", "fuelmain_sha": "c799e3a6d88289e58db764a6be7910aab7da3149", "astute_sha": "f7cda2171b0b677dfaeb59693d980a2d3ee4c3e0", "feature_groups": ["mirantis"], "release": "6.0.1", "release_versions": {"2014.2-6.0.1": {"VERSION": {"build_id": "2015-02-15_20-49-44", "ostf_sha": "3b57985d4d2155510894a1f6d03b478b201f7780", "build_number": "92", "api": "1.0", "nailgun_sha": "6967b24adc4d74e36d69b59973ff79d6ab2389e5", "production": "docker", "fuelmain_sha": "c799e3a6d88289e58db764a6be7910aab7da3149", "astute_sha": "f7cda2171b0b677dfaeb59693d980a2d3ee4c3e0", "feature_groups": ["mirantis"], "release": "6.0.1", "fuellib_sha": "c05541e54f2a127f9c38f7ddb6abf30584fc55e4"}}}, "fuellib_sha": "c05541e54f2a127f9c38f7ddb6abf30584fc55e4"}

Revision history for this message
Vasyl Saienko (vsaienko) wrote :

I reproduced this bug with ubuntu
ENVIRONMENT:
HA mode, UBUNTU, Neutron, Cinder LVM
3 Controllers
16 Compute + cinder

Revision history for this message
Vasyl Saienko (vsaienko) wrote :

api: '1.0'
astute_sha: 8af8e88c3cb17b66368e7a038f1899e5c7c13e98
auth_required: true
build_id: 2015-02-17_22-54-44
build_number: '129'
feature_groups:
- mirantis
fuellib_sha: 5956f0be54600ffad96ef50eb2ba91307d3664d9
fuelmain_sha: 1c249116ca4285be28ce78be27cb5a70f0bf2fb8
nailgun_sha: 9b5b8faac0cac75507adb0958c96ade9285525cb
ostf_sha: 66f58a43c30b98c5a4f7cf040712ca7a588ea761
production: docker
python-fuelclient_sha: c4f8f7cf81c7af55681f8b85c03e179975075e73
release: '6.1'
release_versions:
  2014.2-6.1:
    VERSION:
      api: '1.0'
      astute_sha: 8af8e88c3cb17b66368e7a038f1899e5c7c13e98
      build_id: 2015-02-17_22-54-44
      build_number: '129'
      feature_groups:
      - mirantis
      fuellib_sha: 5956f0be54600ffad96ef50eb2ba91307d3664d9
      fuelmain_sha: 1c249116ca4285be28ce78be27cb5a70f0bf2fb8
      nailgun_sha: 9b5b8faac0cac75507adb0958c96ade9285525cb
      ostf_sha: 66f58a43c30b98c5a4f7cf040712ca7a588ea761
      production: docker
      python-fuelclient_sha: c4f8f7cf81c7af55681f8b85c03e179975075e73
      release: '6.1'

Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

We don't have this bug in releases below 6.1, because the root cause lies in the new ntp package and the reconfiguration that follows from it. 6.0 does not have the settings that 6.1 has. Closed.

Revision history for this message
Kyrylo Romanenko (kromanenko) wrote :

Just caught this bug on the following build:

{"build_id": "2015-02-23_22-54-44", "ostf_sha": "1a0b2c6618fac098473c2ed5a9af11d3a886a3bb", "build_number": "140", "release_versions": {"2014.2-6.1": {"VERSION": {"build_id": "2015-02-23_22-54-44", "ostf_sha": "1a0b2c6618fac098473c2ed5a9af11d3a886a3bb", "build_number": "140", "api": "1.0", "nailgun_sha": "3616ae9df4ac3e088157bb94f73743a521f76f1a", "production": "docker", "python-fuelclient_sha": "5657dbf06fddb74adb61e9668eb579a1c57d8af8", "astute_sha": "d81ff53c2f467151ecde120d3a4d284e3b5b3dfc", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "b975019fabdb429c1869047df18dd792d2163ecc", "fuellib_sha": "8b79d47ef41bff293210d2a7b1bb02843f70948d"}}}, "auth_required": true, "api": "1.0", "nailgun_sha": "3616ae9df4ac3e088157bb94f73743a521f76f1a", "production": "docker", "python-fuelclient_sha": "5657dbf06fddb74adb61e9668eb579a1c57d8af8", "astute_sha": "d81ff53c2f467151ecde120d3a4d284e3b5b3dfc", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "b975019fabdb429c1869047df18dd792d2163ecc", "fuellib_sha": "8b79d47ef41bff293210d2a7b1bb02843f70948d"}

Nodes:
1) Controller + CephOSD
2) Compute +CephOSD
3) CephOSD

Juno on Ubuntu 12.04.4 (2014.2-6.1)
Multi-node with HA
Neutron with VLAN segmentation
QEMU

Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Kyrylo, can you attach a diagnostic snapshot, please? The root cause of the time sync failure was a bad ntpdate run; it can be seen in the Astute log on the master node. But I cannot reopen the bug until I see such a log from a newer ISO.
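
A hedged way to look for this in a snapshot or on the master node (the Astute log path below is typical for dockerized 6.x masters and may differ between releases):

# find the time-sync pre-deployment hook and its result in the Astute log
grep -i -B 2 -A 2 'ntpdate' /var/log/docker-logs/astute/astute.log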

Revision history for this message
Kyrylo Romanenko (kromanenko) wrote :
Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Reopen.

Changed in fuel:
status: Fix Committed → Confirmed
Revision history for this message
Victor Ryzhenkin (vryzhenkin) wrote :

Reproduced one more time on MOS 6.1 build #140, CentOS 6.5, with all Ceph, Murano, and Neutron GRE.

Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

So, a deeper investigation shows why this happens.
To reproduce it, you should do the following:

1. Deploy the master node with system tests (this provisions the master, deploys it, and starts ntpd on it). At the end of this step the virtual machine is stopped.
2. Wait a while (in my case about 20 minutes, maybe a little less).
3. Resume the master VM.

Now we have a situation where the remote servers give ntpd a new time (about 20 minutes ahead of what it has). The NTP protocol was not designed to handle such a jump (http://www.ntp.org/ntpfaq/NTP-s-algo.htm#Q-ALGO-BASIC-STEP-SLEW): it reports that jitter and offset are huge. For that case we have a local clock source, so the master starts serving local time to the bootstrap nodes.

4. Start the bootstrap nodes.
They get either the local time or nothing at all; this is not as critical as it may seem.

5. Assign roles and start the deployment. Provisioning and deployment begin.
By this time the master node has already realized that the remote servers are giving it the correct time, and the jitter subsides. Now we have good remote servers but a huge offset. The master switches from the local clock to the remote servers (but it no longer serves time to the other nodes because of the huge offset between the remote servers and the local time). How long do we have to wait for this situation to occur? In my case it was about 10 minutes.

6. The provisioning stage ends and pre-deployment starts. It contains a task to sync time between the nodes and the master. As you can guess, it fails because of the huge offset on the master node.

So this is not actually a bug; it is just how the NTP protocol works. We know about it and restart ntpd when reverting snapshots in system tests, which is why we don't see this error on CI. But we cannot restart ntpd when someone clicks 'unpause' in virt-manager or runs 'virsh resume <id>' in the console.
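
A minimal sketch for checking whether the master node is stuck in this state before a deployment starts (assuming the standard ntp tooling on the master; not part of any official procedure):

# show peers: a '*' marks the selected source; look for large offset/jitter values
ntpq -pn
# reports "unsynchronised" while ntpd refuses to serve time to clients
ntpstat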

I see the following options for handling this situation:
1. Do not forget to manually restart ntpdate and ntpd on the master node after running the 'setup' group in tests (maybe other groups too; I only use 'setup').
2. Stop pausing the master VM at the end of the deploy: either leave it running as it is, or shut it down after the test is done.

Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Since this bug happens only in custom deploys, only in VMs, only when you manually resume the VMs, and only in some cases even then, I am lowering the importance to High.

Changed in fuel:
importance: Critical → High
Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

After speaking with the QA team, I can say the following:

1. If you do a custom deploy, please resume your VMs with dos.py. Its 'resume' option already restarts ntpd automatically.
2. If you can't or don't want to resume with dos.py, please run the following steps after the master node resumes (and before the bootstrap nodes start); a verification sketch follows this list:
 -- stop ntpd (do /etc/init.d/ntpd stop)
 -- restart ntpdate (do /etc/init.d/ntpdate restart)
 -- start ntpd (do /etc/init.d/ntpd start)
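
To verify that the master is serving time again after these steps (a hedged check, not part of the procedure above):

# a remote peer should again be the selected source (marked with '*') and the offset should be small
ntpq -pn
# from a slave node, a one-shot query of the master should now succeed
# (10.20.0.2 is the default Fuel admin-network address; replace it with your master's IP)
ntpdate -q 10.20.0.2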

Since this is not a bug but just normal ntpd behaviour, I am closing it.

Changed in fuel:
status: Confirmed → Won't Fix
status: Won't Fix → Invalid
Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

This issue reproduced on my environment with VirtualBox: 1 controller, 1 compute, Ubuntu, Neutron VLAN.
I can see that Ubuntu was installed successfully and that the deployment failed after that.
We tried to redeploy the environment via the CLI, but it failed 3 times, and it was fixed only after additional manual manipulation (Sergey Kolekonov fixed this issue on my environment somehow).

So it looks like this issue can be reproduced on customer environments, and it will be a real pain if deployments fail because the ntp daemon doesn't work correctly. We need to fix this behaviour to avoid such failures on production/QA/demo environments.

I can help reproduce the issue if required.

Changed in fuel:
status: Invalid → Confirmed
Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Did you actually read what I wrote before? It is NOT a bug; it is just how NTP works. If you want a successful deploy, restart ntpd manually or resume the machines with dos.py. There is nothing more to do than restart ntpd on the master node. If anyone knows another method, I will be glad to hear it.

Changed in fuel:
status: Confirmed → Invalid
Revision history for this message
Chang-Yi Lee (cy-lee) wrote :

We encountered the same problem, and we are deploying to bare-metal machines:

Dell R620 X 3 (controller)
Dell R720xd X 3 (compute)

{"ostf_sha": "1a0b2c6618fac098473c2ed5a9af11d3a886a3bb", "release_versions": {"2014.2-6.1": {"VERSION": {"ostf_sha": "1a0b2c6618fac098473c2ed5a9af11d3a886a3bb", "api": "1.0", "nailgun_sha": "54b0302665b7c02009b89f693170c26437276e76", "production": "docker", "python-fuelclient_sha": "5657dbf06fddb74adb61e9668eb579a1c57d8af8", "astute_sha": "6d6ad68e0cde286d74ac7d52e21da4fc8dcbe9ab", "feature_groups": ["experimental"], "release": "6.1", "fuelmain_sha": "87223c15bb734bb668050850a6b43aa4291349b7", "fuellib_sha": "8384b8ca4db84794fb21e287202f05e31f78841c"}}}, "auth_required": true, "api": "1.0", "nailgun_sha": "54b0302665b7c02009b89f693170c26437276e76", "production": "docker", "python-fuelclient_sha": "5657dbf06fddb74adb61e9668eb579a1c57d8af8", "astute_sha": "6d6ad68e0cde286d74ac7d52e21da4fc8dcbe9ab", "feature_groups": ["experimental"], "release": "6.1", "fuelmain_sha": "87223c15bb734bb668050850a6b43aa4291349b7", "fuellib_sha": "8384b8ca4db84794fb21e287202f05e31f78841c"}

OS:
Ubuntu 12.04

Reference architecture:
HA

Network model:
Neutron+VLAN

Related Projects installed:
Ceilometer

Nodes:

Controller + mongodb * 3
Compute + Ceph-OSD * 3

Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Hi, Chang-Yi Lee.
Actually, your problem isn't related to this bug. You have the following lines in astute.yaml:

/etc/puppet/modules/osnailyfacter/modular/astute/upload_cirros.rb:65:in `wait_for_glance': Could not get a list of glance images! (RuntimeError)
 from /etc/puppet/modules/osnailyfacter/modular/astute/upload_cirros.rb:89

So your deployment broke due to another problem, not related to ntpd. I would appreciate it if you filed a new bug for it.

Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Sorry for the mistake. Those lines are in astute.log, not in a YAML file.

summary: - Deployment failed with error "Method granular_deploy. undefined method
- `[]' for nil:NilClass."
+ Deployment failed due to inability to sync time in pre-hook task
Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

Stanislaw, could we add a restart of the ntpd daemon before each deployment?

Revision history for this message
Ivan Kolodyazhny (e0ne) wrote :

The issue still reproduces on ISO #128.
Could not find data item controller_internal_addresses in any Hiera data file and no default supplied at /etc/puppet/modules/osnailyfacter/modular/cluster/cluster.pp:5 on node node-1.test.domain.local

Changed in fuel:
status: Invalid → Confirmed
Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Ivan, can you tell me how you reproduced this bug and what you did? Please also attach a diagnostic snapshot.
A reminder: the bug will be closed if you did not restart ntpd on the master node after a manual resume before the deployment.

Changed in fuel:
status: Confirmed → Incomplete
Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Timur, we could, but I don't think it would be right. You shouldn't suspend your VM and expect it to work fine after a resume. You should just use dos.py for the revert; that is the right way.

Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

Stanislaw, I don't use the fuel-devops scripts for OpenStack deployments. This is not a Fuel master node restored from a snapshot; it is a fresh installation, so it is not clear why a fresh installation doesn't work out of the box.

So, the steps to reproduce are the following:
1. Take a new MOS 6.1 ISO.
2. Start deploying an environment with VirtualBox or KVM.
3. Configure the OpenStack cloud and start deploying it.

After that we see this error. It looks like a 'bad feature': everything worked fine previously, and only after the ntpd refactoring did it start to fail... strange!

Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Timur, I cannot reproduce this bug if I do not pause the master node VM, and no one else in this thread could either. Am I right that you just installed a new master node from scratch, never paused or suspended it, then tried to deploy an environment and that environment failed? If so, please contact me so I can look at your env, because, as I said, I cannot reproduce the bug under those conditions.

Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

@Stanislaw, I've looked at Ivan's env; he has another issue not related to this report. @Ivan, could you confirm that?

Revision history for this message
Ivan Kolodyazhny (e0ne) wrote :

@Tatyanka, @Stanislaw, thanks for the help. My issue was with multinode mode, not with NTP. HA works for me.

Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

Reproduced on my environment with 3 controllers; the snapshot is attached.

Steps To Reproduce:
1. Deploy an env with Ubuntu, 1 controller and 1 compute. The deployment finishes successfully.
2. Create a cluster with CentOS, 3 controllers and 0 computes, and start the deployment.

The deployment fails.

Snapshot:
https://copy.com/ny9HOqXfexxxQpKH

Changed in fuel:
status: Incomplete → Confirmed
Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Hello again, Timur.
Here is what I see in the astute.log file from your last snapshot:

2015-03-03T09:33:37 info: [421] Run hook ---
priority: 100
fail_on_error: true
type: shell
uids:
- '3'
- '2'
- '6'
parameters:
  retries: 10
  cmd: ntpdate -u $(egrep '^server' /etc/ntp.conf | egrep -v '127\.127\.[0-9]+\.[0-9]+'
    | sed '/^#/d' | awk '{print $2}')
  timeout: 180
  interval: 1

2015-03-03T09:33:37 debug: [421] daf5191e-ccb2-4033-9216-dfe21de8b8ee: MC agent 'execute_shell_command', method 'execute', results: {:sender=>"2", :statuscode=>0, :statusmsg=>"OK", :data=>{:stdout=>" 3 Mar 09:33:37 ntpdate[1484]: adjust time server 10.20.0.2 offset -0.000332 sec\n", :exit_code=>0, :stderr=>""}}
2015-03-03T09:33:37 debug: [421] daf5191e-ccb2-4033-9216-dfe21de8b8ee: MC agent 'execute_shell_command', method 'execute', results: {:sender=>"3", :statuscode=>0, :statusmsg=>"OK", :data=>{:stdout=>" 3 Mar 09:33:37 ntpdate[1483]: adjust time server 10.20.0.2 offset -0.000564 sec\n", :exit_code=>0, :stderr=>""}}
2015-03-03T09:33:37 debug: [421] daf5191e-ccb2-4033-9216-dfe21de8b8ee: MC agent 'execute_shell_command', method 'execute', results: {:sender=>"6", :statuscode=>0, :statusmsg=>"OK", :data=>{:stdout=>" 3 Mar 09:33:37 ntpdate[1483]: adjust time server 10.20.0.2 offset -0.000554 sec\n", :exit_code=>0, :stderr=>""}}
2015-03-03T09:33:37 debug: [421] daf5191e-ccb2-4033-9216-dfe21de8b8ee: cmd: cd / && ntpdate -u $(egrep '^server' /etc/ntp.conf | egrep -v '127\.127\.[0-9]+\.[0-9]+' | sed '/^#/d' | awk '{print $2}')
cwd: /
stdout: 3 Mar 09:33:37 ntpdate[1484]: adjust time server 10.20.0.2 offset -0.000332 sec

stderr:
exit code: 0

This is true for all the time syncs in the Astute log.

And here is what really broke the deploy:

2015-03-03T10:22:32 info: [421] Run hook ---
priority: 300
type: upload_file
uids:
- '2'
- '3'
- '6'
parameters:
  path: "/etc/hiera/nodes.yaml"
  data: |
    nodes:
    - {fqdn: node-2.domain.tld, internal_address: 192.168.0.2, internal_netmask: 255.255.255.0,
      name: node-2, public_address: 172.16.0.3, public_netmask: 255.255.255.0, role: primary-controller,
      storage_address: 192.168.1.1, storage_netmask: 255.255.255.0, swift_zone: '2', uid: '2',
      user_node_name: 'Untitled (20:ce)'}
    - {fqdn: node-3.domain.tld, internal_address: 192.168.0.3, internal_netmask: 255.255.255.0,
      name: node-3, public_address: 172.16.0.4, public_netmask: 255.255.255.0, role: controller,
      storage_address: 192.168.1.2, storage_netmask: 255.255.255.0, swift_zone: '3', uid: '3',
      user_node_name: 'Untitled (e6:00)'}
    - {fqdn: node-6.domain.tld, internal_address: 192.168.0.4, internal_netmask: 255.255.255.0,
      name: node-6, public_address: 172.16.0.5, public_netmask: 255.255.255.0, role: controller,
      storage_address: 192.168.1.3, storage_netmask: 255.255.255.0, swift_zone: '6', uid: '6',
      user_node_name: 'Untitled (80:28)'}

2015-03-03T10:23:34 debug: [421] daf5191e-ccb2-4033-9216-dfe21de8b8ee: MC agent 'uploadfile', method 'upload', results: {:sender=>"6", :statuscode=>0, :statusmsg=>"OK", :data=>{:msg=>"File was uploaded!"}}
2015-03-03T10:23:34 debug: [421] daf5191e-ccb2-4033-9216-dfe21de8b8ee: MC agent 'uploadfile', method...


Changed in fuel:
status: Confirmed → Invalid
Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

It seems that we still have some problems with virtual environments related to udlc. Reopening.

Changed in fuel:
status: Invalid → Triaged
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/161189

Changed in fuel:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/161189
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=579c02a9f9717f96cf91b041a5e38144583174ee
Submitter: Jenkins
Branch: master

commit 579c02a9f9717f96cf91b041a5e38144583174ee
Author: Stanislaw Bogatkin <email address hidden>
Date: Wed Mar 4 15:23:25 2015 +0300

    Remove udlc statement from clocksync manifest

    Seems that udlc breaks virtual environments from time to time.

    Change-Id: Ia804acff717ba7c9d414235679d131028f405bef
    Closes-Bug: #1421965
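
In ntp.conf terms, dropping udlc means removing the local-clock fallback lines sketched earlier in this thread, so a virtual master either syncs with real upstream servers or serves no time at all (again an illustration, not the exact manifest diff):

# keep the real upstream servers with iburst
server 0.pool.ntp.org iburst
# removed: the undisciplined local clock fallback
# server 127.127.1.0
# fudge  127.127.1.0 stratum 10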

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Yaroslav Lobankov (ylobankov) wrote :

The issue does not reproduce anymore. Verified on:

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  api: "1.0"
  build_number: "200"
  build_id: "2015-03-15_22-54-44"
  nailgun_sha: "713e6684f9f54e29acfe6b8ebf641b9de2292628"
  python-fuelclient_sha: "cc64fff91fb0d78e5a02e7b93ceff224296f84fb"
  astute_sha: "93e427ac49109fa3fd8b0e1d0bb3d14092be2e8c"
  fuellib_sha: "553cb0cffa40a5f57313f962b6ec6a9bd89306ba"
  ostf_sha: "e86c961ceacfa5a8398b6cbda7b70a5f06afb476"
  fuelmain_sha: "c97fd8a789645bda48d06da224f994f8b52d82f5"

Changed in fuel:
status: Fix Committed → Fix Released
Revision history for this message
Kyrylo Romanenko (kromanenko) wrote :

I reproduced this bug on a new 6.1 build.

Deployment has failed. Method granular_deploy. Failed to execute hook .
---
priority: 400
fail_on_error: true
type: shell
uids:
- '1'
- '3'
- '2'
parameters:
  retries: 10
  cmd: ntpdate -u $(egrep '^server' /etc/ntp.conf | egrep -v '127\.127\.[0-9]+\.[0-9]+'
    | sed '/^#/d' | awk '{print $2}')
  timeout: 180
  interval: 1

Build info:
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  api: "1.0"
  build_number: "202"
  build_id: "2015-03-16_22-54-44"
  nailgun_sha: "874df0d06e32f14db77746cfeb2dd74d4a6e528c"
  python-fuelclient_sha: "2509c9b72cdcdbe46c141685a99b03cd934803be"
  astute_sha: "93e427ac49109fa3fd8b0e1d0bb3d14092be2e8c"
  fuellib_sha: "924d73ae4766646e1c3a44d7b59c4120985e45f0"
  ostf_sha: "e86c961ceacfa5a8398b6cbda7b70a5f06afb476"
  fuelmain_sha: "608b72a6f79a719cf01c35a19d0091fe20c8288a"

Revision history for this message
Kyrylo Romanenko (kromanenko) wrote :

I have also reproduced this bug on build 200 of MOS 6.1.

Revision history for this message
Kyrylo Romanenko (kromanenko) wrote :

Additional info:

[root@fuel ~]# ntpq -p
     remote           refid           st t  when poll reach   delay   offset  jitter
==============================================================================
+main24.anyplace  105.240.56.33     2 u    65   64   377   17.052   -7.649  21.562
+resolver1.campu  118.188.39.164    2 u    23   64   377   12.785   15.289  20.603
*btr.skif.com.ua  62.149.0.30       2 u    32   64   377    8.720   -1.095  15.148

[root@fuel ~]# ntpdate -vu $(egrep '^server' /etc/ntp.conf | egrep -v '127\.127\.[0-9]+\.[0-9]+' | sed '/^#/d' | awk '{print $2}')
17 Mar 12:23:40 ntpdate[900]: ntpdate 4.2.6p5@1.2349-o Sat Nov 23 18:21:48 UTC 2013 (1)
17 Mar 12:23:41 ntpdate[900]: 108.61.73.243 rate limit response from server.
17 Mar 12:23:43 ntpdate[900]: adjust time server 199.102.46.74 offset 0.016440 sec

Steps:

Deployed with the following minimal configuration:
1 controller
1 compute
1 ceph osd (replication factor = 1)

CentOS

Neutron with GRE segmentation, all network settings by default.
No additional components installed.
QEMU
Ceph RDB for volumes and images

# cat /etc/fuel/version.yaml
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  api: "1.0"
  build_number: "200"
  build_id: "2015-03-15_22-54-44"
  nailgun_sha: "713e6684f9f54e29acfe6b8ebf641b9de2292628"
  python-fuelclient_sha: "cc64fff91fb0d78e5a02e7b93ceff224296f84fb"
  astute_sha: "93e427ac49109fa3fd8b0e1d0bb3d14092be2e8c"
  fuellib_sha: "553cb0cffa40a5f57313f962b6ec6a9bd89306ba"
  ostf_sha: "e86c961ceacfa5a8398b6cbda7b70a5f06afb476"
  fuelmain_sha: "c97fd8a789645bda48d06da224f994f8b52d82f5"

Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Hi, Kyrylo. I'll try to reproduce this again and will reopen the bug if I succeed.

Revision history for this message
Kyrylo Romanenko (kromanenko) wrote :