Deployment fails with mcollective upload_file agent error

Bug #1433169 reported by Kyrylo Romanenko
Affects: Fuel for OpenStack
Status: Invalid
Importance: High
Assigned to: Sergey Vasilenko

Bug Description

Steps to reproduce:

Attempted to deploy the following configuration:
1 controller
1 compute
1 ceph osd (replication factor = 1)

CentOS

FlatDHCP, all network settings by default.
No additional components installed.
QEMU
Ceph RBD for volumes and images

# cat /etc/fuel/version.yaml
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  api: "1.0"
  build_number: "200"
  build_id: "2015-03-15_22-54-44"
  nailgun_sha: "713e6684f9f54e29acfe6b8ebf641b9de2292628"
  python-fuelclient_sha: "cc64fff91fb0d78e5a02e7b93ceff224296f84fb"
  astute_sha: "93e427ac49109fa3fd8b0e1d0bb3d14092be2e8c"
  fuellib_sha: "553cb0cffa40a5f57313f962b6ec6a9bd89306ba"
  ostf_sha: "e86c961ceacfa5a8398b6cbda7b70a5f06afb476"
  fuelmain_sha: "c97fd8a789645bda48d06da224f994f8b52d82f5"

Astute logs:

2015-03-17 15:30:25 ERR
[418] Error running RPC method granular_deploy: Failed to execute hook .
---
priority: 100
type: upload_file
uids:
- '4'
- '5'
- '6'
parameters:
  path: "/etc/hiera/nodes.yaml"
  data: |
    nodes:
    - {fqdn: node-4.domain.tld, internal_address: 192.168.0.3, internal_netmask: 255.255.255.0,
      name: node-4, public_address: 172.16.0.4, public_netmask: 255.255.255.0, role: primary-controller,
      storage_address: 192.168.1.1, storage_netmask: 255.255.255.0, swift_zone: '4', uid: '4',
      user_node_name: 'Untitled (44:13)'}
    - {fqdn: node-5.domain.tld, internal_address: 192.168.0.4, internal_netmask: 255.255.255.0,
      name: node-5, public_address: 172.16.0.5, public_netmask: 255.255.255.0, role: compute,
      storage_address: 192.168.1.2, storage_netmask: 255.255.255.0, swift_zone: '5', uid: '5',
      user_node_name: 'Untitled (f8:e2)'}
    - {fqdn: node-6.domain.tld, internal_address: 192.168.0.5, internal_netmask: 255.255.255.0,
      name: node-6, public_address: 172.16.0.6, public_netmask: 255.255.255.0, role: ceph-osd,
      storage_address: 192.168.1.3, storage_netmask: 255.255.255.0, swift_zone: '6', uid: '6',
      user_node_name: 'Untitled (cb:61)'}
  content: |
    nodes:
    - {fqdn: node-4.domain.tld, internal_address: 192.168.0.3, internal_netmask: 255.255.255.0,
      name: node-4, public_address: 172.16.0.4, public_netmask: 255.255.255.0, role: primary-controller,
      storage_address: 192.168.1.1, storage_netmask: 255.255.255.0, swift_zone: '4', uid: '4',
      user_node_name: 'Untitled (44:13)'}
    - {fqdn: node-5.domain.tld, internal_address: 192.168.0.4, internal_netmask: 255.255.255.0,
      name: node-5, public_address: 172.16.0.5, public_netmask: 255.255.255.0, role: compute,
      storage_address: 192.168.1.2, storage_netmask: 255.255.255.0, swift_zone: '5', uid: '5',
      user_node_name: 'Untitled (f8:e2)'}
    - {fqdn: node-6.domain.tld, internal_address: 192.168.0.5, internal_netmask: 255.255.255.0,
      name: node-6, public_address: 172.16.0.6, public_netmask: 255.255.255.0, role: ceph-osd,
      storage_address: 192.168.1.3, storage_netmask: 255.255.255.0, swift_zone: '6', uid: '6',
      user_node_name: 'Untitled (cb:61)'}
  overwrite: true
  parents: true
  permissions: '0644'
  user_owner: root
  group_owner: root
  dir_permissions: '0755'
, trace:
["/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:54:in `block in process'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:26:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:26:in `process'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/deployment_engine/granular_deployment.rb:219:in `post_deployment_actions'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/deployment_engine.rb:74:in `deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/orchestrator.rb:127:in `deploy_cluster'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/orchestrator.rb:56:in `granular_deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/dispatcher.rb:111:in `granular_deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:142:in `dispatch_message'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:103:in `block in dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:64:in `call'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:64:in `block in each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:101:in `each_with_index'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:101:in `dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:85:in `block in perform_main_job'"]
2015-03-17 15:30:25 ERR
[418] 6998416e-eede-423c-b3b0-1855006a6750: mcollective upload_file agent error: 6998416e-eede-423c-b3b0-1855006a6750: MCollective agents '4' didn't respond within the allotted time.
2015-03-17 15:30:25 ERR
[418] MCollective agents '4' didn't respond within the allotted time.
2015-03-17 15:18:59 ERR
[418] No more tasks will be executed on the node 4
2015-03-17 15:18:59 ERR
[418] Task '{"priority"=>600, "type"=>"puppet", "uids"=>["4"], "parameters"=>{"puppet_modules"=>"/etc/puppet/modules", "puppet_manifest"=>"/etc/puppet/modules/osnailyfacter/modular/netconfig/netconfig.pp", "timeout"=>3600, "cwd"=>"/"}}' on node 4 valid, but failed
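The hook that failed is, functionally, a plain file write: Astute asks the MCollective upload_file agent on each node to create /etc/hiera/nodes.yaml with the given content, permissions, and parent directories. A minimal Python sketch of the equivalent operation follows (this is not the actual agent code; user/group ownership is omitted because it requires root):

```python
import os

def upload_file(path, data, permissions="0644", dir_permissions="0755",
                overwrite=True, parents=True):
    """Sketch of the file write an upload_file hook requests on a node."""
    directory = os.path.dirname(path)
    if parents and directory and not os.path.isdir(directory):
        # Create missing parent directories, as parents: true requests.
        os.makedirs(directory, int(dir_permissions, 8))
    if os.path.exists(path) and not overwrite:
        return False
    with open(path, "w") as f:
        f.write(data)
    # Permissions arrive as an octal string, e.g. '0644'.
    os.chmod(path, int(permissions, 8))
    return True
```

Note that the timeout above means node 4 never performed this write at all (its agent was unreachable), not that the write itself failed.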

Note that in the screenshot the controller node is displayed as OFFLINE, but it is actually online and can be accessed via its VirtualBox window. Screenshot attached.
Diagnostic snapshot:
https://drive.google.com/file/d/0B6E70aHvCcRQS2hVcGVxbmNjcm8/view?usp=sharing

Ryan Moe (rmoe) wrote :

The last thing that happened on node-4 was "Executing '/sbin/ip --force link set dev eth0 down'" in the puppet log. After that point all remote logging stopped for node-4 and it was no longer able to connect to the master node. Can you provide the puppet log from the host itself? It seems like that interface never came back up.
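Since the node can only be reached through its VirtualBox console, a few commands run there would confirm this diagnosis. This is a hypothetical session: the interface name comes from the quoted puppet log, but the puppet log path on the node is an assumption.

```shell
# Run on node-4's VirtualBox console; eth0 is the interface the log shows
# being taken down.
ip -o link show eth0        # is the link state still DOWN?
ip addr show eth0           # did the interface regain its address?
tail -n 50 /var/log/puppet.log   # assumed log path; last netconfig actions
ip link set dev eth0 up     # bring the link back up to restore access
```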

Changed in fuel:
importance: Undecided → High
milestone: none → 6.1
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
Changed in fuel:
status: New → Confirmed
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Sergey Vasilenko (xenolog)
Vladimir Kuklin (vkuklin) wrote :

We have not seen this reproduced since, so I am moving it to Incomplete status.

Changed in fuel:
status: Confirmed → Incomplete
Changed in fuel:
status: Incomplete → Invalid