Deployment fails with mcollective upload_file agent error

Bug #1433169 reported by Kyrylo Romanenko
Affects: Fuel for OpenStack
Status: Invalid
Importance: High
Assigned to: Sergey Vasilenko

Bug Description

Steps to reproduce:

Attempted to deploy the following configuration:
1 controller
1 compute
1 ceph osd (replication factor = 1)

CentOS

FlatDHCP, all network settings by default.
No additional components installed.
QEMU
Ceph RBD for volumes and images

# cat /etc/fuel/version.yaml
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  api: "1.0"
  build_number: "200"
  build_id: "2015-03-15_22-54-44"
  nailgun_sha: "713e6684f9f54e29acfe6b8ebf641b9de2292628"
  python-fuelclient_sha: "cc64fff91fb0d78e5a02e7b93ceff224296f84fb"
  astute_sha: "93e427ac49109fa3fd8b0e1d0bb3d14092be2e8c"
  fuellib_sha: "553cb0cffa40a5f57313f962b6ec6a9bd89306ba"
  ostf_sha: "e86c961ceacfa5a8398b6cbda7b70a5f06afb476"
  fuelmain_sha: "c97fd8a789645bda48d06da224f994f8b52d82f5"

Astute logs:

2015-03-17 15:30:25 ERR
[418] Error running RPC method granular_deploy: Failed to execute hook .
---
priority: 100
type: upload_file
uids:
- '4'
- '5'
- '6'
parameters:
  path: "/etc/hiera/nodes.yaml"
  data: |
    nodes:
    - {fqdn: node-4.domain.tld, internal_address: 192.168.0.3, internal_netmask: 255.255.255.0,
      name: node-4, public_address: 172.16.0.4, public_netmask: 255.255.255.0, role: primary-controller,
      storage_address: 192.168.1.1, storage_netmask: 255.255.255.0, swift_zone: '4', uid: '4',
      user_node_name: 'Untitled (44:13)'}
    - {fqdn: node-5.domain.tld, internal_address: 192.168.0.4, internal_netmask: 255.255.255.0,
      name: node-5, public_address: 172.16.0.5, public_netmask: 255.255.255.0, role: compute,
      storage_address: 192.168.1.2, storage_netmask: 255.255.255.0, swift_zone: '5', uid: '5',
      user_node_name: 'Untitled (f8:e2)'}
    - {fqdn: node-6.domain.tld, internal_address: 192.168.0.5, internal_netmask: 255.255.255.0,
      name: node-6, public_address: 172.16.0.6, public_netmask: 255.255.255.0, role: ceph-osd,
      storage_address: 192.168.1.3, storage_netmask: 255.255.255.0, swift_zone: '6', uid: '6',
      user_node_name: 'Untitled (cb:61)'}
  content: |
    nodes:
    - {fqdn: node-4.domain.tld, internal_address: 192.168.0.3, internal_netmask: 255.255.255.0,
      name: node-4, public_address: 172.16.0.4, public_netmask: 255.255.255.0, role: primary-controller,
      storage_address: 192.168.1.1, storage_netmask: 255.255.255.0, swift_zone: '4', uid: '4',
      user_node_name: 'Untitled (44:13)'}
    - {fqdn: node-5.domain.tld, internal_address: 192.168.0.4, internal_netmask: 255.255.255.0,
      name: node-5, public_address: 172.16.0.5, public_netmask: 255.255.255.0, role: compute,
      storage_address: 192.168.1.2, storage_netmask: 255.255.255.0, swift_zone: '5', uid: '5',
      user_node_name: 'Untitled (f8:e2)'}
    - {fqdn: node-6.domain.tld, internal_address: 192.168.0.5, internal_netmask: 255.255.255.0,
      name: node-6, public_address: 172.16.0.6, public_netmask: 255.255.255.0, role: ceph-osd,
      storage_address: 192.168.1.3, storage_netmask: 255.255.255.0, swift_zone: '6', uid: '6',
      user_node_name: 'Untitled (cb:61)'}
  overwrite: true
  parents: true
  permissions: '0644'
  user_owner: root
  group_owner: root
  dir_permissions: '0755'
, trace:
["/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:54:in `block in process'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:26:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/nailgun_hooks.rb:26:in `process'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/deployment_engine/granular_deployment.rb:219:in `post_deployment_actions'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/deployment_engine.rb:74:in `deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/orchestrator.rb:127:in `deploy_cluster'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/orchestrator.rb:56:in `granular_deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/dispatcher.rb:111:in `granular_deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:142:in `dispatch_message'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:103:in `block in dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:64:in `call'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:64:in `block in each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:101:in `each_with_index'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:101:in `dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:85:in `block in perform_main_job'"]
2015-03-17 15:30:25 ERR
[418] 6998416e-eede-423c-b3b0-1855006a6750: mcollective upload_file agent error: 6998416e-eede-423c-b3b0-1855006a6750: MCollective agents '4' didn't respond within the allotted time.
2015-03-17 15:30:25 ERR
[418] MCollective agents '4' didn't respond within the allotted time.
2015-03-17 15:18:59 ERR
[418] No more tasks will be executed on the node 4
2015-03-17 15:18:59 ERR
[418] Task '{"priority"=>600, "type"=>"puppet", "uids"=>["4"], "parameters"=>{"puppet_modules"=>"/etc/puppet/modules", "puppet_manifest"=>"/etc/puppet/modules/osnailyfacter/modular/netconfig/netconfig.pp", "timeout"=>3600, "cwd"=>"/"}}' on node 4 valid, but failed
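The hook that failed is, functionally, a plain file write: Astute asks the MCollective upload_file agent on each node to create /etc/hiera/nodes.yaml with the given content, permissions, and parent directories. A minimal Python sketch of the equivalent operation follows (this is not the actual agent code; user/group ownership is omitted because it requires root):

```python
import os

def upload_file(path, data, permissions="0644", dir_permissions="0755",
                overwrite=True, parents=True):
    """Sketch of the file write an upload_file hook requests on a node."""
    directory = os.path.dirname(path)
    if parents and directory and not os.path.isdir(directory):
        # Create missing parent directories, as parents: true requests.
        os.makedirs(directory, int(dir_permissions, 8))
    if os.path.exists(path) and not overwrite:
        return False
    with open(path, "w") as f:
        f.write(data)
    # Permissions arrive as an octal string, e.g. '0644'.
    os.chmod(path, int(permissions, 8))
    return True
```

Note that the timeout above means node 4 never performed this write at all (its agent was unreachable), not that the write itself failed.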

Note that in the screenshot the controller node is displayed as OFFLINE, but it is actually online and can be accessed via its VirtualBox window. Screenshot attached.
Diagnostic snapshot:
https://drive.google.com/file/d/0B6E70aHvCcRQS2hVcGVxbmNjcm8/view?usp=sharing

Ryan Moe (rmoe) wrote :

The last thing that happened on node-4 was "Executing '/sbin/ip --force link set dev eth0 down'" in the puppet log. After that point all remote logging stopped for node-4 and it was no longer able to connect to the master node. Can you provide the puppet log from the host itself? It seems like that interface never came back up.
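Since the node can only be reached through its VirtualBox console, a few commands run there would confirm this diagnosis. This is a hypothetical session: the interface name comes from the quoted puppet log, but the puppet log path on the node is an assumption.

```shell
# Run on node-4's VirtualBox console; eth0 is the interface the log shows
# being taken down.
ip -o link show eth0        # is the link state still DOWN?
ip addr show eth0           # did the interface regain its address?
tail -n 50 /var/log/puppet.log   # assumed log path; last netconfig actions
ip link set dev eth0 up     # bring the link back up to restore access
```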

Changed in fuel:
importance: Undecided → High
milestone: none → 6.1
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
Changed in fuel:
status: New → Confirmed
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Sergey Vasilenko (xenolog)
Vladimir Kuklin (vkuklin) wrote :

We have not seen this reproduced since, so I am moving it to Incomplete status.

Changed in fuel:
status: Confirmed → Incomplete
Changed in fuel:
status: Incomplete → Invalid