Deployment fails with mcollective upload_file agent error
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Invalid | High | Sergey Vasilenko | 6.1
Bug Description
Steps:
Attempted to deploy the following configuration:
1 controller
1 compute
1 ceph osd (replication factor = 1)
CentOS
FlatDHCP, all network settings by default.
No additional components installed.
QEMU
Ceph RBD for volumes and images
# cat /etc/fuel/
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  api: "1.0"
  build_number: "200"
  build_id: "2015-03-
  nailgun_sha: "713e6684f9f54e
  python-
  astute_sha: "93e427ac49109f
  fuellib_sha: "553cb0cffa40a5
  ostf_sha: "e86c961ceacfa5
  fuelmain_sha: "c97fd8a789645b
Astute logs:
2015-03-17 15:30:25 ERR
[418] Error running RPC method granular_deploy: Failed to execute hook .
---
priority: 100
type: upload_file
uids:
- '4'
- '5'
- '6'
parameters:
  path: "/etc/hiera/
  data: |
    nodes:
    - {fqdn: node-4.domain.tld, internal_address: 192.168.0.3, internal_netmask: 255.255.255.0,
      name: node-4, public_address: 172.16.0.4, public_netmask: 255.255.255.0, role: primary-controller,
      storage_
      user_
    - {fqdn: node-5.domain.tld, internal_address: 192.168.0.4, internal_netmask: 255.255.255.0,
      name: node-5, public_address: 172.16.0.5, public_netmask: 255.255.255.0, role: compute,
      storage_
      user_
    - {fqdn: node-6.domain.tld, internal_address: 192.168.0.5, internal_netmask: 255.255.255.0,
      name: node-6, public_address: 172.16.0.6, public_netmask: 255.255.255.0, role: ceph-osd,
      storage_
      user_
  content: |
    nodes:
    - {fqdn: node-4.domain.tld, internal_address: 192.168.0.3, internal_netmask: 255.255.255.0,
      name: node-4, public_address: 172.16.0.4, public_netmask: 255.255.255.0, role: primary-controller,
      storage_
      user_
    - {fqdn: node-5.domain.tld, internal_address: 192.168.0.4, internal_netmask: 255.255.255.0,
      name: node-5, public_address: 172.16.0.5, public_netmask: 255.255.255.0, role: compute,
      storage_
      user_
    - {fqdn: node-6.domain.tld, internal_address: 192.168.0.5, internal_netmask: 255.255.255.0,
      name: node-6, public_address: 172.16.0.6, public_netmask: 255.255.255.0, role: ceph-osd,
      storage_
      user_
  overwrite: true
  parents: true
  permissions: '0644'
  user_owner: root
  group_owner: root
  dir_permissions: '0755'
, trace:
["/usr/… (stack trace paths truncated in the original report)]
2015-03-17 15:30:25 ERR
[418] 6998416e-
2015-03-17 15:30:25 ERR
[418] MCollective agents '4' didn't respond within the allotted time.
2015-03-17 15:18:59 ERR
[418] No more tasks will be executed on the node 4
2015-03-17 15:18:59 ERR
[418] Task '{"priority"=>600, "type"=>"puppet", "uids"=>["4"], "parameters"
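For context, upload_file is a granular-deployment hook that writes the given content to a file on each target node. A rough manual equivalent of the hook parameters shown above, as a sketch only (the real filename under /etc/hiera/ is truncated in the log, so TARGET below is a hypothetical placeholder):

TARGET=/etc/hiera/nodes.yaml                   # hypothetical name; the real one is truncated above
mkdir -p -m 0755 "$(dirname "$TARGET")"        # parents: true, dir_permissions: '0755'
cat > "$TARGET" <<'EOF'                        # overwrite: true
nodes:
  ...
EOF
chmod 0644 "$TARGET"                           # permissions: '0644'
chown root:root "$TARGET"                      # user_owner: root, group_owner: root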
Note in the attached screenshot that the controller node is displayed as OFFLINE. It is actually online, however, and can be accessed via its VirtualBox window.
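One way to reconcile the OFFLINE status with the node being reachable from its console is to check mcollective reachability from the master node, since that is the channel the deployment uses. A sketch, assuming the stock mco client is available on the Fuel master:

mco ping                         # which mcollective agents respond at all?
mco ping -I node-4.domain.tld    # probe the allegedly offline controller only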
Diagnostic snapshot:
https:/
Changed in fuel:
importance: Undecided → High
milestone: none → 6.1
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
Changed in fuel:
status: New → Confirmed
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Sergey Vasilenko (xenolog)
Changed in fuel:
status: Incomplete → Invalid
The last thing that happened on node-4 was "Executing '/sbin/ip --force link set dev eth0 down'" in the puppet log. After that point, all remote logging from node-4 stopped and the node was no longer able to connect to the master node. Can you provide the puppet log from the host itself? It seems that interface never came back up.
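To check that hypothesis from node-4's VirtualBox console, a minimal sketch (the on-node puppet log path is an assumption and may differ in this Fuel build):

ip link show eth0                                # is the link still DOWN after the puppet run?
ip addr show eth0                                # did the admin-network address survive?
grep -n 'link set dev eth0' /var/log/puppet.log  # hypothetical log path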