Deployment fails, mcollective agents didn't respond within the allotted time

Bug #1445013 reported by Are Romøren
This bug affects 1 person
Affects               Status   Importance  Assigned to       Milestone
Fuel for OpenStack    Invalid  High        Matthew Mosesohn
  6.0.x               Invalid  High        Matthew Mosesohn
  6.1.x               Invalid  High        Matthew Mosesohn

Bug Description

I have seen this bug reported elsewhere, but with different setups, and I cannot find a published fix. I see some references to 6.0.x releases, but the only download option I can find is the ISO directly from software.mirantis.com.

Env:

* 6.0 ISO from software.mirantis.com
* HA deployment on Ubuntu with Neutron VLAN, Ceph, Murano and Ceilometer
* Deployment run with 3 controllers, 3 compute nodes and 3 MongoDB nodes

Deployment repeatedly halts at 35-40% on the progress bar when one of the nodes being deployed is marked "offline" in the Fuel admin console. The Astute logs show:

2015-04-16 13:07:14 ERR [416] 676976f6-87dd-4462-9551-cf531ee48d1c: cmd:
ruby -r 'yaml' -e 'y = YAML.load_file("/etc/astute.yaml");
y["nodes"] = YAML.load_file("/tmp/astute.yaml");
File.open("/etc/astute.yaml", "w") { |f| f.write y.to_yaml }';
puppet apply --logdest syslog --debug -e '$settings=parseyaml($::astute_settings_yaml) $nodes_hash=$settings["nodes"] class {"l23network::hosts_file": nodes => $nodes_hash }'
 mcollective error: 676976f6-87dd-4462-9551-cf531ee48d1c: MCollective agents '5' didn't respond within the allotted time.

2015-04-16 13:07:14 ERR [416] MCollective agents '5' didn't respond within the allotted time.
2015-04-16 13:05:13 ERR [416] 676976f6-87dd-4462-9551-cf531ee48d1c: mcollective upload_file agent error: 676976f6-87dd-4462-9551-cf531ee48d1c: MCollective agents '5' didn't respond within the allotted time.
2015-04-16 13:05:13 ERR [416] MCollective agents '5' didn't respond within the allotted time.

Eventually, the deployment fails with "Error: Deployment has failed. Check these nodes: $node".
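The '5' in these errors identifies the node that did not answer (the reply from Matthew Mosesohn refers to it as node-5). To see which nodes time out, and how often, across a longer Astute log, a minimal Ruby sketch might look like the one below. The helper name unresponsive_nodes is hypothetical, and the comma-separated form for several node IDs inside the quotes is an assumption about Astute's message format, not something this report shows.

```ruby
# Hypothetical helper: count "didn't respond" timeouts per node ID in Astute log lines.
# Matches the message format seen in the log above, e.g.
#   MCollective agents '5' didn't respond within the allotted time.
# ASSUMPTION: several node IDs would appear comma-separated inside the quotes.
TIMEOUT_RE = /MCollective agents '([^']+)' didn't respond within the allotted time/

def unresponsive_nodes(log_lines)
  log_lines.each_with_object(Hash.new(0)) do |line, counts|
    match = TIMEOUT_RE.match(line)
    next unless match # skip lines that are not timeout errors

    # Split the quoted ID list and count one timeout per node.
    match[1].split(',').map(&:strip).each { |uid| counts[uid] += 1 }
  end
end

sample = [
  "2015-04-16 13:07:14 ERR [416] MCollective agents '5' didn't respond within the allotted time.",
  "2015-04-16 13:05:13 ERR [416] MCollective agents '5' didn't respond within the allotted time.",
  "2015-04-16 13:05:13 INFO [416] an unrelated line"
]

# Prints "node-5: 2 timeout(s)" for the sample above.
unresponsive_nodes(sample).each { |uid, n| puts "node-#{uid}: #{n} timeout(s)" }
```

A node that shows up repeatedly, as node 5 does here, is the first one to compare against the Fuel Master.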

Are Romøren (are-romoren) wrote:
Changed in fuel:
importance: Undecided → High
milestone: none → 6.0.2
status: New → Confirmed
Matthew Mosesohn (raytrac3r) wrote:

Most likely, the clocks have lost sync. RabbitMQ, the transport backend for MCollective, requires the clocks to be properly synchronized. Compare the output of the "date" command on the Fuel Master and on node-5; the difference will give you some pointers.
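The comparison suggested above can be scripted once the readings are collected. The minimal Ruby sketch below assumes you have already gathered each host's epoch time, for example by running "date +%s" on the Fuel Master and on each node at roughly the same moment. The helper skewed_hosts, the host names, and the sample readings are hypothetical. The 60-second default threshold is also an assumption, loosely based on what we believe is MCollective's default message TTL; adjust it for your setup.

```ruby
# Hypothetical helper: report hosts whose clock differs from a reference host
# (e.g. the Fuel Master) by more than `threshold` seconds.
# `epochs` maps host name => epoch seconds, collected beforehand (e.g. `date +%s`).
# ASSUMPTION: 60 s default threshold, loosely based on MCollective's default TTL.
def skewed_hosts(epochs, reference:, threshold: 60)
  ref = epochs.fetch(reference)
  epochs.each_with_object({}) do |(host, t), out|
    next if host == reference

    skew = t - ref # positive: host is ahead of the reference; negative: behind
    out[host] = skew if skew.abs > threshold
  end
end

# Hypothetical sample readings, not taken from this bug report.
readings = {
  "fuel-master" => 1_429_189_634,
  "node-4" => 1_429_189_636,
  "node-5" => 1_429_189_334
}

# Prints "node-5: -300s relative to fuel-master"; node-4 is within the threshold.
skewed_hosts(readings, reference: "fuel-master").each do |host, skew|
  puts "#{host}: #{skew}s relative to fuel-master"
end
```

Readings collected one host at a time include some network and login latency, so a few seconds of difference is noise; a gap of minutes, as in the sample, is the kind of skew that points to missing time synchronization.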
To avoid issues like this, make sure your Fuel Master has an Internet connection and enable NTP in Fuel Setup. Without time synchronization, the problem gets worse later in the deployment.
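Unlike comparing "date" output by hand, NTP estimates the clock offset from the four timestamps of a single request/response exchange, which cancels out network delay when it is the same in both directions. The standard on-wire calculation, as described in RFC 5905, can be sketched in Ruby as follows; the function name ntp_offset_and_delay and the example timestamps are hypothetical.

```ruby
# NTP-style clock offset and round-trip delay from one request/response exchange
# (the on-wire calculation described in RFC 5905):
#   t1: client sends request      (client clock)
#   t2: server receives request   (server clock)
#   t3: server sends reply        (server clock)
#   t4: client receives reply     (client clock)
# A positive offset means the server's clock is ahead of the client's.
def ntp_offset_and_delay(t1, t2, t3, t4)
  offset = ((t2 - t1) + (t3 - t4)) / 2.0
  delay = (t4 - t1) - (t3 - t2) # round trip minus server processing time
  [offset, delay]
end

# Hypothetical example: the client (say, a node) is 300 s behind the server
# (say, the Fuel Master), with 0.1 s of latency each way and 0.01 s of
# server processing time.
offset, delay = ntp_offset_and_delay(1000.0, 1300.1, 1300.11, 1000.21)

# Prints "offset=300.00s delay=0.20s".
puts format("offset=%.2fs delay=%.2fs", offset, delay)
```

The offset estimate is exact only when the path delay is symmetric; an asymmetric path biases it by half the difference between the two one-way delays.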
