Comment 4 for bug 1389651

Matthew Mosesohn (raytrac3r) wrote :

It is not at all clear what broke here and caused the compute node to go into an error state. You saw that node-100 (compute_65) had an error, but none of the logs show it. All nodes provisioned without issues, and the astute logs confirm this. In fact, receiverd.log in nailgun has a message showing that the cluster was provisioned:
receiverd.log:2014-11-05 11:04:06.755 INFO [7f39aa25e700] (receiver) RPC method provision_resp received: {"status": "ready", "progress": 100, "task_uuid": "2801508b-ba51-4155-836b-e140dcf8d440", "nodes": [{"status": "provisioned", "progress": 100, "uid": "28"}]}
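
To check this across a whole receiverd.log rather than by eye, something like the following could be used. It is only a sketch: it assumes every provision_resp line has the same shape as the one quoted above (a JSON payload running to the end of the line), and the function name provisioned_nodes is made up for illustration, not part of nailgun.

```python
import json
import re

# Assumption: the payload follows "RPC method provision_resp received: "
# and runs to the end of the line, as in the excerpt quoted above.
PROVISION_RESP = re.compile(r"RPC method provision_resp received: (\{.*\})\s*$")


def provisioned_nodes(lines):
    """Return {uid: status} for every node seen in provision_resp messages.

    Hypothetical helper, not a nailgun API. Later messages for the same
    uid overwrite earlier ones, so the result reflects the last report.
    """
    statuses = {}
    for line in lines:
        match = PROVISION_RESP.search(line)
        if not match:
            continue  # not a provision_resp line
        payload = json.loads(match.group(1))
        for node in payload.get("nodes", []):
            statuses[node["uid"]] = node["status"]
    return statuses


# The log line quoted in this comment, used as sample input.
sample_line = (
    'receiverd.log:2014-11-05 11:04:06.755 INFO [7f39aa25e700] (receiver) '
    'RPC method provision_resp received: {"status": "ready", "progress": 100, '
    '"task_uuid": "2801508b-ba51-4155-836b-e140dcf8d440", '
    '"nodes": [{"status": "provisioned", "progress": 100, "uid": "28"}]}'
)

print(provisioned_nodes([sample_line]))
```

Any uid whose last reported status is something other than "provisioned" would be worth a closer look.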

The deploy task did fail with an error about glance image upload, but the root cause of the deploy failure is that the pacemaker service provider is broken:
err: (/Stage[main]/Rabbitmq::Service/Service[p_rabbitmq-server]) Provider pacemaker is not functional on this host

We really should reproduce this issue on a known-good ISO, where Puppet failures can be ruled out.