Steps to reproduce:
Scenario:
1. Create a cluster in HA mode with 1 controller
2. Add 1 node with controller role
3. Add 1 node with compute role
4. Deploy cluster
Expected result:
Deployment passes
Actual result:
Deployment fails
In the Astute log:
2015-06-03T07:33:03 info: [663] Casting message to Nailgun: {"method"=>"deploy_resp", "args"=>{"task_uuid"=>"a20646c6-40e0-4d8d-b3a5-c123ce4cda1d", "status"=>"ready", "progress"=>100}}
2015-06-03T07:33:03 warning: [663] Trying to reconnect to message broker. Retry 5 sec later...
In the RabbitMQ log:
=ERROR REPORT==== 3-Jun-2015::07:31:33 ===
closing AMQP connection <0.350.0> (10.109.15.2:39783 -> 10.109.15.2:5672):
{heartbeat_timeout,running}
=INFO REPORT==== 3-Jun-2015::07:33:08 ===
accepting AMQP connection <0.1203.0> (10.109.15.2:40972 -> 10.109.15.2:5672)
Fuel version used:
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  openstack_version: "2014.2.2-6.1"
  api: "1.0"
  build_number: "497"
  build_id: "2015-06-02_16-28-25"
  nailgun_sha: "3830bdcb28ec050eed399fe782cc3dd5fbf31bde"
  python-fuelclient_sha: "4fc55db0265bbf39c369df398b9dc7d6469ba13b"
  astute_sha: "cbae24e9904be2ff8d1d49c0c48d1bdc33574228"
  fuel-library_sha: "d757cd41e4f8273d36ef85b8207e554e5422c5b0"
  fuel-ostf_sha: "f899e16c4ce9a60f94e7128ecde1324ea41d09d4"
  fuelmain_sha: "bcc909ffc5dd5156ba54cae348b6a07c1b607b24"
According to the logs, Astute keeps casting messages to the deploy_resp topic in RabbitMQ, but receiverd never collects them.
One possible fix is to have receiverd exit when it cannot connect to the broker, so that supervisor restarts it.
Note that only a couple of messages were received on provision_resp, and no deploy_resp messages at all.
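To illustrate the "exit and let supervisor restart it" suggestion, here is a minimal sketch. It is not the actual receiverd code: run_consumer, connect, consume, max_failures and retry_delay are hypothetical names, and the real AMQP client calls are replaced by plain callables. It also assumes supervisor is configured to restart the process when it exits.

```python
import sys
import time


def run_consumer(connect, consume, max_failures=3, retry_delay=5, sleep=time.sleep):
    """Consume messages; exit once the broker cannot be reached
    max_failures times in a row.

    connect -- hypothetical callable returning a broker connection,
               raising ConnectionError when the broker is unreachable
    consume -- hypothetical callable that blocks while consuming and
               returns or raises ConnectionError when the link drops
    """
    failures = 0
    while True:
        try:
            connection = connect()
            failures = 0  # a successful connect resets the counter
            consume(connection)
        except ConnectionError:
            failures += 1
            if failures >= max_failures:
                # Give up instead of retrying forever; a process
                # supervisor is assumed to start a fresh instance.
                sys.exit(1)
            sleep(retry_delay)
```

The point of exiting rather than looping is that a restarted process begins with clean connection state, which avoids a consumer that stays alive but never receives deploy_resp messages again.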