This particular issue most possible was caused by huge size of a message that was casted from astute to nailgun.
Scenario of the issue:
1) Deploy cluster
2) When waiting timeout was reached in system test (cluster deploing was still in progress!) , system test started task for making diagnostic snapshot (time: 04:20)
3) Diagnostic snapshot was successfully created, and then message was casted to Nailgun (>4Mb of size, see the attach) (time: 04:25:20)
4) RabbitMQ closed the connection (time: 04:25:24)
5) Astute, while processing the deploy that was still in progress, reconnected to RabbitMQ and continue deploy:
## astute.log
2015-09-03T04:25:26 warning: [679] Trying to reconnect to message broker. Retry 5 sec later...
Bug https:/ /bugs.launchpad .net/fuel/ +bug/1491375 is reproduced many times on almost half of 7.0.swarm jobs.
For example, CI job: /product- ci.infra. mirantis. net/view/ 7.0_swarm/ job/7.0. system_ test.ubuntu. ha_destructive_ ceph_neutron/ 45/console
https:/
##### rabbitmq log:
=ERROR REPORT==== 3-Sep-2015: :04:25: 24 === timeout, running}
closing AMQP connection <0.345.0> (10.109.0.2:40620 -> 10.109.0.2:5672):
{heartbeat_
=INFO REPORT==== 3-Sep-2015: :04:25: 31 ===
accepting AMQP connection <0.2438.0> (10.109.0.2:44064 -> 10.109.0.2:5672)
################
This particular issue most possible was caused by huge size of a message that was casted from astute to nailgun.
Scenario of the issue:
1) Deploy cluster
2) When waiting timeout was reached in system test (cluster deploing was still in progress!) , system test started task for making diagnostic snapshot (time: 04:20)
3) Diagnostic snapshot was successfully created, and then message was casted to Nailgun (>4Mb of size, see the attach) (time: 04:25:20)
4) RabbitMQ closed the connection (time: 04:25:24)
5) Astute, while processing the deploy that was still in progress, reconnected to RabbitMQ and continue deploy:
## astute.log
2015-09-03T04:25:26 warning: [679] Trying to reconnect to message broker. Retry 5 sec later...