Comment 29 for bug 1460762

Revision history for this message
Leontii Istomin (listomin) wrote :

Has been reproduced with the following configuration:
Baremetal,Ubuntu,IBP,HA, Neutron-gre,Ceph-all,Nova-debug,Nova-quotas, 6.1-522
Controllers:3 Computes+Ceph_OSD:47

rabbitmq on node-7 was broken during rally tests:
from <email address hidden>:
=ERROR REPORT==== 18-Jun-2015::15:52:10 ===
AMQP connection <0.21623.582> (running), channel 0 - error:
{amqp_error,connection_forced,
           "broker forced connection closure with reason 'shutdown'",none}
from pacemaker.log:
2015-06-18T15:51:44.477300+00:00 err: ERROR: p_rabbitmq-server: get_monitor(): rabbitmqctl is not responding. The resource is failed.
ctl hanged and restarted

atop -r /var/log/atop/atop_20150618-20150619 -b "15:52"
http://paste.openstack.org/show/304479/

The following rally scenarios were running at the moment:
boot_server_with_network_in_single_tenant - a lot of VMs (2000) booting
create-and-delete-volume - cinder test
create_and_list_routers - neutron test
resize_server - nova test

From Bogdan Dobrelya
node-88 was elected as a new master, so it was also down from 18T15:52:15 to 18T15:56:35. Normally at least one node should be alive, but it was restarted as well because of ctl hanged on two nodes including the one supposed to be alive. as a result cluster was recovered, but app failed to survive recovery procedure

Diagnostic Snapshpot is here:http://mos-scale-share.mirantis.com/fuel-snapshot-2015-06-19_10-52-44.tar.xz