Has been reproduced with the following configuration:
Baremetal, Ubuntu, IBP, HA, Neutron-gre, Ceph-all, Nova-debug, Nova-quotas, 6.1-522
Controllers: 3, Computes+Ceph_OSD: 47
RabbitMQ on node-7 was broken during the Rally tests:
from <email address hidden>:
=ERROR REPORT==== 18-Jun-2015::15:52:10 ===
AMQP connection <0.21623.582> (running), channel 0 - error:
{amqp_error,connection_forced,
"broker forced connection closure with reason 'shutdown'",none}
from pacemaker.log:
2015-06-18T15:51:44.477300+00:00 err: ERROR: p_rabbitmq-server: get_monitor(): rabbitmqctl is not responding. The resource is failed.
rabbitmqctl hung and the resource was restarted.
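For context, the pacemaker OCF agent declares the resource failed when rabbitmqctl stops answering within its monitor window, which then triggers the restart seen above. Below is a minimal sketch of that kind of liveness probe, written in Python for illustration only; it is not the actual agent code, and the timeout value is an assumption:

    import subprocess

    def rabbitmqctl_responding(timeout=30):
        """Return True if `rabbitmqctl status` answers within `timeout` seconds.

        This mirrors the idea behind get_monitor(): a hung rabbitmqctl
        (no answer before the timeout) means the resource is treated as
        failed and gets restarted.
        """
        try:
            subprocess.check_output(
                ["rabbitmqctl", "status"],
                stderr=subprocess.STDOUT,
                timeout=timeout,  # assumed value; the real agent uses its own timeouts
            )
            return True
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
            return False

    if __name__ == "__main__":
        print("rabbitmqctl responding:", rabbitmqctl_responding())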
The following rally scenarios were running at the moment:
boot_server_with_network_in_single_tenant - a lot of VMs (2000) booting
create-and-delete-volume - cinder test
create_and_list_routers - neutron test
resize_server - nova test
From Bogdan Dobrelya:
node-88 was elected as the new master, so it was also down from 18T15:52:15 to 18T15:56:35. Normally at least one node should stay alive, but it was restarted as well because rabbitmqctl hung on two nodes, including the one that was supposed to stay alive. As a result the cluster was recovered, but the application failed to survive the recovery procedure.
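The last point is the key failure: clients received connection_forced when the broker was shut down and then had to reconnect once the cluster came back. A minimal sketch of the kind of reconnect-with-retry handling an AMQP client needs in order to survive such a failover (kombu is used here purely as an illustration; the broker URL, credentials, and retry intervals are assumptions, not values from this deployment):

    from kombu import Connection

    # Assumed broker URL and credentials, for illustration only.
    BROKER_URL = "amqp://guest:guest@node-7:5672//"

    def connect_with_retries():
        """Keep retrying until RabbitMQ accepts connections again.

        When pacemaker restarts the broker, existing connections are closed
        with connection_forced; a client that does not retry like this will
        not survive the recovery procedure.
        """
        conn = Connection(BROKER_URL)
        conn.ensure_connection(
            max_retries=30,    # assumed: allow the cluster several minutes to recover
            interval_start=1,  # first retry after 1 second
            interval_step=2,   # back off by 2 seconds per attempt
            interval_max=10,   # cap the delay between attempts
        )
        return conn

    if __name__ == "__main__":
        connection = connect_with_retries()
        print("connected:", connection.connected)
        connection.release()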
atop -r /var/log/atop/atop_20150618-20150619 -b "15:52"
http://paste.openstack.org/show/304479/
Diagnostic snapshot is here: http://mos-scale-share.mirantis.com/fuel-snapshot-2015-06-19_10-52-44.tar.xz