After a hard reset of all nodes it took ~30 minutes for the cluster to come back in one piece (crm to stop reporting offline nodes, load to go back to normal, etc). During those 30 minutes there was high load due to "software interrupts", which was even causing packet loss according to ping. It finally ended up with the following errors in crm:
vip__management_old (ocf::mirantis:ns_IPaddr2): Started (unmanaged) FAILED [ node-3 node-2 ]
vip__public_old (ocf::mirantis:ns_IPaddr2): Started node-3 (unmanaged) FAILED
netns-es were OK:
root@node-1:~# ip netns
qrouter-7fa8e3cf-5e67-48eb-852a-cb0a57006f79
haproxy
root@node-2:~# ip netns
qdhcp-16b365ed-b181-49c9-a8a2-0e69bae788ff
haproxy
After "crm resource cleanup" for those problem VIP resources, the env got back to an operating state. It was able to pass OSTF successfully, except for the "RabbitMQ availability" test. After "crm resource restart master_p_rabbitmq-server", rabbit got back to a normal state as well.
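For anyone repeating this recovery, here is a minimal sketch of how the FAILED resources could be picked out of the crm status lines above and turned into cleanup commands. It only assumes that status lines look like the two pasted in this report; the helper names (failed_resources, cleanup_commands) are made up for illustration, and the script only prints the commands rather than running them.

```python
import re

# Matches status lines of the form seen in this report, e.g.:
#   vip__public_old (ocf::mirantis:ns_IPaddr2): Started node-3 (unmanaged) FAILED
# Other crm/pacemaker versions may format these lines differently.
RESOURCE_LINE = re.compile(
    r"^\s*(?P<name>\S+)\s+\((?P<agent>[^)]+)\):\s+(?P<state>.*)$"
)


def failed_resources(status_text):
    """Return the names of resources whose status line contains FAILED."""
    names = []
    for line in status_text.splitlines():
        match = RESOURCE_LINE.match(line)
        if match and "FAILED" in match.group("state").split():
            names.append(match.group("name"))
    return names


def cleanup_commands(names):
    """Build one 'crm resource cleanup' command per resource name."""
    return [f"crm resource cleanup {name}" for name in names]


# The two status lines reported above, used as sample input.
SAMPLE = """\
vip__management_old (ocf::mirantis:ns_IPaddr2): Started (unmanaged) FAILED [ node-3 node-2 ]
vip__public_old (ocf::mirantis:ns_IPaddr2): Started node-3 (unmanaged) FAILED
"""

if __name__ == "__main__":
    for command in cleanup_commands(failed_resources(SAMPLE)):
        print(command)
```

The printed commands still need to be reviewed and run by hand on a controller node; the rabbit restart was a separate manual step and is not covered by this sketch.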
Attaching snapshot, additional info with pacemaker logs and rabbit statuses to follow.
Some additional research on newer ISO:
{
"api": "1.0",
"astute_sha": "fd9b8e3b6f59b2727b1b037054f10e0dd7bd37f1",
"auth_required": false,
"build_id": "2014-07-21_10-32-30",
"build_number": "340",
"feature_groups": [
"mirantis"
],
"fuellib_sha": "1ec799bc6c8b08b8c9c6243c426507cb7a46459b",
"fuelmain_sha": "539a5bf7493a5d14690a34bb18c3ad1c75b4f37f",
"nailgun_sha": "bdd0bdec2b45eea843d559b7648bd5dca4873c66",
"ostf_sha": "9863db951a6e159f4fa6e6861c8331e1af069cf8",
"production": "docker",
"release": "5.1"
}