Comment 5 for bug 1340989

Aleksandr Didenko (adidenko) wrote :

Some additional research on a newer ISO:
{
    "api": "1.0",
    "astute_sha": "fd9b8e3b6f59b2727b1b037054f10e0dd7bd37f1",
    "auth_required": false,
    "build_id": "2014-07-21_10-32-30",
    "build_number": "340",
    "feature_groups": [
        "mirantis"
    ],
    "fuellib_sha": "1ec799bc6c8b08b8c9c6243c426507cb7a46459b",
    "fuelmain_sha": "539a5bf7493a5d14690a34bb18c3ad1c75b4f37f",
    "nailgun_sha": "bdd0bdec2b45eea843d559b7648bd5dca4873c66",
    "ostf_sha": "9863db951a6e159f4fa6e6861c8331e1af069cf8",
    "production": "docker",
    "release": "5.1"
}
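For triage it can help to pull the interesting fields out of the version payload programmatically; a minimal sketch (values copied verbatim from the JSON above, trimmed to a subset of fields):

```python
import json

# Subset of the version payload quoted above (values copied verbatim).
raw = """
{
    "build_id": "2014-07-21_10-32-30",
    "build_number": "340",
    "astute_sha": "fd9b8e3b6f59b2727b1b037054f10e0dd7bd37f1",
    "fuellib_sha": "1ec799bc6c8b08b8c9c6243c426507cb7a46459b",
    "release": "5.1"
}
"""
info = json.loads(raw)

# Short (7-char) SHAs are usually enough to pinpoint the exact commits:
shas = {k: v[:7] for k, v in info.items() if k.endswith("_sha")}
print(info["release"], info["build_number"], shas)
```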

After a hard reset of all nodes it took ~30 minutes to bring the cluster back into one piece (for crm to stop reporting offline nodes, for load to return to normal, etc.). During those 30 minutes there was high load due to "software interrupts", which was even causing packet loss according to ping. It finally ended up with the following errors in crm:

 vip__management_old (ocf::mirantis:ns_IPaddr2): Started (unmanaged) FAILED [ node-3 node-2 ]
 vip__public_old (ocf::mirantis:ns_IPaddr2): Started node-3 (unmanaged) FAILED

The network namespaces were OK:

root@node-1:~# ip netns
qrouter-7fa8e3cf-5e67-48eb-852a-cb0a57006f79
haproxy

root@node-2:~# ip netns
qdhcp-16b365ed-b181-49c9-a8a2-0e69bae788ff
haproxy

After "crm resource cleanup" for those problem VIP resources, the env got back to an operating state and was able to pass OSTF successfully, except for the "RabbitMQ availability" test. After "crm resource restart master_p_rabbitmq-server", rabbit got back to a normal state as well.
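The recovery sequence above can be sketched as a short script (resource names taken from the crm output; printed as a dry run, so pipe the output to `sh` on one controller to actually execute it):

```shell
#!/bin/sh
# Sketch of the recovery sequence described above: clean up the two
# failed VIP resources, then restart the RabbitMQ multi-state resource.
# This only prints the crm commands (dry run); pipe to 'sh' to execute.
set -eu

vips="vip__management_old vip__public_old"
for r in $vips; do
    echo "crm resource cleanup $r"
done
echo "crm resource restart master_p_rabbitmq-server"
```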

Attaching a snapshot; additional info with Pacemaker logs and RabbitMQ statuses to follow.