I finally managed to reproduce it on the latest 5.1 ISO (#263):

{
    "api": "1.0",
    "astute_sha": "694b5a55695e01e1c42185bfac9cc7a641a9bd48",
    "build_id": "2014-06-23_00-31-14",
    "build_number": "265",
    "fuellib_sha": "dc2713b3ba20ccff2816cf61e8481fe2f17ed69b",
    "fuelmain_sha": "4394ca9be6540d652cc3919556633d9381e0db64",
    "mirantis": "yes",
    "nailgun_sha": "eaabb2c9bbe8e921aaa231960dcda74a7bc86213",
    "ostf_sha": "429c373fb79b1073aa336bc62c6aad45a8f93af6",
    "production": "docker",
    "release": "5.1"
}

The problem is caused by a RabbitMQ glitch on one of the remaining controller nodes (in my case node-2, after bringing down br-mgmt on node-1). Here is an excerpt from the nova-compute log:

2014-06-23 11:31:49.244 25809 TRACE nova.servicegroup.drivers.db Traceback (most recent call last):
2014-06-23 11:31:49.244 25809 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/servicegroup/drivers/db.py", line 95, in _report_state
2014-06-23 11:31:49.244 25809 TRACE nova.servicegroup.drivers.db     service.service_ref, state_catalog)
2014-06-23 11:31:49.244 25809 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/conductor/api.py", line 218, in service_update
2014-06-23 11:31:49.244 25809 TRACE nova.servicegroup.drivers.db     return self._manager.service_update(context, service, values)
2014-06-23 11:31:49.244 25809 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/conductor/rpcapi.py", line 330, in service_update
2014-06-23 11:31:49.244 25809 TRACE nova.servicegroup.drivers.db     service=service_p, values=values)
2014-06-23 11:31:49.244 25809 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/client.py", line 150, in call
2014-06-23 11:31:49.244 25809 TRACE nova.servicegroup.drivers.db     wait_for_reply=True, timeout=timeout)
2014-06-23 11:31:49.244 25809 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/oslo/messaging/transport.py", line 90, in _send
2014-06-23 11:31:49.244 25809 TRACE nova.servicegroup.drivers.db     timeout=timeout)
2014-06-23 11:31:49.244 25809 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 409, in send
2014-06-23 11:31:49.244 25809 TRACE nova.servicegroup.drivers.db     return self._send(target, ctxt, message, wait_for_reply, timeout)
2014-06-23 11:31:49.244 25809 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 400, in _send
2014-06-23 11:31:49.244 25809 TRACE nova.servicegroup.drivers.db     result = self._waiter.wait(msg_id, timeout)
2014-06-23 11:31:49.244 25809 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 267, in wait
2014-06-23 11:31:49.244 25809 TRACE nova.servicegroup.drivers.db     reply, ending = self._poll_connection(msg_id, timeout)
2014-06-23 11:31:49.244 25809 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 217, in _poll_connection
2014-06-23 11:31:49.244 25809 TRACE nova.servicegroup.drivers.db     % msg_id)
2014-06-23 11:31:49.244 25809 TRACE nova.servicegroup.drivers.db MessagingTimeout: Timed out waiting for a reply to message ID a735198df0b94436801231af311adb99
2014-06-23 11:31:49.244 25809 TRACE nova.servicegroup.drivers.db

According to tcpdump and the logs, such errors occurred only when nova-compute tried to send a message to RabbitMQ on node-2. Messages to RabbitMQ on node-4 went through fine.
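For anyone trying to confirm the same failure mode, a rough diagnostic sketch (node names and the br-mgmt interface are from this setup; 5672 is assumed to be the default AMQP port in use):

# On the compute node: watch AMQP traffic toward the suspect broker.
tcpdump -ni br-mgmt 'tcp port 5672'

# On node-2: the broker may still accept TCP connections while the
# Erlang VM is wedged, so check whether rabbitmqctl itself responds.
rabbitmqctl status
rabbitmqctl cluster_status

# A wedged broker often hangs on this, or shows queues that never drain.
rabbitmqctl list_queues name messages consumers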
RabbitMQ was accepting connections on node-2, but it apparently could not actually handle messages. Because of this, the nova-compute services were flapping between "up" and "down" in "nova service-list", and all instances created in Horizon went to ERROR state. RabbitMQ on node-2 even failed to stop cleanly via "service rabbitmq stop", and I had to kill it. After killing the problematic RabbitMQ on node-2 (leaving node-4 as the only working RabbitMQ), the nova-compute services recovered, and I was able to create instances and pass OSTF.

This intermittent bug should be fixed by https://blueprints.launchpad.net/fuel/+spec/rabbitmq-cluster-controlled-by-pacemaker. Attaching a diagnostic snapshot just in case.
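For reference, a sketch of the workaround described above (assuming the init script is named rabbitmq-server on this distro; the broker runs inside the Erlang VM, which shows up as beam/beam.smp in the process list):

# Try a clean stop first (this hung in my case).
service rabbitmq-server stop

# If the broker will not stop, kill its Erlang VM directly.
pkill -9 -f beam

# Verify that compute services recover and stay "up".
nova service-list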