Comment 22 for bug 1463433

Bogdan Dobrelya (bogdando) wrote : Re: [shaker] test failing due to multiple "Timed out waiting for reply to ID" events logged by Oslo.messaging after rabbitmq recovered from partitioning and kept running with AMQP publish got blocked because virt memory got exhausted at rabbit node

The latest reproduction attempt with the patches https://review.openstack.org/190137 and https://review.openstack.org/189292 applied produced the following results.

Shaker run: from 22:06:00 to 02:23:05
light rally: from 02:23:05 to 02:53:16
full rally: from 02:53:16 until the end of the logs (~09:00)

- no partitions caused by network or CPU load spikes were detected
- there were no memory alerts set on rabbit nodes
- rabbitmqctl never hung on the nodes, but it reported several nodedown errors, which OCF treats as a resource failure and responds to by restarting the rabbit node.
- the complete list of single rabbit node failures during the test runs is http://pastebin.com/y12DDEx6

Some logs and additional rabbit stats collected by this script http://pastebin.com/sX3DPyRG are attached.