On 2011-12-07, we had a RabbitMQ outage which degraded into cascading failures in the Launchpad app servers.
The working theory is that a really big OOPS triggered the high VM watermark error in Rabbit:
=INFO REPORT==== 7-Dec-2011::21:06:44 === alarm_handler: {set,{vm_memory_high_watermark,[]}}
Shortly after that, app servers stopped responding to Nagios checks (10s socket timeout).
We aren't probably handling that Rabbit error very well.
On 2011-12-07, we had a RabbitMQ outage which degraded into cascading failures in the Launchpad app servers.
The working theory is that a really big OOPS triggered the high VM watermark error in Rabbit:
=INFO REPORT==== 7-Dec-2011: :21:06: 44 === alarm_handler: {set,{vm_ memory_ high_watermark, []}}
Shortly after that, app servers stopped responding to Nagios checks (10s socket timeout).
We aren't probably handling that Rabbit error very well.