Comment 0 for bug 1385240

Giulio Fidente (gfidente) wrote:

The problem is caused by cinder-scheduler getting disconnected from the rabbit cluster without noticing and, as a result, being unable to receive updates from the API.

The disconnection may happen, for example, after a reconfiguration of a rabbit node, after the VIP moves to a different node, or even _during_ tripleo overcloud deployment due to rabbit cluster configuration.

This was observed using Kombu 3.0.33 as well as 2.5.

Using aggressive (low) kernel keepalive probe intervals seems to improve reliability, but a more appropriate fix seems to be heartbeat support in oslo.messaging.
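
To illustrate why lower keepalive values help, here is a rough Python sketch, not taken from cinder or oslo.messaging. It computes how long the kernel needs to declare an idle peer dead (idle time plus interval times probe count) and shows the per-socket equivalents of the Linux sysctls net.ipv4.tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes. The function names and the aggressive values (5, 1, 5) are assumptions chosen for illustration only, and keepalive only applies to sockets that have SO_KEEPALIVE enabled.

```python
import socket


def dead_peer_detection_seconds(idle, interval, probes):
    """Worst-case time for TCP keepalive to give up on a silent, idle peer."""
    return idle + interval * probes


def enable_aggressive_keepalive(sock, idle=5, interval=1, probes=5):
    """Enable keepalive on one socket and tighten its timers where supported.

    The default values are illustrative assumptions, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket overrides of the kernel-wide sysctl settings. Not every
    # platform exposes all three constants, so each one is checked first.
    for name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", probes),
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


if __name__ == "__main__":
    # Linux defaults: 7200 s idle, 75 s interval, 9 probes.
    print(dead_peer_detection_seconds(7200, 75, 9))  # 7875
    # Illustrative aggressive values.
    print(dead_peer_detection_seconds(5, 1, 5))  # 10
```

With the Linux defaults a dead connection can go unnoticed for more than two hours, which fits the symptom described above; with low values it is detected within seconds. This only detects the dead connection at the TCP level, so an application-level heartbeat would still be the more complete fix.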