tripleo

Bug #1385240
Comment #0

Giulio Fidente (gfidente) wrote on 2014-10-24:

The problem is caused by cinder-scheduler becoming disconnected from the RabbitMQ cluster without either end noticing. As a result, cinder-scheduler can no longer receive updates from the API.

The disconnection can happen, for example, after a RabbitMQ node is reconfigured, after the VIP moves to a different node, or even _during_ the TripleO overcloud deployment because of how the RabbitMQ cluster is configured.

This was observed using Kombu 3.0.33 as well as 2.5.

Using aggressive (low) kernel TCP keepalive probe intervals seems to improve reliability, but a more appropriate fix seems to be heartbeat support in oslo.messaging.
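To illustrate the two mitigations mentioned above, here is a minimal sketch using only the Python standard library. It is not taken from cinder, Kombu or oslo.messaging. The kernel keepalive timers are system-wide on Linux (the `net.ipv4.tcp_keepalive_time`, `tcp_keepalive_intvl` and `tcp_keepalive_probes` sysctls), but they can also be set per socket, which is what `enable_aggressive_keepalive` does. The idle/interval/count values are illustrative assumptions, not the ones used when reproducing this bug. `HeartbeatMonitor` is a hypothetical class showing the heartbeat idea: if nothing at all arrives from the peer for two heartbeat intervals (the usual AMQP convention), the connection is treated as dead so the client can reconnect instead of waiting silently forever.

```python
import socket
import time


def enable_aggressive_keepalive(sock, idle=5, interval=2, count=3):
    """Enable TCP keepalive on one socket with short (aggressive) timers.

    Assumed example values: start probing after `idle` seconds of silence,
    probe every `interval` seconds, give up after `count` unanswered probes.
    TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT are platform-specific (Linux has
    all three), so each one is only set when the platform exposes it.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


class HeartbeatMonitor:
    """Hypothetical application-level liveness check (not a real library API).

    The peer is considered dead once no frame has been received for two
    heartbeat intervals.
    """

    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval
        self.clock = clock  # injectable clock, so the logic can be tested
        self.last_seen = clock()

    def frame_received(self):
        # Any inbound frame (data or heartbeat) proves the peer is alive.
        self.last_seen = self.clock()

    def is_dead(self):
        # Silence for >= 2 intervals: stop trusting the connection.
        return self.clock() - self.last_seen >= 2 * self.interval
```

The keepalive approach lets the kernel detect a dead peer without any help from the application, but only as fast as the probe timers allow. The heartbeat approach detects the problem at the messaging layer, where the client can also react to it by reconnecting.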