Comment 0 for bug 1462438

Mark Vanderwiel (vanderwl) wrote :

When using RabbitMQ in a multi-controller (HA) environment, if a controller node running the rabbitmq service goes down, it can take a long time (15 minutes) for compute nodes to recover automatically and begin using the alternate rabbitmq service. There are existing heartbeat configuration options in the oslo.messaging RabbitMQ driver that handle these situations better.

https://github.com/openstack/oslo.messaging/blob/d685e6f80a5dfc5fba638beacde762c8ccf9a89d/oslo_messaging/_drivers/impl_rabbit.py#L138

cfg.IntOpt('heartbeat_timeout_threshold',
           default=0,
           help="Number of seconds after which the Rabbit broker is "
                "considered down if heartbeat's keep-alive fails "
                "(0 disable the heartbeat). EXPERIMENTAL"),

cfg.IntOpt('heartbeat_rate',
           default=2,
           help='How often times during the heartbeat_timeout_threshold '
                'we check the heartbeat.'),

These options will be added to the Common messaging attributes, and then to all the cookbooks that use RabbitMQ.
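
As a rough sketch of what the cookbooks would end up rendering into a service config file, the snippet below builds an INI fragment for the two options. The group name oslo_messaging_rabbit, the value 60, and both helper functions are assumptions for illustration, not something taken from the cookbooks. The check interval is only read off the help text ("how often during the threshold we check"); the driver's actual timing may differ.

```python
import configparser
import io

# Illustrative values only; this bug does not propose specific numbers.
HEARTBEAT_TIMEOUT_THRESHOLD = 60  # seconds; 0 would disable the heartbeat
HEARTBEAT_RATE = 2                # same as the upstream default quoted above


def render_rabbit_heartbeat(threshold, rate):
    """Return an INI fragment carrying the two heartbeat options.

    Hypothetical helper: the section name below is an assumption about
    where the rabbit driver options live in the service config file.
    """
    if threshold < 0 or rate < 1:
        raise ValueError("threshold must be >= 0 and rate must be >= 1")
    parser = configparser.ConfigParser()
    parser["oslo_messaging_rabbit"] = {
        "heartbeat_timeout_threshold": str(threshold),
        "heartbeat_rate": str(rate),
    }
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


def approx_check_interval(threshold, rate):
    """Rough seconds between heartbeat checks, based on the help text only."""
    if threshold == 0:
        return None  # heartbeat disabled
    return threshold / rate


if __name__ == "__main__":
    print(render_rabbit_heartbeat(HEARTBEAT_TIMEOUT_THRESHOLD, HEARTBEAT_RATE))
    print(approx_check_interval(HEARTBEAT_TIMEOUT_THRESHOLD, HEARTBEAT_RATE))
```

Leaving the threshold at 0 keeps today's behavior (heartbeat disabled), so the new attributes could default to the upstream values and be overridden per deployment.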