RabbitMQ should use net_ticktime

Bug #1717006 reported by John Eckersberg
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Medium
Assigned to: John Eckersberg

Bug Description

Currently, we override RABBITMQ_SERVER_ERL_ARGS in tripleo in order to set a blanket TCP timeout of 15 seconds on all connecting and listening Erlang sockets.

This is leftover from long ago when oslo.messaging did not have proper heartbeat support. Without AMQP heartbeats, it was not possible to detect quickly when a client lost connection, so the TCP timeout was implemented.

Heartbeat support has been available in oslo.messaging for many releases now, so this is no longer required to detect dead clients and can be removed.

However, as a somewhat-intended side effect, setting the TCP timeouts also affects dead peer detection on the connections between RabbitMQ nodes in a clustered environment. Normally these timeouts are configured via the Erlang net_ticktime mechanism; see https://www.rabbitmq.com/nettick.html for more info.

Using net_ticktime is preferable to TCP timeouts. It is the standard practice, it is simpler, and in testing it also seems to detect failures faster (see https://bugzilla.redhat.com/show_bug.cgi?id=1485989).
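
For illustration, here is a rough Python sketch of what tuning net_ticktime means in practice. It relies on the documented Erlang behaviour that an unresponsive peer is declared down somewhere between 0.75x and 1.25x the configured net_ticktime. The helper names are hypothetical, and the value 15 is only an assumption chosen to mirror the old 15-second TCP timeout; it is not necessarily the value used by the fix.

```python
from typing import Tuple


def detection_window(net_ticktime: float) -> Tuple[float, float]:
    """Return (min, max) seconds before an unresponsive peer is declared down.

    Erlang ticks each connected node every net_ticktime / 4 seconds, so
    detection lands between net_ticktime - 25% and net_ticktime + 25%.
    """
    quarter = net_ticktime / 4
    return (net_ticktime - quarter, net_ticktime + quarter)


def kernel_config(net_ticktime: int) -> str:
    """Render an Erlang config term that sets the kernel net_ticktime.

    Hypothetical helper: it only formats the term as text.
    """
    return f"[{{kernel, [{{net_ticktime, {net_ticktime}}}]}}]."


if __name__ == "__main__":
    # 15 is an assumed value, picked to match the old TCP timeout.
    print(detection_window(15))
    print(kernel_config(15))
    # Erlang's default net_ticktime is 60 seconds.
    print(detection_window(60))
```

With the Erlang default of 60 seconds, a dead cluster peer may go unnoticed for over a minute, which is why the timeout still needs tuning once the TCP timeouts are removed.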

Changed in tripleo:
assignee: nobody → John Eckersberg (jeckersb)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/503788

Changed in tripleo:
milestone: none → queens-1
importance: Undecided → Medium
Changed in tripleo:
milestone: queens-1 → queens-2
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/503788
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=962ce364f88eacf20507ba646c94a96d5c782001
Submitter: Zuul
Branch: master

commit 962ce364f88eacf20507ba646c94a96d5c782001
Author: John Eckersberg <email address hidden>
Date: Tue Sep 12 17:19:56 2017 -0400

    RabbitMQ should use net_ticktime

    We no longer need to force low-level TCP timeouts for dead client
    detection, but should continue tuning the timeout for dead peer
    detection between cluster nodes. Using the erlang net_ticktime option
    is preferrable here.

    Closes-Bug: 1717006
    Change-Id: Ibd29c03bd69818d79396c379a2d638c018a04b82

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/519459

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (stable/pike)

Change abandoned by John Eckersberg (<email address hidden>) on branch: stable/pike
Review: https://review.openstack.org/519459
Reason: This adds a new option, so shouldn't be backported.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 8.0.0.0b2

This issue was fixed in the openstack/tripleo-heat-templates 8.0.0.0b2 development milestone.
