tripleo

RabbitMQ should use net_ticktime

Bug #1717006 reported by John Eckersberg on 2017-09-13

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
tripleo	Fix Released	Medium	John Eckersberg	tripleo queens-2

Bug Description

Currently, we override RABBITMQ_SERVER_ERL_ARGS in tripleo in order to set a blanket TCP timeout of 15 seconds on all connecting and listening Erlang sockets.
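The following minimal Python sketch shows what a per-socket TCP timeout looks like at the OS level. It assumes that the Erlang override works by setting Linux's TCP_USER_TIMEOUT socket option (level IPPROTO_TCP = 6, option number 18) to 15000 ms on each socket. The helper `make_timeout_socket` is hypothetical and is not part of tripleo or RabbitMQ. The sketch is Linux-only.

```python
import socket

# Linux option number for TCP_USER_TIMEOUT. Assumption: fall back to 18
# if this Python build does not expose the constant.
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)


def make_timeout_socket(timeout_ms: int = 15000) -> socket.socket:
    """Create a TCP socket with a blanket user timeout (hypothetical helper).

    With TCP_USER_TIMEOUT set, the kernel forcibly closes the connection
    if transmitted data stays unacknowledged for longer than timeout_ms
    milliseconds. This mirrors the idea of a 15-second timeout applied to
    every connecting and listening socket.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms)
    return sock


if __name__ == "__main__":
    s = make_timeout_socket()
    # Read the option back to confirm that the kernel accepted it.
    print(s.getsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT))
    s.close()
```

Because this timeout applies to every socket, it also covers client connections. AMQP heartbeats now handle dead-client detection for those connections.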

This is a leftover from long ago, when oslo.messaging did not have proper heartbeat support. Without AMQP heartbeats, there was no quick way to detect that a client had lost its connection, so the TCP timeout was implemented.

Heartbeat support has been available in oslo.messaging for many releases now, so the TCP timeout is no longer required to detect dead clients and can be removed.

However, as a somewhat-intended side effect, setting the TCP timeouts also affects dead peer detection on the connections between RabbitMQ nodes in a clustered environment. Normally these timeouts are configured via the Erlang net_ticktime mechanism; see https://www.rabbitmq.com/nettick.html for more information.
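As a worked illustration of the net_ticktime mechanism, the following Python sketch computes the detection window described in the Erlang kernel documentation. Each connected node is ticked every net_ticktime/4 seconds, so a peer that stops responding is declared down after somewhere between 0.75 and 1.25 times net_ticktime. The function name `detection_window` is hypothetical. The value 15 in the loop is only an illustrative input and is not necessarily the value tripleo configures.

```python
from typing import Tuple


def detection_window(net_ticktime: float = 60.0) -> Tuple[float, float, float]:
    """Return (tick_interval, min_detect, max_detect) in seconds.

    Hypothetical helper based on the Erlang kernel documentation: ticks
    are sent every net_ticktime/4 seconds, and an unresponsive peer is
    detected as down after between T - T/4 and T + T/4 seconds.
    Erlang's default net_ticktime is 60 seconds.
    """
    tick = net_ticktime / 4
    return tick, net_ticktime - tick, net_ticktime + tick


if __name__ == "__main__":
    # 60 is the Erlang default; 15 is only an illustrative value.
    for t in (60, 15):
        tick, lo, hi = detection_window(t)
        print(f"net_ticktime={t}s: tick every {tick}s, peer down after {lo}-{hi}s")
```

With the default of 60 seconds, a silent peer is declared down 45 to 75 seconds after it stops responding. Lowering net_ticktime shrinks this window proportionally, at the cost of more frequent tick traffic between nodes.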

Using net_ticktime is preferable to TCP timeouts. It is the standard practice, it is simpler, and in testing it also seems to detect failures faster (see https://bugzilla.redhat.com/show_bug.cgi?id=1485989).

John Eckersberg (jeckersb) on 2017-09-13

Changed in tripleo:
assignee:	nobody → John Eckersberg (jeckersb)
status:	New → In Progress

OpenStack Infra (hudson-openstack) wrote on 2017-09-13: Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/503788

Emilien Macchi (emilienm) on 2017-09-14

Changed in tripleo:
milestone:	none → queens-1
importance:	Undecided → Medium

Emilien Macchi (emilienm) on 2017-10-23

Changed in tripleo:
milestone:	queens-1 → queens-2

OpenStack Infra (hudson-openstack) wrote on 2017-11-12: Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/503788
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=962ce364f88eacf20507ba646c94a96d5c782001
Submitter: Zuul
Branch: master

commit 962ce364f88eacf20507ba646c94a96d5c782001
Author: John Eckersberg <email address hidden>
Date: Tue Sep 12 17:19:56 2017 -0400

RabbitMQ should use net_ticktime

    We no longer need to force low-level TCP timeouts for dead client
    detection, but should continue tuning the timeout for dead peer
    detection between cluster nodes. Using the erlang net_ticktime option
    is preferable here.

Closes-Bug: 1717006
Change-Id: Ibd29c03bd69818d79396c379a2d638c018a04b82

Changed in tripleo:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2017-11-13: Fix proposed to tripleo-heat-templates (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/519459

OpenStack Infra (hudson-openstack) wrote on 2017-11-14: Change abandoned on tripleo-heat-templates (stable/pike)

Change abandoned by John Eckersberg (<email address hidden>) on branch: stable/pike
Review: https://review.openstack.org/519459
Reason: This adds a new option, so shouldn't be backported.

OpenStack Infra (hudson-openstack) wrote on 2017-12-04: Fix included in openstack/tripleo-heat-templates 8.0.0.0b2

This issue was fixed in the openstack/tripleo-heat-templates 8.0.0.0b2 development milestone.