Comment 7 for bug 1289200

Andrew Woodward (xarses) wrote:

@bogdando via https://bugs.launchpad.net/fuel/+bug/1317488
Summary:
Fuel should provide TCP KA (keepalives) for RabbitMQ sessions in HA mode.
These TCP KA should be visible at the app layer as well as at the network stack layer.

related Oslo.messaging issue: https://bugs.launchpad.net/oslo.messaging/+bug/856764
related fuel-dev ML: https://lists.launchpad.net/fuel-dev/msg01024.html

Issues we have in Fuel:
1) In 5.0 we upgraded RabbitMQ to 3.x and moved its connection management out of the HAproxy scope for most of the OpenStack services (those that have synced rabbit_hosts support from oslo.messaging); this was also backported to 4.1.1.
Hence, we still have to provide TCP KA for RabbitMQ sessions in order to make the Fuel HA architecture more reliable.

2) In any case, HAproxy provides TCP KA only at the network layer; see the docs:
"It is important to understand that keep-alive packets are neither emitted nor
  received at the application level. It is only the network stacks which sees
  them. For this reason, even if one side of the proxy already uses keep-alives
  to maintain its connection alive, those keep-alive packets will not be
  forwarded to the other side of the proxy."

3) We have it configured the wrong way; see the HAproxy docs (a corrected config sketch follows the quote below):
"Using option "tcpka" enables the emission of TCP keep-alive probes on both
  the client and server sides of a connection. Note that this is meaningful
  only in "defaults" or "listen" sections. If this option is used in a
  frontend, only the client side will get keep-alives, and if this option is
  used in a backend, only the server side will get keep-alives. For this
  reason, it is strongly recommended to explicitly use "option clitcpka" and
  "option srvtcpka" when the configuration is split between frontends and
  backends."

Suggested solution:
Apply all patches from #856764 for Nova in the MOS packages and test the RabbitMQ connections thoroughly. If it looks OK, sync the patches into the other MOS packages.
This issue should probably be fixed in 5.1, but the backport should be considered Critical for the 4.1.1 release (due to the increasing number of related Zendesk tickets) and the issue itself High for 5.1. I hope a 5.0 backport will not be needed, since upgrading from 5.0 to 5.1 should be an option.
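
For reference when testing: a minimal Python sketch (my illustration, not the actual oslo.messaging patch) of what enabling TCP KA on a client socket looks like at the network stack level. The connect_with_keepalive helper and the timer values are made up for this example, and TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT are Linux-specific:

import socket

def connect_with_keepalive(host, port=5672):
    # Open a TCP connection and ask the kernel to probe it while idle,
    # so a dead peer (e.g. a failed RabbitMQ node) gets detected even
    # when no AMQP traffic is flowing.
    sock = socket.create_connection((host, port))
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux-specific tuning (illustrative values): first probe after 60s
    # of idleness, then every 10s, drop after 5 unanswered probes.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)
    return sock

Connections opened this way show a keepalive timer in "ss -o" output, which is a quick way to check that the KA settings really took effect once the patches are applied.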