Comment 10 for bug 1436769

Sean Dague (sdague) wrote :

From irc logs this morning:

<sdague> I don't know, what's your opinion of the risk of finding other issues here?
<sileht> sdague, heartbeat or not, the real complaint is that if something bad occurs on the communication between oslo.msg and rabbit, we never want to wait for the system tcp timeout to discover it
<sileht> sdague, many people think that implementing heartbeat was fixing the issue
<sileht> sdague, but that's not true, to really fix that we have to be able to set a timeout on the socket
<sileht> sdague, and THEN use heartbeat to use very long timeout
<sdague> ok, that's fair
<sdague> so what's your opinion on the safest option for the kilo release to keep regressions to a minimum?
<sileht> sdague, but kombu doesn't allow us to do that on the socket used for write
<sdague> should we bump the requirement?
<sdague> or turn off heartbeats by default?
<sileht> perhaps turn off heartbeats because the read timeout is still forced to a maximum of 1sec to ensure we detect connection lost on read
<sdague> sileht: ok, so that would be a 1.8.2 with the default behavior changed?
<sileht> on write, heartbeat or not, we can still have an issue, if the packet that fails to transfer is the heartbeat one
<sileht> sdague, but on the other side ops really really want this feature
<sileht> sdague, even if it's not perfect yet
<sdague> well, they can turn it back on
<sileht> true
<sdague> call it experimental because there are known issues, which is why it defaults off
<sdague> but explain how it can be enabled if people want to try it
<sileht> sdague, that sounds reasonable
<sileht> sdague, I will propose a patch and see the other core devs' opinions

So the proposed fix is to turn off heartbeats by default, which will require a 1.8.2 and 1.9.1 release.
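
To make "off by default, but can be turned back on" concrete, here is a minimal sketch of how a default-off heartbeat setting might be read. It uses only the Python standard library, not oslo.config. The group name `oslo_messaging_rabbit` and option name `heartbeat_timeout_threshold` are assumptions (this comment does not name the option), as is the convention that 0 means disabled.

```python
import configparser

# Assumed names: the comment above does not spell out the group or option.
GROUP = "oslo_messaging_rabbit"
OPTION = "heartbeat_timeout_threshold"
# Assumed convention: 0 seconds means heartbeats are disabled.
DEFAULT_THRESHOLD = 0


def heartbeat_threshold(conf_text):
    """Return the configured heartbeat threshold in seconds (0 = off)."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # The fallback covers both a missing section and a missing option.
    return parser.getint(GROUP, OPTION, fallback=DEFAULT_THRESHOLD)


def heartbeat_enabled(conf_text):
    """Heartbeats are only on when an operator sets a positive threshold."""
    return heartbeat_threshold(conf_text) > 0
```

With this shape, an untouched config leaves heartbeats off, and an operator who wants to try the experimental feature opts in by setting a positive threshold.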

In addition, it seems prudent for oslo.messaging versions that support heartbeats to require amqp >= 1.4.0, which addresses a critical bug here.
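
As a rough illustration of the amqp >= 1.4.0 floor, the sketch below compares dotted version strings. The helper names `parse_version` and `meets_minimum` are hypothetical, and the parsing is deliberately simple (numeric components only), so it is not a substitute for a real requirements bump.

```python
def parse_version(text, width=3):
    """Turn '1.4.5' into (1, 4, 5), padding short versions with zeros.

    Parsing stops at the first component that does not start with a digit,
    and any non-digit suffix on a component (e.g. '0rc1') is dropped.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < width:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum="1.4.0"):
    """True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)
```

On Python 3.8 or later, the installed version string could be obtained with `importlib.metadata.version("amqp")` and passed to `meets_minimum`.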

All of that really needs to be in place for the kilo release.
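
For context on sileht's point in the log about socket timeouts: the idea is that a stalled connection should be noticed after a short, explicit timeout rather than after the system TCP timeout. The sketch below shows this with plain Python sockets only. It does not use kombu or oslo.messaging, and the function name `recv_or_detect_stall` is hypothetical.

```python
import socket


def recv_or_detect_stall(sock, timeout_seconds):
    """Wait up to timeout_seconds for data; return None if nothing arrives.

    Returning None is how this sketch reports "the peer went quiet",
    instead of blocking until the operating system gives up.
    """
    # settimeout applies to later blocking operations on this socket.
    sock.settimeout(timeout_seconds)
    try:
        return sock.recv(4096)
    except socket.timeout:
        return None
```

Python's `settimeout` covers both reads and writes on the socket. The limitation described in the log is that kombu does not expose this for the socket used for writes.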