HAProxy reports false down events for synchronous backends (blocking APIs)

Bug #1946326 reported by Bogdan Dobrelya
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
tripleo
Triaged
Medium
Unassigned

Bug Description

Source: https://bugs.launchpad.net/tripleo/+bug/1895248/comments/10

for such cases, we might want to tweak haproxy to avoid false down events, causing unnecessary HA failovers. That could by L5 send/expect scripts. OR instead try to make such backends smarter to reply its aliveness over L7 checks, even while in a blocking call..

If we leave that as is, for Nova example, concurrent POST /v2.1/servers/{server}/os-interface synchronous calls to n-api might eventually mark all of its backends down, what would look like a full cloud outage, while it is not. The number of such concurrent POST calls should exceed the number of nova API workers (usually is CPU count) multiplied by the number of server backends (usually 3), and also be long enough (longer than tcp-check timeout) to accomplish. While in the source bug the problem is in the unexpectedly long vif-plugging blocking calls, this issue targets "valid" long blocking calls in general.

Changed in tripleo:
status: New → Triaged
importance: Undecided → Medium
description: updated
description: updated
description: updated
description: updated
description: updated
description: updated
description: updated
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.