Use leastconn and socket-level TCP keep-alives for Heat API
According to the HAProxy docs, when the protocol involves very long
sessions with long idle periods (e.g. querying the Heat API for large
resources), there is a risk that one of the intermediate components
decides to expire a session which has remained idle for too long.
In some NFV cases with hundreds of VM/port resources, multiple API
requests are sent in parallel towards the Heat API service to
retrieve the OS::Nova::Server resources from a big Heat stack. This
makes the Heat API backends unavailable and causes requests to fail.
Eventually HAProxy considers all of the backends down, leaving the
system in a cascading failure scenario:
xx:12:09 overcloud-ctrl-0 haproxy[13]: Server
heat_api/overcloud-ctrl-1.internalapi is DOWN, reason: Layer7 timeout,
check duration: 10001ms. 2 active and 0 backup servers left. 0
sessions active, 0 requeued, 0 remaining in queue.
xx:12:09 overcloud-ctrl-0 haproxy[13]: Server
heat_api/overcloud-ctrl-2.internalapi is DOWN, reason: Layer7 timeout,
check duration: 10001ms. 1 active and 0 backup servers left. 0
sessions active, 0 requeued, 0 remaining in queue.
xx:12:09 overcloud-ctrl-0 haproxy[13]: Server
heat_api/overcloud-ctrl-0.internalapi is DOWN, reason: Layer7 timeout,
check duration: 10001ms. 0 active and 0 backup servers left. 0
sessions active, 0 requeued, 0 remaining in queue.
xx:12:09 overcloud-ctrl-0 haproxy[13]: proxy heat_api has no server
available!
xx:13:55 overcloud-ctrl-0 haproxy[13]: Server
heat_api/overcloud-ctrl-1.internalapi is UP, reason: Layer7 check
passed, code: 200, info: "OK", check duration: 1ms. 1 active and 0
backup servers online. 0 sessions requeued, 0 total in queue.
xx:13:55 overcloud-ctrl-0 haproxy[13]: Server
heat_api/overcloud-ctrl-2.internalapi is UP, reason: Layer7 check
passed, code: 200, info: "OK", check duration: 2ms. 2 active and 0
backup servers online. 0 sessions requeued, 0 total in queue.
xx:13:56 overcloud-ctrl-0 haproxy[13]: Server
heat_api/overcloud-ctrl-0.internalapi is UP, reason: Layer7 check
passed, code: 200, info: "OK", check duration: 1ms. 3 active and 0
backup servers online. 0 sessions requeued, 0 total in queue.
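For context, the "Layer7 timeout" entries above mean the HTTP health
check did not complete within its check timeout, so a backend that is
too busy answering the parallel resource queries gets marked DOWN even
though the process is still alive. A rough illustration of the relevant
check settings in an HAProxy listen section (the section name, address
and values are illustrative, not the exact TripleO rendering):

  listen heat_api
    bind 192.168.24.10:8004
    option httpchk      # Layer7 (HTTP) health check
    timeout check 10s   # a check unanswered within 10s shows up as "Layer7 timeout"
    server overcloud-ctrl-0.internalapi 192.168.24.11:8004 check inter 2000 rise 2 fall 2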
Mitigation steps proposed:
* Enabling socket-level TCP keep-alives makes the system regularly send
packets to the other end of the connection, keeping it active.
* tl;dr: round-robin LB does not fit scenarios with cascading
failures. Enabling leastconn LB makes a cascading failure less likely,
because high numbers of client connections are distributed by real
connection counts instead of the count-unaware round-robin rotation.
* The default balance algorithm for Heat API therefore becomes
'leastconn' instead of 'roundrobin' (this is controlled by a new
parameter); a sketch of the resulting HAProxy stanza is shown after
this list.
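A minimal sketch of the kind of HAProxy stanza the proposed mitigation
aims for, assuming a plain listen section (addresses, ports and server
names are illustrative; the real section is rendered by puppet-tripleo
and the balance algorithm is taken from the new parameter):

  listen heat_api
    bind 192.168.24.10:8004
    balance leastconn   # new connections go to the server with the fewest active ones
    option tcpka        # socket-level TCP keep-alives on both sides of the proxy
    option httpchk
    server overcloud-ctrl-0.internalapi 192.168.24.11:8004 check inter 2000 rise 2 fall 2
    server overcloud-ctrl-1.internalapi 192.168.24.12:8004 check inter 2000 rise 2 fall 2
    server overcloud-ctrl-2.internalapi 192.168.24.13:8004 check inter 2000 rise 2 fall 2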
Cascading failures (when backends go down one by one) result in an
unfair distribution of load; consider the following example:
- round-robin 100 connections amongst 3 backends (normal operation)
-> 34/33/33,
- ... another 100, but among only 2 (a 3-1 failure) -> 84/83/-,
- ... another 100 in a cascading failure -> 184/-/-,
- ... +100, after one more backend recovers -> 214/33/-,
- ... +100, after all have recovered -> 244/63/33
(repeat until everything goes down again because the 1st backend takes
an enormous number of connections)
Reviewed: https://review.opendev.org/735541
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=5717bd79525e2eb5b14c2d365dd59b63d7a63066
Submitter: Zuul
Branch: master
commit 5717bd79525e2eb5b14c2d365dd59b63d7a63066
Author: Bogdan Dobrelya <email address hidden>
Date: Mon Jun 15 11:05:06 2020 +0200
Use leastcon and socket-level TCP keep-alives for Heat API
Partial-Bug: #1882927
Change-Id: I5b85675c97a899b94c78ba9e19865a156e054fcb