Comment 6 for bug 1895248

Bogdan Dobrelya (bogdando) wrote :

Picking through the logs from the CI job [0] discussed last Friday on #openstack-nova, here is what I have:
(tl;dr: I would recommend aligning/reducing the numbers of octavia/nova service workers and the tempest concurrency)

> the test in that scenario seems to start at 21:25:37 and end at 21:36:29, and yeah, haproxy seems to report things being up again at 21:36:17

Failed haproxy requests:
Sep 16 21:24:07 standalone haproxy[12]: 192.168.24.3:35368 [16/Sep/2021:21:23:07.014] nova_osapi nova_osapi/standalone.ctlplane.localdomain 0/0/0/60057/60057 504 398 - - ---- 55/3/2/3/0 0/0 "POST /v2.1/servers/f5ef03d8-0a95-48c1-9a78-107aba1a6ad8/os-interface HTTP/1.1"
Sep 16 21:25:07 standalone haproxy[12]: 192.168.24.3:35580 [16/Sep/2021:21:23:19.295] nova_osapi nova_osapi/standalone.ctlplane.localdomain 0/0/0/108549/108549 504 398 - - ---- 59/1/0/1/0 0/0 "POST /v2.1/servers/52249295-f031-4f53-b403-86577a6a6e01/os-interface HTTP/1.1"
Sep 16 21:36:17 standalone haproxy[12]: 192.168.24.3:45986 [16/Sep/2021:21:35:17.112] nova_osapi nova_osapi/standalone.ctlplane.localdomain 0/0/0/60065/60065 504 398 - - ---- 57/2/1/2/0 0/0 "POST /v2.1/servers/6c3aec72-e03f-4d28-82c9-94c31bf3b223/os-interface HTTP/1.1"

The numbers 55/3/2/3/0, 59/1/0/1/0 and 57/2/1/2/0 show a high total number of concurrent connections on the HAProxy process (55-59), and the counts look similarly high for most of the requests logged there. Perhaps too many, compared to the configured nova vs octavia worker counts: 4 for octavia [1], 1 for nova [2], and 2 for tempest concurrency [3] (see "Worker Balance" at the bottom)?
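For reference, in HAProxy's HTTP log format those grouped fields are the timers Tq/Tw/Tc/Tr/Tt (request/queue/connect/response/total times in milliseconds) and the connection counters actconn/feconn/beconn/srv_conn/retries, followed by srv_queue/backend_queue. A minimal Python sketch (illustrative only, not part of the job) that pulls them out of a line like the ones above:

import re

# Sample line, shortened from the haproxy.log excerpt above.
LINE = ('192.168.24.3:35368 [16/Sep/2021:21:23:07.014] nova_osapi '
        'nova_osapi/standalone.ctlplane.localdomain 0/0/0/60057/60057 504 398 '
        '- - ---- 55/3/2/3/0 0/0 '
        '"POST /v2.1/servers/f5ef03d8-0a95-48c1-9a78-107aba1a6ad8/os-interface HTTP/1.1"')

# Tq/Tw/Tc/Tr/Tt, status, bytes, cookies, termination flags,
# actconn/feconn/beconn/srv_conn/retries, srv_queue/backend_queue.
m = re.search(r'(?P<timers>[\d-]+/[\d-]+/[\d-]+/[\d-]+/\d+) (?P<status>\d{3}) \d+ \S+ \S+ \S+ '
              r'(?P<conns>\d+/\d+/\d+/\d+/\d+) (?P<queues>\d+/\d+)', LINE)
tq, tw, tc, tr, tt = m.group('timers').split('/')
actconn, feconn, beconn, srv_conn, retries = m.group('conns').split('/')
print(f"status={m.group('status')} total_time_ms={tt} actconn={actconn} srv_conn={srv_conn}")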

The numbers 0/0/0/60057/60057, 0/0/0/108549/108549 and 0/0/0/60065/60065 confirm that, with long server response times of 60-108 s: the request, queue and connect times are all 0, so the whole time is spent waiting on the backend. So the testing setup is incorrect and causes too much queuing of API requests, which sometimes time out with 504 (that looks expected to me).
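To illustrate why that queuing blows past the timeout, a crude back-of-the-envelope sketch with assumed numbers (the ~2 s per-request service time is hypothetical, and not all of the 55-59 connections necessarily queue behind nova_api):

# Back-of-the-envelope sketch with assumed numbers, not measured from the job:
# if a single nova_api worker serves requests one at a time, the N-th request
# in the backlog waits roughly N * per_request_s before it even starts.
per_request_s = 2.0      # assumed average service time of one os-interface POST
backlog = 55             # concurrent connections reported by HAProxy (55-59 above)
proxy_timeout_s = 60.0   # server-side timeout implied by the ~60,0xx ms totals

worst_wait_s = backlog * per_request_s
print(f"worst-case wait ~{worst_wait_s:.0f}s vs {proxy_timeout_s:.0f}s timeout "
      f"-> 504 expected: {worst_wait_s > proxy_timeout_s}")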

I would recommend aligning/reducing the numbers of service workers and tempest workers. There are WIP patches that attempted to address related things in upstream CI [4], and I've also added a parameter for Octavia workers in t-h-t [5].

[0] https://8d8d2855981635b8fdb4-a26efb96c1d036a9f9dde78212997c1f.ssl.cf1.rackcdn.com/808215/14/check/tripleo-ci-centos-8-scenario010-standalone/af88c0d/logs/undercloud/var/log/containers/haproxy/haproxy.log
[1] https://8d8d2855981635b8fdb4-a26efb96c1d036a9f9dde78212997c1f.ssl.cf1.rackcdn.com/808215/14/check/tripleo-ci-centos-8-scenario010-standalone/af88c0d/logs/undercloud/var/lib/config-data/puppet-generated/octavia/etc/httpd/conf.d/10-octavia_wsgi.conf
[2] https://8d8d2855981635b8fdb4-a26efb96c1d036a9f9dde78212997c1f.ssl.cf1.rackcdn.com/808215/14/check/tripleo-ci-centos-8-scenario010-standalone/af88c0d/logs/undercloud/var/lib/config-data/puppet-generated/nova/etc/httpd/conf.d/10-nova_api_wsgi.conf
[3] https://8d8d2855981635b8fdb4-a26efb96c1d036a9f9dde78212997c1f.ssl.cf1.rackcdn.com/808215/14/check/tripleo-ci-centos-8-scenario010-standalone/af88c0d/logs/undercloud/var/log/tempest/tempest_run.log
[4] https://review.opendev.org/q/topic:%22workers%22
[5] https://review.opendev.org/c/openstack/tripleo-heat-templates/+/809988