Picking through the logs from the CI job [0] discussed last Friday on #openstack-nova, here is what I have:
(tl;dr: I would recommend aligning/reducing the numbers of octavia/nova service workers and the tempest concurrency)
> the test in that scenario seems to start at 21:25:37 and end at 21:36:29, and yeah, haproxy seems to report things being up again at 21:36:17
Failed haproxy requests:

Sep 16 21:24:07 standalone haproxy[12]: 192.168.24.3:35368 [16/Sep/2021:21:23:07.014] nova_osapi nova_osapi/standalone.ctlplane.localdomain 0/0/0/60057/60057 504 398 - - ---- 55/3/2/3/0 0/0 "POST /v2.1/servers/f5ef03d8-0a95-48c1-9a78-107aba1a6ad8/os-interface HTTP/1.1"
Sep 16 21:25:07 standalone haproxy[12]: 192.168.24.3:35580 [16/Sep/2021:21:23:19.295] nova_osapi nova_osapi/standalone.ctlplane.localdomain 0/0/0/108549/108549 504 398 - - ---- 59/1/0/1/0 0/0 "POST /v2.1/servers/52249295-f031-4f53-b403-86577a6a6e01/os-interface HTTP/1.1"
Sep 16 21:36:17 standalone haproxy[12]: 192.168.24.3:45986 [16/Sep/2021:21:35:17.112] nova_osapi nova_osapi/standalone.ctlplane.localdomain 0/0/0/60065/60065 504 398 - - ---- 57/2/1/2/0 0/0 "POST /v2.1/servers/6c3aec72-e03f-4d28-82c9-94c31bf3b223/os-interface HTTP/1.1"
The numbers 55/3/2/3/0, 59/1/0/1/0 and 57/2/1/2/0 show a high total number of concurrent connections on the HAProxy process (55-59), and they look similarly high for most of the requests logged there. Perhaps that is too much compared to the configured nova and octavia worker counts: 4 workers for octavia [1], 1 for nova [2], and a tempest concurrency of 2 [3] (see "Worker Balance" at the bottom)?
The numbers 0/0/0/60057/60057, 0/0/0/108549/108549 and 0/0/0/60065/60065 confirm that, showing server response times of 60-108s. So the testing setup is misconfigured and causes too much queuing of API requests, which sometimes time out with a 504 (that looks expected to me).
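For anyone who wants to pull these figures out of the full haproxy.log [0], here is a minimal parsing sketch. It assumes the default HAProxy httplog field order visible in the lines quoted above (Tq/Tw/Tc/Tr/Tt timings, then the actconn/feconn/beconn/srv_conn/retries connection counters) and has only been checked against those three example lines:

    import re

    # Field order assumed from HAProxy's default HTTP log format:
    # client [accept_date] frontend backend/server Tq/Tw/Tc/Tr/Tt status bytes
    # cookies termination_state actconn/feconn/beconn/srv_conn/retries queues "request"
    LOG_RE = re.compile(
        r'(?P<client>\S+) \[(?P<accept>[^\]]+)\] (?P<frontend>\S+) (?P<backend>\S+) '
        r'(?P<Tq>-?\d+)/(?P<Tw>-?\d+)/(?P<Tc>-?\d+)/(?P<Tr>-?\d+)/(?P<Tt>\+?\d+) '
        r'(?P<status>[\d-]+) (?P<bytes>\d+) \S+ \S+ \S+ '
        r'(?P<actconn>\d+)/(?P<feconn>\d+)/(?P<beconn>\d+)/(?P<srv_conn>\d+)/(?P<retries>\+?\d+) '
        r'(?P<srv_queue>\d+)/(?P<backend_queue>\d+) "(?P<request>[^"]*)"')

    with open('haproxy.log') as f:          # the log file from [0]
        for raw in f:
            m = LOG_RE.search(raw)          # search() skips the syslog prefix
            if not m or m.group('status') != '504':
                continue
            # Tr is the server response time in ms; actconn is the number of
            # concurrent connections on the haproxy process at log time.
            print(int(m.group('Tr')) / 1000, 's,',
                  m.group('actconn'), 'concurrent connections,',
                  m.group('request'))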
I would recommend aligning/reducing the numbers of service workers and tempest workers. There are WIP patches that attempt to address related things in upstream CI [4], and I've also added a parameter for Octavia workers in t-h-t [5].
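To illustrate what I mean by "align", here is a rough sketch of the check. The counts are the ones taken from the referenced config files and tempest log ([1], [2], [3]), and the rule of thumb it encodes (tempest concurrency should not exceed the smallest API worker pool) is just my reading of this failure, not a documented formula:

    # Assumed counts, from the config files and tempest run log referenced below.
    api_workers = {'octavia-api': 4, 'nova-api': 1}
    tempest_concurrency = 2

    # Find the smallest worker pool and compare it against the test concurrency.
    bottleneck = min(api_workers, key=api_workers.get)
    if tempest_concurrency > api_workers[bottleneck]:
        print('tempest concurrency %d exceeds the %s worker count (%d): '
              'API requests will queue and can time out with a 504'
              % (tempest_concurrency, bottleneck, api_workers[bottleneck]))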
[0] https://8d8d2855981635b8fdb4-a26efb96c1d036a9f9dde78212997c1f.ssl.cf1.rackcdn.com/808215/14/check/tripleo-ci-centos-8-scenario010-standalone/af88c0d/logs/undercloud/var/log/containers/haproxy/haproxy.log
[1] https://8d8d2855981635b8fdb4-a26efb96c1d036a9f9dde78212997c1f.ssl.cf1.rackcdn.com/808215/14/check/tripleo-ci-centos-8-scenario010-standalone/af88c0d/logs/undercloud/var/lib/config-data/puppet-generated/octavia/etc/httpd/conf.d/10-octavia_wsgi.conf
[2] https://8d8d2855981635b8fdb4-a26efb96c1d036a9f9dde78212997c1f.ssl.cf1.rackcdn.com/808215/14/check/tripleo-ci-centos-8-scenario010-standalone/af88c0d/logs/undercloud/var/lib/config-data/puppet-generated/nova/etc/httpd/conf.d/10-nova_api_wsgi.conf
[3] https://8d8d2855981635b8fdb4-a26efb96c1d036a9f9dde78212997c1f.ssl.cf1.rackcdn.com/808215/14/check/tripleo-ci-centos-8-scenario010-standalone/af88c0d/logs/undercloud/var/log/tempest/tempest_run.log
[4] https://review.opendev.org/q/topic:%22workers%22
[5] https://review.opendev.org/c/openstack/tripleo-heat-templates/+/809988