periodic featureset 35 wallaby times out running tempest (2 hours)

Bug #1939023 reported by Marios Andreou
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

At [1][2][3][4] the periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby job times out during the tempest execution. As can be seen in the trace (from [1]), tempest starts executing roughly two hours before the timeout occurs:

2021-08-05 00:28:05.041522 | primary | TASK [os_tempest : Execute tempest tests] **************************************
2021-08-05 00:28:05.041528 | primary | Thursday 05 August 2021 00:28:05 +0000 (0:00:00.048) 1:50:45.136 *******
2021-08-05 02:22:27.036537 | RUN END RESULT_TIMED_OUT: [untrusted : opendev.org/openstack/tripleo-ci/playbooks/tripleo-ci/run-v3.yaml@master]

I can't quickly see anything useful in the tempest run logs [5], and tempestconf looks to have completed OK [6].

[1] https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby/193b290/job-output.txt
[2] https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby/ba0fa96/job-output.txt
[3] https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby/7198d64/job-output.txt
[4] https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby/3d8c715/job-output.txt
[5] https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby/193b290/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz
[6] https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby/193b290/logs/undercloud/var/log/tempest/tempestconf.log.txt.gz

Revision history for this message
Marios Andreou (marios-b) wrote (last edit ):

As can be seen in the attached screenshot from [1], successful runs of this job usually take closer to ~3 hours. The timeouts started on 3rd August.

We *are* running a lot of tempest tests here [2], but that list of tests has not been altered recently and used to complete well within the timeout.

Comparing with a green run at [3], the tempest tests usually take ~1 hour to run:

2021-08-03 00:18:37.735157 | primary | TASK [os_tempest : Execute tempest tests] **************************************
2021-08-03 00:18:37.735168 | primary | Tuesday 03 August 2021 00:18:37 +0000 (0:00:00.041) 1:39:33.632 ********
2021-08-03 01:16:39.798815 | primary | ok: [undercloud]

but as per this bug they are now timing out after 2 hours.

[1] https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby
[2] https://github.com/openstack/tripleo-quickstart/blob/444fcff6b17b77778382cd0be5a45f7b85a7b7ca/config/general_config/featureset035.yml#L175-L179
[3] https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby/d385cc6/job-output.txt

Revision history for this message
Marios Andreou (marios-b) wrote :

I can't see any major difference in the nodes between a good log [1] and a timed-out one [2], except that the timed-out one has less free memory (but the same total):

[1] MemTotal: 8150828 kB
    MemFree:  1240728 kB

[2] MemTotal: 8150828 kB
    MemFree:   293704 kB

Similarly, the cpuinfo logs look the same: good at [3], bad at [4].

In the errors log [5] I see an issue reaching rabbit on controller-1, with retries; I don't know if that is directly related:

2021-08-05 18:18:42.101 ERROR /var/log/containers/nova/nova-api.log: 17 ERROR oslo.messaging._drivers.impl_rabbit [-] [116db7bd-d0dd-4522-b666-8282a6acc71f] AMQP server on overcloud-controller-1.internalapi.localdomain:5672 is unreachable: Server unexpectedly closed connection. Trying again in 1 seconds.: OSError: Server unexpectedly closed connection

[1] https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby/d385cc6/logs/undercloud/var/log/extra/meminfo.txt.gz
[2] https://logserver.rdoproject.org/61/34861/1/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby/894cb7e/logs/undercloud/var/log/extra/meminfo.txt.gz
[3] https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby/d385cc6/logs/undercloud/var/log/extra/cpuinfo.txt.gz
[4] https://logserver.rdoproject.org/61/34861/1/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby/894cb7e/logs/undercloud/var/log/extra/cpuinfo.txt.gz
[5] https://logserver.rdoproject.org/61/34861/1/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby/894cb7e/logs/overcloud-controller-0/var/log/extra/errors.txt.gz

Revision history for this message
Marios Andreou (marios-b) wrote :

It also seems to be inconsistent :/

Among the TIMED_OUT results we also have a couple of successes from Saturday 7th:

* https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby
* 3 hrs 49 mins 3 secs 2021-08-07 22:11:54 SUCCESS
* 3 hrs 48 mins 3 secs 2021-08-07 16:36:46 SUCCESS

but they are taking close to 4 hours, i.e. right up against the 4-hour timeout (inherited from https://github.com/rdo-infra/review.rdoproject.org-config/blob/fe750ba09c4054d2cce845a0de3b38dba8bcd02b/zuul.d/tripleo-rdo-base.yaml#L25 ).

So why is tempest taking almost 2 hours to run? That seems excessive; it used to take closer to 1 hour.

Revision history for this message
Marios Andreou (marios-b) wrote :

Based on comment #1 and the attached screenshot, this started around 3rd August. I compared two 'good' runs: one from 2/3 August that took close to 3 hours [1] and a more recent one from yesterday, 9th August [2].

From [1]

 3 hrs 9 mins 1 sec 2021-08-02 22:22:27

Ran: 1416 tests in 3475.6904 sec.
 - Passed: 1295
 - Skipped: 121
 - Expected Fail: 0
 - Unexpected Success: 0
 - Failed: 0
Sum of execute time for each test: 8074.3680 sec.

 - Worker 0 (428 tests) => 0:57:48.876694
 - Worker 1 (334 tests) => 0:50:33.366087
 - Worker 2 (367 tests) => 0:39:59.810439
 - Worker 3 (287 tests) => 0:49:15.021685

From [2]
Ran: 1416 tests in 7587.3411 sec.
 - Passed: 1295
 - Skipped: 121
 - Expected Fail: 0
 - Unexpected Success: 0
 - Failed: 0
Sum of execute time for each test: 16297.8125 sec.
 - Worker 0 (362 tests) => 1:59:08.312166
 - Worker 1 (356 tests) => 1:08:44.041797
 - Worker 2 (426 tests) => 2:06:16.196664
 - Worker 3 (272 tests) => 1:21:51.781254

As can be seen in [2], the same tests take twice as long to complete. You can see more about the timings in the stackviz logs at [3] ('good', ~1 hour tempest run) and [4] ('bad', ~2 hours).

[1] https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby/d385cc6/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz

[2] https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby/fb267a1/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz

[3] https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby/d385cc6/logs/stackviz/#/

[4] https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby/fb267a1/logs/stackviz/#/

Revision history for this message
Sagi (Sergey) Shnaidman (sshnaidm) wrote :
Revision history for this message
Ronelle Landy (rlandy) wrote :

From https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby it looks like the timeouts started on August 3rd.

Also noticed a difference in the openvswitch versions from August 3rd:

network-scripts-openvswitch2.15.x86_64 2.15.0-30.el8s @centos-nfv-ovs
openvswitch-selinux-extra-policy.noarch 1.0-28.el8 @centos-nfv-ovs
openvswitch2.15.x86_64 2.15.0-30.el8s @centos-nfv-ovs

-------
network-scripts-openvswitch2.15.x86_64 2.15.0-32.el8s @centos-nfv-ovs
openvswitch-selinux-extra-policy.noarch 1.0-28.el8 @centos-nfv-ovs
openvswitch2.15.x86_64 2.15.0-32.el8s @centos-nfv-ovs

https://cbs.centos.org/koji/buildinfo?buildID=33594 matches that new build.

Maybe we should downgrade openvswitch and see if we do better?

Revision history for this message
chandan kumar (chkumar246) wrote :

Since openvswitch2.15-2.15.0-32.el8s.x86_64 is also used in the periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-master job, which was passing frequently earlier (https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-master),

the OVS update might not be the culprit.

Revision history for this message
Martin Kopec (mkopec) wrote :

Many tests just take longer, e.g.:

test_dhcp6_stateless_from_os
116.8 seconds -> 206.8 seconds
test_dualnet_dhcp6_stateless_from_os
134.2 seconds -> 245.8 seconds
test_dualnet_multi_prefix_dhcpv6_stateless
161.2 seconds -> 254.8 seconds

It seems like all requests, especially GET ones, are taking much longer; here is a comparison of requests within the test_dualnet_multi_prefix_dhcpv6_stateless test:

$ cut -d" " -f12- good_r
200 POST https://[2001:db8:fd00:1000::5]:13774/v2.1/os-keypairs 0.341s
201 POST https://[2001:db8:fd00:1000::5]:13696/v2.0/security-groups 0.467s
201 POST https://[2001:db8:fd00:1000::5]:13696/v2.0/security-group-rules 0.345s
201 POST https://[2001:db8:fd00:1000::5]:13696/v2.0/security-group-rules 0.208s
201 POST https://[2001:db8:fd00:1000::5]:13696/v2.0/security-group-rules 0.695s
201 POST https://[2001:db8:fd00:1000::5]:13696/v2.0/security-group-rules 0.290s
201 POST https://[2001:db8:fd00:1000::5]:13696/v2.0/security-group-rules 0.252s
201 POST https://[2001:db8:fd00:1000::5]:13696/v2.0/security-group-rules 0.316s
201 POST https://[2001:db8:fd00:1000::5]:13696/v2.0/networks 0.858s
201 POST https://[2001:db8:fd00:1000::5]:13696/v2.0/networks 1.654s
200 GET https://[2001:db8:fd00:1000::5]:13696/v2.0/subnets?project_id=b0989b4d43864d56ac1f4b312684d692&cidr=10.100.0.0%2F28 0.108s
200 GET https://[2001:db8:fd00:1000::5]:13696/v2.0/networks?router%3Aexternal=True 0.187s
200 GET https://[2001:db8:fd00:1000::5]:13696/v2.0/subnets?network_id=883143f0-32da-4ffa-9f01-08ca456b13e2&cidr=10.100.0.0%2F28 0.081s
201 POST https://[2001:db8:fd00:1000::5]:13696/v2.0/subnets 1.373s
201 POST https://[2001:db8:fd00:1000::5]:13696/v2.0/routers 3.186s
200 GET https://[2001:db8:fd00:1000::5]:13696/v2.0/subnets?project_id=b0989b4d43864d56ac1f4b312684d692&cidr=2001%3Adb8%3A%3A%2F64 0.118s
200 GET https://[2001:db8:fd00:1000::5]:13696/v2.0/networks?router%3Aexternal=True 0.245s
200 GET https://[2001:db8:fd00:1000::5]:13696/v2.0/subnets?network_id=883143f0-32da-4ffa-9f01-08ca456b13e2&cidr=2001%3Adb8%3A%3A%2F64 0.052s
201 POST https://[2001:db8:fd00:1000::5]:13696/v2.0/subnets 1.023s
200 GET https://[2001:db8:fd00:1000::5]:13696/v2.0/subnets?project_id=b0989b4d43864d56ac1f4b312684d692&cidr=2001%3Adb8%3A%3A%2F64 0.079s
200 GET https://[2001:db8:fd00:1000::5]:13696/v2.0/networks?router%3Aexternal=True 0.126s
200 GET https://[2001:db8:fd00:1000::5]:13696/v2.0/subnets?network_id=883143f0-32da-4ffa-9f01-08ca456b13e2&cidr=2001%3Adb8%3A%3A%2F64 0.062s
200 GET https://[2001:db8:fd00:1000::5]:13696/v2.0/subnets?project_id=b0989b4d43864d56ac1f4b312684d692&cidr=2001%3Adb8%3A0%3A1%3A%3A%2F64 0.033s
200 GET https://[2001:db8:fd00:1000::5]:13696/v2.0/networks?router%3Aexternal=True 0.170s
200 GET https://[2001:db8:fd00:1000::5]:13696/v2.0/subnets?network_id=883143f0-32da-4ffa-9f01-08ca456b13e2&cidr=2001%3Adb8%3A0%3A1%3A%3A%2F64 0.050s
201 POST https://[2001:db8:fd00:1000::5]:13696/v2.0/subnets 1.431s
201 POST https://[2001:db8:fd00:1000::5]:13000/v3/auth/tokens 0.520s
202 POST https://[2001:db8:fd00:1000::5]:13774/v2.1/servers 2.111s
200 GET https://[2001:db8:fd00:1000::5]:13774/v2.1/servers/732d09bb-f2d3-4be4-ad46-428a8d8d5250 0.203s
200 GET https://[2001:db8:fd00:1000::5]:1...

Revision history for this message
yatin (yatinkarel) wrote :

So it's not just wallaby; xena is also impacted. Since https://review.opendev.org/q/I9d0ddcc88098a5b891829192f1ce656842d0aa15, API operations are taking longer. Only the IPv6 fs035 job is impacted; the IPv4 fs001 job is working fine.

For example:-
PASSING JOB:-
$ grep -nr "GET /v2.1/os-floating-ips" pass.txt |cut -d" " -f22
0.273092
0.259492
0.238887

$ grep -nr "DELETE /v2.1/os-security-groups" pass.txt |cut -d" " -f22
0.368750
0.063184
0.254873
0.391544
0.331399
0.244026
0.425756
0.453150
0.236778
0.270068
0.586434
0.037038
0.045581
0.648366
0.236613
0.221590

FAILING JOB:-
$ grep -nr "GET /v2.1/os-floating-ips" fail.txt |cut -d" " -f22
3.625276
2.823670
4.931641
2.061295

$ grep -nr "DELETE /v2.1/os-security-groups" fail.txt |cut -d" " -f22
3.348766
2.595817
2.260266
2.017600
1.669031
3.165000
3.684208
2.686081
1.148234
2.810574
1.752639
2.078275
2.003472

Revision history for this message
Ronelle Landy (rlandy) wrote :

https://review.opendev.org/q/I9d0ddcc88098a5b891829192f1ce656842d0aa15 - with that reverted, fs035 jobs on wallaby are back to running at their regular times.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-quickstart (master)
Changed in tripleo:
status: Triaged → In Progress
Revision history for this message
Grzegorz Grasza (xek) wrote :

I tested a different way of disabling FQDNs in memcache server list configuration here:

https://review.rdoproject.org/r/c/testproject/+/35061

The first successful run is without any change and the second one switches to IPs without reverting the large patch.

The second run finished faster by 22 minutes.
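For reference, a minimal sketch of what the switch amounts to on a deployed controller (the paths and values below are illustrative assumptions based on the usual TripleO container config layout, not taken from these job logs): the memcached server list rendered into the service configs changes from internalapi FQDNs to IPv6 literals, which removes the per-connection name lookup.

$ sudo grep -r "memcache_servers" /var/lib/config-data/puppet-generated/*/etc/*/*.conf
# FQDN form (every connection goes through an NSS/DNS lookup):
#   memcache_servers = overcloud-controller-0.internalapi.localdomain:11211,...
# IP form (no resolution involved):
#   memcache_servers = [2001:db8:fd00:2000::10]:11211,...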

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

I looked at the logs from the job https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby/fb267a1/logs/ and one thing seems strange to me.
In the nsswitch.conf file there is:

hosts: files dns myhostname

https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby/fb267a1/logs/overcloud-controller-0/etc/nsswitch.conf.txt.gz

So name resolution should first go through the /etc/hosts file, and that file contains entries for the controllers, like overcloud-controller-1.internalapi.localdomain and others. So the names should be resolved quickly from that file. Maybe systemd-resolved is doing it differently, but I don't really know :/
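A quick way to sanity-check that on a controller would be something like the following sketch (the hostname is the one mentioned above; getent follows the nsswitch.conf 'hosts' order):

$ getent hosts overcloud-controller-1.internalapi.localdomain    # should come straight from /etc/hosts
$ time getent ahosts overcloud-controller-1.internalapi.localdomain    # rough timing of the lookup path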

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/805952
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/1ce490716d3ff0a1bb2f958a0d2d96bc2257ea7d
Submitter: "Zuul (22348)"
Branch: master

commit 1ce490716d3ff0a1bb2f958a0d2d96bc2257ea7d
Author: Grzegorz Grasza <email address hidden>
Date: Wed Aug 25 09:20:06 2021 +0200

    Environment for switching to using IPs for memcached

    Related-Bug: #1939023
    Change-Id: Iaadee6be4e1eaf4170ee548b6fdb2c92b69dbc56

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/807051

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/victoria)

Related fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/807052

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/807053

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/807054

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/807051
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/2456e593011903024a747cb7182e3c9a3a0b6424
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 2456e593011903024a747cb7182e3c9a3a0b6424
Author: Grzegorz Grasza <email address hidden>
Date: Wed Aug 25 09:20:06 2021 +0200

    Environment for switching to using IPs for memcached

    Related-Bug: #1939023
    Change-Id: Iaadee6be4e1eaf4170ee548b6fdb2c92b69dbc56
    (cherry picked from commit 1ce490716d3ff0a1bb2f958a0d2d96bc2257ea7d)

tags: added: in-stable-wallaby
tags: added: in-stable-victoria
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/807052
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/4cbc970d15066aa2fa4ba695ac19b77a588c94b5
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 4cbc970d15066aa2fa4ba695ac19b77a588c94b5
Author: Grzegorz Grasza <email address hidden>
Date: Wed Aug 25 09:20:06 2021 +0200

    Environment for switching to using IPs for memcached

    Related-Bug: #1939023
    Change-Id: Iaadee6be4e1eaf4170ee548b6fdb2c92b69dbc56
    (cherry picked from commit 1ce490716d3ff0a1bb2f958a0d2d96bc2257ea7d)

Revision history for this message
Slawek Kaplonski (slaweq) wrote :
Download full text (5.9 KiB)

Today I checked the logs from the job https://review.rdoproject.org/zuul/build/c3c8003f08454b56b53e626b77946541/logs (which timed out).
I found that some tests run for a very long time, e.g. tempest.api.compute.floating_ips.test_floating_ips_actions.FloatingIPsAssociationTestJSON.test_associate_already_associated_floating_ip, which took 101 seconds in that case (see https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby/c3c8003/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz).
I compared this with the upstream job, where the same test took about 18 seconds.

Next I checked in the tempest logs what took so long in that test, and here is what I found:

$ zgrep test_associate_already_associated_floating_ip tempest.log.txt.gz
2021-09-07 12:11:47.017 321761 INFO tempest.lib.common.rest_client [req-fef2015f-da1e-4cd5-96a0-ae8f866b011a ] Request (FloatingIPsAssociationTestJSON:test_associate_already_associated_floating_ip): 201 POST https://[2001:db8:fd00:1000::5]:13000/v3/auth/tokens 0.618s
2021-09-07 12:11:47.019 321761 INFO tempest.lib.common.fixed_network [-] (FloatingIPsAssociationTestJSON:test_associate_already_associated_floating_ip) Found network {'id': 'f13ea60f-7c63-4e86-a2b5-8763b9d961ac', 'name': 'tempest-FloatingIPsAssociationTestJSON-807472534-network', 'tenant_id': 'ab0311a8b40c4819b4eaef28810349b4', 'admin_state_up': True, 'mtu': 1292, 'status': 'ACTIVE', 'subnets': [], 'shared': False, 'project_id': 'ab0311a8b40c4819b4eaef28810349b4', 'qos_policy_id': None, 'port_security_enabled': True, 'router:external': False, 'provider:network_type': 'geneve', 'provider:physical_network': None, 'provider:segmentation_id': 1566, 'availability_zone_hints': [], 'is_default': False, 'availability_zones': [], 'ipv4_address_scope': None, 'ipv6_address_scope': None, 'description': '', 'l2_adjacency': True, 'tags': [], 'created_at': '2021-09-07T12:10:46Z', 'updated_at': '2021-09-07T12:10:46Z', 'revision_number': 1} available for tenant
2021-09-07 12:11:53.102 321761 INFO tempest.lib.common.rest_client [req-160b2232-94b4-4e9a-a32c-0cad25e19623 ] Request (FloatingIPsAssociationTestJSON:test_associate_already_associated_floating_ip): 202 POST https://[2001:db8:fd00:1000::5]:13774/v2.1/servers 6.081s
2021-09-07 12:12:01.016 321761 INFO tempest.lib.common.rest_client [req-a2cc5468-d003-4fdb-bb7a-9afc9b44cb5e ] Request (FloatingIPsAssociationTestJSON:test_associate_already_associated_floating_ip): 200 GET https://[2001:db8:fd00:1000::5]:13774/v2.1/servers/ea159240-bdfa-4a34-ab2a-ff8f67776601 7.911s
2021-09-07 12:12:05.347 321761 INFO tempest.lib.common.rest_client [req-bc47090b-7794-4985-b365-9df25e112af7 ] Request (FloatingIPsAssociationTestJSON:test_associate_already_associated_floating_ip): 200 GET https://[2001:db8:fd00:1000::5]:13774/v2.1/servers/ea159240-bdfa-4a34-ab2a-ff8f67776601 3.314s
2021-09-07 12:12:09.281 321761 INFO tempest.lib...


Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

@Slawek, did your testing show different results from what was reported in https://bugs.launchpad.net/tripleo/+bug/1939023/comments/12 ?

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

I can't see the extra env file that switches memcached to use IPs in https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby/c3c8003/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

Could you please adjust the job and retry it with environments/memcached-use-ips.yaml in place?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/807054
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/3d637e176178e9464cf5221129c21622083e2085
Submitter: "Zuul (22348)"
Branch: stable/train

commit 3d637e176178e9464cf5221129c21622083e2085
Author: Grzegorz Grasza <email address hidden>
Date: Wed Aug 25 09:20:06 2021 +0200

    Environment for switching to using IPs for memcached

    Related-Bug: #1939023
    Change-Id: Iaadee6be4e1eaf4170ee548b6fdb2c92b69dbc56
    (cherry picked from commit 1ce490716d3ff0a1bb2f958a0d2d96bc2257ea7d)

tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/807053
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/750877f25e27b0f7f023dbe2be6a7cb961168034
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 750877f25e27b0f7f023dbe2be6a7cb961168034
Author: Grzegorz Grasza <email address hidden>
Date: Wed Aug 25 09:20:06 2021 +0200

    Environment for switching to using IPs for memcached

    Related-Bug: #1939023
    Change-Id: Iaadee6be4e1eaf4170ee548b6fdb2c92b69dbc56
    (cherry picked from commit 1ce490716d3ff0a1bb2f958a0d2d96bc2257ea7d)

tags: added: in-stable-ussuri
Revision history for this message
yatin (yatinkarel) wrote :

<< could you please adjust the job and retry it with environments/memcached-use-ips.yaml in place?

https://review.opendev.org/c/openstack/tripleo-quickstart/+/805953 will update the jobs, all depends-on are merged now.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-quickstart (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-quickstart/+/805953
Committed: https://opendev.org/openstack/tripleo-quickstart/commit/4b2454350289fe2b0c5ccd6cbd6d3fc48997d643
Submitter: "Zuul (22348)"
Branch: master

commit 4b2454350289fe2b0c5ccd6cbd6d3fc48997d643
Author: Grzegorz Grasza <email address hidden>
Date: Wed Aug 25 09:23:52 2021 +0200

    Use IPs instead of FQDNs in memcached with IPv6

    Change-Id: I34c6a4d9e64e13cfca11c239b474760e22a820ba
    Resolves-Bug: #1939023
    Depends-On: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/805952

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
Attila Fazekas (afazekas) wrote :

Probably you want to switch to a different memcached library:
https://github.com/linsomniac/python-memcached/issues/177

The current one is not prepared for a name that resolves to an IPv6 address,
but it works with IPv6 addresses when configured by IP.

Revision history for this message
Slawek Kaplonski (slaweq) wrote :
Download full text (9.4 KiB)

I was trying to reproduce the issue today but wasn't able to reproduce and investigate it. When I ran it on a test patch, tempest finished for me in about 4300 seconds:

======
Totals
======
Ran: 1425 tests in 4297.8985 sec.
 - Passed: 1303
 - Skipped: 121
 - Expected Fail: 0
 - Unexpected Success: 0
 - Failed: 1
Sum of execute time for each test: 8342.5592 sec.

I also checked the build history at https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby and I see that the duration of that job varies from about 3h 14min (https://review.rdoproject.org/zuul/build/0f8c934c63864203a669348b086a2882) to more than 4 hours (and sometimes a timeout, e.g. https://review.rdoproject.org/zuul/build/9d2881833e4d4ff3903afa4a3501ecc3/logs).

Next I compared the test execution times in the fast (https://review.rdoproject.org/zuul/build...


Revision history for this message
Lee Yarwood (lyarwood) wrote :

As discussed downstream, this appears to be the result of the environments/low-memory-usage.yaml environment being used in this job. That environment limits several API services to a single worker, so any synchronous request blocks the entire service.

Ultimately we either need to remove this environment *or*, if that's not possible, increase the timeouts for the individual tests and the overall test run.
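A quick way to confirm that on one of the CI controllers would be a sketch like the one below (the option names are the standard nova/neutron worker settings; the config paths assume the usual TripleO container layout and are not taken from these logs):

# how many API workers are configured (1 means a single, blocking worker)
$ sudo grep -ri "^osapi_compute_workers\|^api_workers" \
      /var/lib/config-data/puppet-generated/{nova,neutron}/etc/*/*.conf
# rough count of running nova-api processes as a cross-check
$ ps -ef | grep -c "[n]ova-api"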

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Related to this issue, we should switch the environments/low-memory-usage.yaml env to dynamic workers instead of static values. I have a WIP patch. Even going from 1 to 2 API workers for our usual resource-constrained VM controllers in CI jobs might drastically speed up overall tempest execution times (pretty much everywhere!).
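Until such a patch lands, a job could also override the static values with a small extra environment file, roughly like this (the parameter names are the usual tripleo-heat-templates *Workers settings and the values are only illustrative, not a tested configuration):

$ cat > ~/worker-overrides.yaml <<'EOF'
parameter_defaults:
  KeystoneWorkers: 2
  NovaWorkers: 2
  NeutronWorkers: 2
EOF
# then pass "-e ~/worker-overrides.yaml" to the overcloud deploy command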

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to puppet-tripleo (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/puppet-tripleo/+/809982

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/809986

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on puppet-tripleo (master)

Change abandoned by "Bogdan Dobrelya <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/puppet-tripleo/+/809982

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by "Bogdan Dobrelya <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/809986

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-quickstart/+/810264

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on puppet-tripleo (stable/wallaby)

Change abandoned by "Takashi Kajinami <email address hidden>" on branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/puppet-tripleo/+/804344

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on puppet-tripleo (master)

Change abandoned by "chandan kumar <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/puppet-tripleo/+/804343

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-ci (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ci/+/810672

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-ci (master)

Change abandoned by "Bogdan Dobrelya <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ci/+/810672
Reason: I don't think this is needed, let's tweak on a fs/job basis

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart-extras (master)

Change abandoned by "Bogdan Dobrelya <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/810256

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-ci (master)

Change abandoned by "Bogdan Dobrelya <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ci/+/810672

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart (master)

Change abandoned by "Bogdan Dobrelya <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-quickstart/+/810264
