Master Sc010-kvm job is failing on Octavia-related tempest test: octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest

Bug #1973223 reported by Sandeep Yadav
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

Description:

Master Sc010-kvm job is failing on Octavia-related tempest test: octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest

Tempest run:

~~~
{1} setUpClass (octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest) [0.000000s] ... FAILED

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):

      File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)

      File "<string>", line 3, in raise_from

      File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()

      File "/usr/lib64/python3.9/http/client.py", line 1377, in getresponse
    response.begin()

      File "/usr/lib64/python3.9/http/client.py", line 320, in begin
    version, status, reason = self._read_status()

      File "/usr/lib64/python3.9/http/client.py", line 281, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")

      File "/usr/lib64/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)

    socket.timeout: timed out

During handling of the above exception, another exception occurred:

    Traceback (most recent call last):

      File "/usr/lib/python3.9/site-packages/tempest/test.py", line 168, in setUpClass
    raise value.with_traceback(trace)

.
.

      File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 336, in _raise_timeout
    raise ReadTimeoutError(

    urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='192.168.24.3', port=8774): Read timed out. (read timeout=60)
~~~
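
For illustration, here is a minimal, hypothetical Python sketch (using requests rather than tempest's actual REST client; the endpoint is the nova-api VIP from the traceback) of how a 60-second read timeout surfaces as the ReadTimeoutError above when the API takes ~90 seconds to respond:

~~~
# Hypothetical sketch, not part of the failing test: a 60 s read timeout
# against a slow nova-api endpoint raises ReadTimeout before any response.
import requests

try:
    # The client gives up after 60 s; nova-api only answers after ~90 s.
    requests.post("http://192.168.24.3:8774/v2.1/servers",
                  json={"server": {}}, timeout=60)
except requests.exceptions.ReadTimeout as exc:
    print(exc)  # HTTPConnectionPool(host='192.168.24.3', port=8774): Read timed out.
~~~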

Nova errors:

~~~

2022-05-12 14:17:19.252 ERROR /var/log/containers/nova/nova-api.log: 6 ERROR nova.api.openstack.wsgi File "/usr/lib/python3.9/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 1010, in wrapper
2022-05-12 14:17:19.252 ERROR /var/log/containers/nova/nova-api.log: 6 ERROR nova.api.openstack.wsgi return fn(*args, **kwargs)
2022-05-12 14:17:19.252 ERROR /var/log/containers/nova/nova-api.log: 6 ERROR nova.api.openstack.wsgi File "/usr/lib/python3.9/site-packages/nova/objects/keypair.py", line 40, in _get_from_db
2022-05-12 14:17:19.252 ERROR /var/log/containers/nova/nova-api.log: 6 ERROR nova.api.openstack.wsgi raise exception.KeypairNotFound(user_id=user_id, name=name)
2022-05-12 14:17:19.252 ERROR /var/log/containers/nova/nova-api.log: 6 ERROR nova.api.openstack.wsgi nova.exception.KeypairNotFound: Keypair octavia-ssh-key not found for user 7527fbfe4d204aafaff987d944a25019
.
.
2022-05-12 14:18:28.320 ERROR /var/log/containers/nova/nova-api.log: 6 ERROR nova.api.openstack.wsgi File "/usr/lib/python3.9/site-packages/nova/objects/flavor.py", line 317, in _flavor_get_by_flavor_id_from_db
2022-05-12 14:18:28.320 ERROR /var/log/containers/nova/nova-api.log: 6 ERROR nova.api.openstack.wsgi raise exception.FlavorNotFound(flavor_id=flavor_id)
2022-05-12 14:18:28.320 ERROR /var/log/containers/nova/nova-api.log: 6 ERROR nova.api.openstack.wsgi nova.exception.FlavorNotFound: Flavor octavia_65 could not be found.
.
.
2022-05-12 14:24:47.052 ERROR /var/log/containers/nova/nova-api.log: 6 ERROR nova.api.openstack.wsgi File "/usr/lib/python3.9/site-packages/neutronclient/v2_0/client.py", line 469, in _find_resource_by_name
2022-05-12 14:24:47.052 ERROR /var/log/containers/nova/nova-api.log: 6 ERROR nova.api.openstack.wsgi raise exceptions.NotFound(message=not_found_message)
2022-05-12 14:24:47.052 ERROR /var/log/containers/nova/nova-api.log: 6 ERROR nova.api.openstack.wsgi neutronclient.common.exceptions.NotFound: Unable to find security_group with name 'tempest-lb_member_SG-867648610'
~~~

Revision history for this message
Michael Johnson (johnsom) wrote :

Nova API is not responding per the HAProxy logs:

May 12 14:24:23 standalone haproxy[7]: 192.168.24.3:51460 [12/May/2022:14:22:53.266] nova_osapi nova_osapi_be/standalone.ctlplane.localdomain 0/0/0/90093/90093 504 397 - - ---- 41/3/2/2/0 0/0 "POST /v2.1/servers HTTP/1.1"
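
Decoding the timer field of that line (assuming HAProxy's classic HTTP log format, where the five slash-separated values are Tq/Tw/Tc/Tr/Tt in milliseconds) shows the backend held the request for ~90 seconds before HAProxy returned the 504; a small illustrative sketch:

~~~
# Hypothetical sketch: split the HAProxy timers "0/0/0/90093/90093"
# (assumed Tq/Tw/Tc/Tr/Tt, milliseconds) from the log line above.
tq, tw, tc, tr, tt = (int(v) for v in "0/0/0/90093/90093".split("/"))
print(f"server response wait: {tr / 1000:.1f} s, total: {tt / 1000:.1f} s")
# -> 90.1 s in both, well past the tempest client's 60 s read timeout
~~~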

Revision history for this message
Sandeep Yadav (sandeepyadav93) wrote :

OpenStack RPM diff between the affected job and the last known good job:

https://paste.openstack.org/show/beA9TQcrWxSqwX9F6Cz0/

Revision history for this message
Gregory Thiemonge (gthiemonge) wrote :

The error occurs before Octavia is involved.

octavia-tempest-plugin creates a nova VM for the test.

We can see the request in the nova logs at 14:22:53, but the next log message for that request is at 14:24:47.

octavia-tempest-plugin uses a 60-second timeout when talking to the nova API, so it hit this timeout and started cleaning up its resources (deleting the SG).

2022-05-12 14:22:53.270 6 DEBUG nova.api.openstack.wsgi [req-da077566-534e-44e0-8e2b-8c01c3aaf874 749ea9b67c9141019e62de15d515cbab f3f1b301f73d4f289ad66c7c39d80ef0 - default default] Action: 'create', calling method: <bound method ServersController.create of <nova.api.openstack.compute.servers.ServersController object at 0x7f5277b73a00>>, body: {"server": {"name": "tempest-lb_member_webserver1-998709152", "flavorRef": "aeb56e14-04cc-4013-891e-d7efda9526f4", "imageRef": "ea47db00-58f9-453c-b222-c8596077e7ca", "key_name": "tempest-lb_member_keypair-527438237", "security_groups": [{"name": "tempest-lb_member_SG-867648610"}], "networks": [{"uuid": "dd2f4961-ebfd-4793-a4a7-d20694855836"}]}} _process_stack /usr/lib/python3.9/site-packages/nova/api/openstack/wsgi.py:511
2022-05-12 14:24:47.048 6 DEBUG nova.network.security_group_api [req-da077566-534e-44e0-8e2b-8c01c3aaf874 749ea9b67c9141019e62de15d515cbab f3f1b301f73d4f289ad66c7c39d80ef0 - default default] Neutron security group tempest-lb_member_SG-867648610 not found validate_name /usr/lib/python3.9/site-packages/nova/network/security_group_api.py:74
2022-05-12 14:24:47.050 6 INFO nova.api.openstack.wsgi [req-da077566-534e-44e0-8e2b-8c01c3aaf874 749ea9b67c9141019e62de15d515cbab f3f1b301f73d4f289ad66c7c39d80ef0 - default default] HTTP exception thrown: Unable to find security_group with name or id 'tempest-lb_member_SG-867648610'
2022-05-12 14:24:47.052 6 DEBUG nova.api.openstack.wsgi [req-da077566-534e-44e0-8e2b-8c01c3aaf874 749ea9b67c9141019e62de15d515cbab f3f1b301f73d4f289ad66c7c39d80ef0 - default default] Request method failure captured:
  request: POST /v2.1/servers HTTP/1.1

It's not clear to me what happened in nova during those 90 seconds; maybe the nova folks will have more information on it.
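
A quick arithmetic check of that gap, using the two nova timestamps quoted above (a hypothetical sketch, just to make the race explicit):

~~~
# Hypothetical sketch: nova sat on the create request for ~114 s, while the
# tempest client timed out at 60 s and deleted the SG, so nova's security
# group validation at 14:24:47 could no longer find it.
from datetime import datetime

received = datetime.fromisoformat("2022-05-12 14:22:53")
next_log = datetime.fromisoformat("2022-05-12 14:24:47")
print((next_log - received).total_seconds())  # 114.0
~~~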

Revision history for this message
Sandeep Yadav (sandeepyadav93) wrote :

The latest runs of the job passed twice with the latest hash. We are keeping this issue under observation.

Revision history for this message
Ronelle Landy (rlandy) wrote :

History on this is green now - closing out

Revision history for this message
Ronelle Landy (rlandy) wrote :

This error is showing up again - reopening to track

Revision history for this message
Ananya Banerjee (frenzyfriday) wrote :

https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-scenario010-kvm-standalone-master/b75a959/logs/undercloud/var/log/containers/haproxy/haproxy.log.txt.gz

The call `[23/May/2022:09:36:12.023] nova_osapi nova_osapi_be/standalone.ctlplane.localdomain 0/0/0/90087/90087 504 397 - - ---- 44/4/3/3/0 0/0 "POST /v2.1/servers HTTP/1.1"` takes more than 90 seconds, but octavia-tempest-plugin has a 60-second timeout.

Tracing `POST /v2.1/servers HTTP/1.1` in the nova logs:

https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-scenario010-kvm-standalone-master/b75a959/logs/undercloud/var/log/containers/nova/nova-api.log.txt.gz

```
2022-05-23 09:36:12.028 17 DEBUG nova.api.openstack.wsgi [req-a4d0d2bf-d5e6-4413-9018-f7406abe1b71 302e4b1c790445a2a275a031cc24c19f 5aa4f69b981f4d49bb3e3c4b6ba9f6ed - default default] Action: 'create', calling method: <bound method ServersController.create of <nova.api.openstack.compute.servers.ServersController object at 0x7f4c7afd1af0>>, body: {"server": {"name": "tempest-lb_member_webserver1-614089908", "flavorRef": "30eb6176-db73-4d9a-820b-9ba0cf296f69", "imageRef": "766ae365-5a57-4f91-8cc4-e63c47363e75", "key_name": "tempest-lb_member_keypair-1663605793", "security_groups": [{"name": "tempest-lb_member_SG-1855534900"}], "networks": [{"uuid": "dda891c0-e7f8-4a5e-a6f6-78e6aa861701"}]}} _process_stack /usr/lib/python3.9/site-packages/nova/api/openstack/wsgi.py:511
2022-05-23 09:38:08.892 17 DEBUG nova.network.security_group_api [req-a4d0d2bf-d5e6-4413-9018-f7406abe1b71 302e4b1c790445a2a275a031cc24c19f 5aa4f69b981f4d49bb3e3c4b6ba9f6ed - default default] Neutron security group tempest-lb_member_SG-1855534900 not found validate_name /usr/lib/python3.9/site-packages/nova/network/security_group_api.py:74
2022-05-23 09:38:08.894 17 INFO nova.api.openstack.wsgi [req-a4d0d2bf-d5e6-4413-9018-f7406abe1b71 302e4b1c790445a2a275a031cc24c19f 5aa4f69b981f4d49bb3e3c4b6ba9f6ed - default default] HTTP exception thrown: Unable to find security_group with name or id 'tempest-lb_member_SG-1855534900'
2022-05-23 09:38:08.895 17 DEBUG nova.api.openstack.wsgi [req-a4d0d2bf-d5e6-4413-9018-f7406abe1b71 302e4b1c790445a2a275a031cc24c19f 5aa4f69b981f4d49bb3e3c4b6ba9f6ed - default default] Request method failure captured:
  request: POST /v2.1/servers HTTP/1.1
```

Tracing why the security_group is not found in the neutron logs:

[https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-scenario010-kvm-standalone-master/b75a959/logs...


Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

From the same job logs [0] on PSI, I can see the neutron server doesn't use memcache:

2022-05-23 01:40:46.163 2 DEBUG oslo_service.service [-] cache.backend = dogpile.cache.null log_opt_values /usr/lib/python3.9/site-packages/oslo_config/cfg.py:2609

I am not a neutron caching backend expert, but this is something to figure out with the neutron team.

[0] https://sf.hosted.upshift.rdu2.redhat.com/logs/63/43063/1/check-rdo/periodic-tripleo-ci-centos-9-scenario010-kvm-internal-standalone-master/5cea7db/logs/undercloud/var/log/containers/neutron/server.log
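
For reference, `dogpile.cache.null` effectively disables caching. Below is a hypothetical neutron.conf sketch (not a confirmed fix; the backend choice and memcached address are assumptions) of what enabling a memcached backend through the standard oslo.cache options would look like:

~~~
# neutron.conf (hypothetical example; standard oslo.cache options)
[cache]
enabled = true
backend = dogpile.cache.memcached
memcache_servers = 192.168.24.3:11211
~~~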

Revision history for this message
Sandeep Yadav (sandeepyadav93) wrote :

Hello Bogdan,

We tried reaching out to the neutron team. Per bcafarel, dogpile.cache looks like the default (he sees a similar value in devstack), but he was not completely sure whether for OSP we have to set this to memcache.

bcafarel tagged a few other neutron folks to confirm this, and we are currently waiting for confirmation.

Revision history for this message
Marios Andreou (marios-b) wrote :

No, we cannot skip those tests; posted a revert @ https://review.opendev.org/c/openstack/openstack-tempest-skiplist/+/843827

The suggestion is to try a bigger node and see if it helps.

Otherwise it is more viable to remove this job from criteria than to skip those tests.

Revision history for this message
chandan kumar (chkumar246) wrote :

A new flavor `single-centos-9-big-node` has been added to the downstream nodeset. We are currently testing the master jobs with this nodeset.

Revision history for this message
chandan kumar (chkumar246) wrote :

The patch is merged now; we can move this to Done.

Changed in tripleo:
status: Triaged → Fix Released