Tripleo-CI Master and Wallaby jobs failing with keystone lost mysql connection

Bug #1965525 reported by Arx Cruz
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

Several jobs are failing in the tempest stage with two different, but related, errors. First, all tempest tests that rely on the keystone endpoint fail with the following error:

https://logserver.rdoproject.org/46/38646/16/check/periodic-tripleo-ci-centos-9-ovb-1ctlr_2comp-featureset020-master/b9ff5d0/logs/undercloud/var/log/tempest/stestr_results.html.gz

ft1.1: setUpClass (cinder_tempest_plugin.scenario.test_snapshots.SnapshotDataIntegrityTests)
testtools.testresult.real._StringException: Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/tempest/test.py", line 168, in setUpClass
    raise value.with_traceback(trace)
  File "/usr/lib/python3.9/site-packages/tempest/test.py", line 153, in setUpClass
    cls.setup_credentials()
  File "/usr/lib/python3.9/site-packages/tempest/test.py", line 372, in setup_credentials
    manager = cls.get_client_manager(
  File "/usr/lib/python3.9/site-packages/tempest/test.py", line 701, in get_client_manager
    cred_provider = cls._get_credentials_provider()
  File "/usr/lib/python3.9/site-packages/tempest/test.py", line 676, in _get_credentials_provider
    cls._creds_provider = credentials.get_credentials_provider(
  File "/usr/lib/python3.9/site-packages/tempest/common/credentials_factory.py", line 141, in get_credentials_provider
    **get_dynamic_provider_params(identity_version))
  File "/usr/lib/python3.9/site-packages/tempest/common/credentials_factory.py", line 72, in get_dynamic_provider_params
    admin_creds = admin_creds or get_configured_admin_credentials(
  File "/usr/lib/python3.9/site-packages/tempest/common/credentials_factory.py", line 257, in get_configured_admin_credentials
    credentials = get_credentials(fill_in=fill_in,
  File "/usr/lib/python3.9/site-packages/tempest/common/credentials_factory.py", line 300, in get_credentials
    return auth.get_credentials(auth_url,
  File "/usr/lib/python3.9/site-packages/tempest/lib/auth.py", line 653, in get_credentials
    creds = auth_provider.fill_credentials()
  File "/usr/lib/python3.9/site-packages/tempest/lib/auth.py", line 122, in fill_credentials
    auth_data = self.get_auth()
  File "/usr/lib/python3.9/site-packages/tempest/lib/auth.py", line 148, in get_auth
    self.set_auth()
  File "/usr/lib/python3.9/site-packages/tempest/lib/auth.py", line 157, in set_auth
    self.cache = self._get_auth()
  File "/usr/lib/python3.9/site-packages/tempest/lib/auth.py", line 312, in _get_auth
    token, auth_data = auth_func(**auth_params)
  File "/usr/lib/python3.9/site-packages/tempest/lib/services/identity/v3/token_client.py", line 190, in get_token
    body = self.auth(**kwargs)
  File "/usr/lib/python3.9/site-packages/tempest/lib/services/identity/v3/token_client.py", line 143, in auth
    resp, body = self.post(self.auth_url, body=body)
  File "/usr/lib/python3.9/site-packages/tempest/lib/common/rest_client.py", line 299, in post
    return self.request('POST', url, extra_headers, headers, body, chunked)
  File "/usr/lib/python3.9/site-packages/tempest/lib/services/identity/v3/token_client.py", line 172, in request
    raise exceptions.IdentityError(
tempest.lib.exceptions.IdentityError: Got identity error
Details: Unexpected status code 500
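
For reference, the setUpClass failure above comes down to keystone returning HTTP 500 to the v3 token request that tempest issues while building its credentials. A minimal sketch of that request (the endpoint address and credentials below are placeholders, not values taken from this job):

    import requests

    # Placeholder endpoint and credentials, for illustration only.
    auth_url = "http://10.0.0.5:5000/v3/auth/tokens"
    password = {
        "user": {"name": "admin", "domain": {"name": "Default"}, "password": "secret"}
    }
    body = {
        "auth": {
            "identity": {"methods": ["password"], "password": password},
            "scope": {"project": {"name": "admin", "domain": {"name": "Default"}}},
        }
    }

    resp = requests.post(auth_url, json=body)
    # tempest's token_client raises IdentityError for any unexpected status;
    # during the failure window keystone answered 500 to this request.
    print(resp.status_code)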

Arx Cruz (arxcruz) wrote :

Checking the keystone logs, I found this:

https://logserver.rdoproject.org/46/38646/16/check/periodic-tripleo-ci-centos-9-ovb-1ctlr_2comp-featureset020-master/b9ff5d0/logs/overcloud-controller-0/var/log/containers/keystone/keystone.log.txt.gz

2022-03-17 23:24:08.820 110 ERROR keystone.server.flask.request_processing.middleware.auth_context Traceback (most recent call last):
2022-03-17 23:24:08.820 110 ERROR keystone.server.flask.request_processing.middleware.auth_context File "/usr/lib/python3.9/site-packages/keystone/server/flask/request_processing/middleware/auth_context.py", line 103, in _inner
2022-03-17 23:24:08.820 110 ERROR keystone.server.flask.request_processing.middleware.auth_context return method(self, request)
2022-03-17 23:24:08.820 110 ERROR keystone.server.flask.request_processing.middleware.auth_context File "/usr/lib/python3.9/site-packages/keystone/server/flask/request_processing/middleware/auth_context.py", line 353, in process_request
2022-03-17 23:24:08.820 110 ERROR keystone.server.flask.request_processing.middleware.auth_context resp = super(AuthContextMiddleware, self).process_request(request)
2022-03-17 23:24:08.820 110 ERROR keystone.server.flask.request_processing.middleware.auth_context File "/usr/lib/python3.9/site-packages/keystonemiddleware/auth_token/__init__.py", line 409, in process_request
2022-03-17 23:24:08.820 110 ERROR keystone.server.flask.request_processing.middleware.auth_context data, user_auth_ref = self._do_fetch_token(
2022-03-17 23:24:08.820 110 ERROR keystone.server.flask.request_processing.middleware.auth_context File "/usr/lib/python3.9/site-packages/keystonemiddleware/auth_token/__init__.py", line 445, in _do_fetch_token
2022-03-17 23:24:08.820 110 ERROR keystone.server.flask.request_processing.middleware.auth_context data = self.fetch_token(token, **kwargs)
2022-03-17 23:24:08.820 110 ERROR keystone.server.flask.request_processing.middleware.auth_context File "/usr/lib/python3.9/site-packages/keystone/server/flask/request_processing/middleware/auth_context.py", line 247, in fetch_token
2022-03-17 23:24:08.820 110 ERROR keystone.server.flask.request_processing.middleware.auth_context self.token = self.token_provider_api.validate_token(
2022-03-17 23:24:08.820 110 ERROR keystone.server.flask.request_processing.middleware.auth_context File "/usr/lib/python3.9/site-packages/keystone/common/manager.py", line 115, in wrapped
2022-03-17 23:24:08.820 110 ERROR keystone.server.flask.request_processing.middleware.auth_context __ret_val = __f(*args, **kwargs)
2022-03-17 23:24:08.820 110 ERROR keystone.server.flask.request_processing.middleware.auth_context File "/usr/lib/python3.9/site-packages/keystone/token/provider.py", line 145, in validate_token
2022-03-17 23:24:08.820 110 ERROR keystone.server.flask.request_processing.middleware.auth_context token = self._validate_token(token_id)
2022-03-17 23:24:08.820 110 ERROR keystone.server.flask.request_processing.middleware.auth_context File "<decorator-gen-26>", line 2, in _validate_token
2022-03-17 23:24:08.820 110 ERROR keystone.server.flask.request_processing.middleware.auth_context File "/usr/lib/python3...

description: updated
Arx Cruz (arxcruz) wrote :

This SQL error matches other tempest errors related to the DB connection:

ft36.8: tempest.scenario.test_network_v6.TestGettingAddress.test_slaac_from_os[compute,id-2c92df61-29f0-4eaa-bee3-7c65bef62a43,network,slow]
testtools.testresult.real._StringException: pythonlogging:'': {{{
2022-03-17 19:24:08,805 201279 INFO [tempest.lib.common.rest_client] Request (TestGettingAddress:setUp): 500 POST http://10.0.0.5:8774/v2.1/os-keypairs 0.015s
2022-03-17 19:24:08,805 201279 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-OpenStack-Nova-API-Version': '2.1', 'X-Auth-Token': '<omitted>'}
        Body: {"keypair": {"name": "tempest-TestGettingAddress-859199174", "public_key": "ecdsa-sha2-nistp384 AAAAE2VjZHNhLXNoYTItbmlzdHAzODQAAAAIbmlzdHAzODQAAABhBNyTQ4wV6HTMsdwTQzAUJ7DecjJ4+6JpCA3s0q7Cw+IJD7hImMHsmj1iOIa+OheCiiAq/dtRnWhFSTKDvX8zAyrpcUv3QtAzZ0TlBEtOKOh6hWJZ7Ni7+0PrCd+q1TFiiQ=="}}
    Response - Headers: {'date': 'Thu, 17 Mar 2022 23:24:08 GMT', 'server': 'Apache', 'content-length': '209', 'openstack-api-version': 'compute 2.1', 'x-openstack-nova-api-version': '2.1', 'vary': 'OpenStack-API-Version,X-OpenStack-Nova-API-Version', 'x-openstack-request-id': 'req-e85b9e92-e0a6-42cf-a49d-178e2d8b91c5', 'x-compute-request-id': 'req-e85b9e92-e0a6-42cf-a49d-178e2d8b91c5', 'content-type': 'application/json; charset=UTF-8', 'connection': 'close', 'status': '500', 'content-location': 'http://10.0.0.5:8774/v2.1/os-keypairs'}
        Body: b'{"computeFault": {"code": 500, "message": "Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.\\n<class \'oslo_db.exception.DBConnectionError\'>"}}'
}}}

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/tempest/scenario/test_network_v6.py", line 68, in setUp
    self.keypair = self.create_keypair()
  File "/usr/lib/python3.9/site-packages/tempest/scenario/manager.py", line 179, in create_keypair
    body = client.create_keypair(**kwargs)
  File "/usr/lib/python3.9/site-packages/tempest/lib/services/compute/keypairs_client.py", line 91, in create_keypair
    resp, body = self.post("os-keypairs", body=post_body)
  File "/usr/lib/python3.9/site-packages/tempest/lib/common/rest_client.py", line 299, in post
    return self.request('POST', url, extra_headers, headers, body, chunked)
  File "/usr/lib/python3.9/site-packages/tempest/lib/services/compute/base_compute_client.py", line 47, in request
    resp, resp_body = super(BaseComputeClient, self).request(
  File "/usr/lib/python3.9/site-packages/tempest/lib/common/rest_client.py", line 703, in request
    self._error_checker(resp, resp_body)
  File "/usr/lib/python3.9/site-packages/tempest/lib/common/rest_client.py", line 879, in _error_checker
    raise exceptions.ServerFault(resp_body, resp=resp,
tempest.lib.exceptions.ServerFault: Got server fault
Details: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'oslo_db.exception.DBConnectionError'>

However, I did not see anything in the mysql logs.
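
For context, oslo.db translates the MySQL "server has gone away" / "lost connection" errors raised by SQLAlchemy into DBConnectionError, which is the class surfaced by nova-api above. A rough sketch of how that looks (the connection URL is a made-up placeholder; the real one lives in nova.conf):

    import sqlalchemy
    from sqlalchemy import exc as sa_exc
    from oslo_db import exception as db_exc

    # Placeholder connection string for illustration only.
    engine = sqlalchemy.create_engine("mysql+pymysql://nova:secret@10.0.0.5/nova")

    try:
        with engine.connect() as conn:
            conn.execute(sqlalchemy.text("SELECT 1"))
    except sa_exc.OperationalError as exc:
        # Under oslo.db's engine facade this is re-raised as
        # oslo_db.exception.DBConnectionError, the class seen in the 500 above.
        raise db_exc.DBConnectionError(exc)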

Checking the nova-compute service, I noticed this...


Arx Cruz (arxcruz)
summary: - Some tripleo-ci jobs are failing on tempest tests with ipv6 error related
+ Tripleo-CI Master and Wallaby jobs failing with keystone lost mysql connection
Changed in tripleo:
milestone: xena-3 → yoga-3
Douglas Viroel (dviroel) wrote :

I can see similar issues in the component line when trying to promote the tempest component. Standalone-full-tempest-api has had a lot of failures [1] since 2022-03-01. Looking into the commits merged between the last successful run [2] and the first occurrence of these errors [3], I found a related issue [4] which has at least 3 fixes being backported.
Comparing the installed packages, the last successful [5] and first failing [6] jobs differ by an oslo-cache change that matches this other LP bug (a sketch for producing such a diff follows the links below):
-- python3-oslo-cache.noarch 2.7.0-0.20220215072603.062d29b.el9
++ python3-oslo-cache.noarch 2.7.1-0.20220225083248.d0252f6.el9
where d0252f6 is https://opendev.org/openstack/oslo.cache/commit/d0252f62f3b61253d0bcd953329f7b52e53c6e81

Not sure if this is the cause or just a coincidence.

[1] https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-9-standalone-full-tempest-api-tempest-wallaby
[2] https://logserver.rdoproject.org/openstack-component-tempest/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-standalone-full-tempest-api-tempest-wallaby/ca6ad75/
[3] https://logserver.rdoproject.org/75/40075/1/check/periodic-tripleo-ci-centos-9-standalone-full-tempest-api-tempest-wallaby/d477543/
[4] https://bugs.launchpad.net/oslo.cache/+bug/1959562
[5] https://logserver.rdoproject.org/openstack-component-tempest/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-standalone-full-tempest-api-tempest-wallaby/ca6ad75/logs/undercloud/var/log/extra/package-list-installed.txt.gz
[6] https://logserver.rdoproject.org/75/40075/1/check/periodic-tripleo-ci-centos-9-standalone-full-tempest-api-tempest-wallaby/d477543/logs/undercloud/var/log/extra/package-list-installed.txt.gz
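
A quick, hypothetical sketch of how such a package diff can be produced from the package-list-installed.txt files linked in [5] and [6] (the file names below are placeholders):

    # Compare two installed-package lists and print the packages that changed.
    def read_packages(path):
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}

    good = read_packages("package-list-installed.good.txt")  # from [5]
    bad = read_packages("package-list-installed.bad.txt")    # from [6]

    for pkg in sorted(good - bad):
        print("--", pkg)
    for pkg in sorted(bad - good):
        print("++", pkg)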

Arx Cruz (arxcruz) wrote :

According to tkajinam, some resources went down due to a timeout:

https://logserver.rdoproject.org/46/38646/16/check/periodic-tripleo-ci-centos-9-ovb-1ctlr_2comp-featureset020-master/b9ff5d0/logs/overcloud-controller-0/var/log/extra/pcs.txt.gz

Failed Resource Actions:
  * galera-bundle-podman-0 stop on overcloud-controller-0 could not be executed (Timed Out) because 'Resource agent did not complete in time' at Thu Mar 17 23:21:38 2022 after 2m1ms
  * rabbitmq-bundle-podman-0 stop on overcloud-controller-0 could not be executed (Timed Out) because 'Resource agent did not complete in time' at Thu Mar 17 23:21:38 2022 after 2m3ms
  * ovn-dbs-bundle-podman-0 stop on overcloud-controller-0 could not be executed (Timed Out) because 'Resource agent did not complete in time' at Thu Mar 17 23:19:37 2022 after 2m4ms
  * openstack-cinder-volume-podman-0 start on overcloud-controller-0 could not be executed (Timed Out) because 'Resource agent did not complete in time' at Thu Mar 17 23:08:59 2022 after 2m3ms

Arx Cruz (arxcruz) wrote :

Still happening on Wallaby; as of today I have not seen it on master.

Ronelle Landy (rlandy) wrote :

No longer showing up - closing this out

Changed in tripleo:
status: Triaged → Fix Released