CS8/9 OVB FS001/FS035 master/wallaby/train failing with DB Connection errors

Bug #1981478 reported by Douglas Viroel
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

The periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-train job is failing during overcloud deploy with the following errors:

FATAL | Check Keystone service status | undercloud | item=swift | error={"ansible_job_id": "470471596554.315470", "ansible_loop_var": "tripleo_keystone_resources_service_async_result_item", "attempts": 1, "changed": false, "extra_data": {"data": null, "details": "Please contact the server administrator at: in the server error log.: [no address given] to inform them of the time this error occurred,: Internal Server Error: More information about this error may be available: misconfiguration and was unable to complete: and the actions you performed just before this error.: The server encountered an internal error or: 500 Internal Server Error: your request."
....
Failed to create service swift: Server Error for url: https://10.0.0.5:13000/v3/services, Please contact the server administrator at: in the server error log.: [no address given] to inform them of the time this error occurred,
https://logserver.rdoproject.org/32/40932/9/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-train/52ef39f/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

From Keystone logs on controller-1 we have the following error:
[Tue Jul 12 03:51:12.972306 2022] [wsgi:error] [pid 28] [remote 172.17.0.129:60904] mod_wsgi (pid=28): Exception occurred processing WSGI script '/var/www/cgi-bin/keystone/keystone'.
[Tue Jul 12 03:51:12.979653 2022] [wsgi:error] [pid 28] [remote 172.17.0.129:60904] Traceback (most recent call last):
[Tue Jul 12 03:51:12.979701 2022] [wsgi:error] [pid 28] [remote 172.17.0.129:60904] File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 761, in _commit_impl
[Tue Jul 12 03:51:12.979705 2022] [wsgi:error] [pid 28] [remote 172.17.0.129:60904] self.engine.dialect.do_commit(self.connection)
[Tue Jul 12 03:51:12.979710 2022] [wsgi:error] [pid 28] [remote 172.17.0.129:60904] File "/usr/lib64/python3.6/site-packages/sqlalchemy/dialects/mysql/base.py", line 2215, in do_commit
[Tue Jul 12 03:51:12.979713 2022] [wsgi:error] [pid 28] [remote 172.17.0.129:60904] dbapi_connection.commit()
[Tue Jul 12 03:51:12.979718 2022] [wsgi:error] [pid 28] [remote 172.17.0.129:60904] File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 422, in commit
[Tue Jul 12 03:51:12.979721 2022] [wsgi:error] [pid 28] [remote 172.17.0.129:60904] self._read_ok_packet()
[Tue Jul 12 03:51:12.979725 2022] [wsgi:error] [pid 28] [remote 172.17.0.129:60904] File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 396, in _read_ok_packet
[Tue Jul 12 03:51:12.979728 2022] [wsgi:error] [pid 28] [remote 172.17.0.129:60904] pkt = self._read_packet()
[Tue Jul 12 03:51:12.979732 2022] [wsgi:error] [pid 28] [remote 172.17.0.129:60904] File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 676, in _read_packet
[Tue Jul 12 03:51:12.979735 2022] [wsgi:error] [pid 28] [remote 172.17.0.129:60904] packet.raise_for_error()
[Tue Jul 12 03:51:12.979739 2022] [wsgi:error] [pid 28] [remote 172.17.0.129:60904] File "/usr/lib/python3.6/site-packages/pymysql/protocol.py", line 223, in raise_for_error
[Tue Jul 12 03:51:12.979741 2022] [wsgi:error] [pid 28] [remote 172.17.0.129:60904] err.raise_mysql_exception(self._data)
[Tue Jul 12 03:51:12.979746 2022] [wsgi:error] [pid 28] [remote 172.17.0.129:60904] File "/usr/lib/python3.6/site-packages/pymysql/err.py", line 107, in raise_mysql_exception
[Tue Jul 12 03:51:12.979748 2022] [wsgi:error] [pid 28] [remote 172.17.0.129:60904] raise errorclass(errno, errval)
[Tue Jul 12 03:51:12.979762 2022] [wsgi:error] [pid 28] [remote 172.17.0.129:60904] pymysql.err.OperationalError: (1213, 'Deadlock: wsrep aborted transaction')

https://logserver.rdoproject.org/32/40932/9/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-train/52ef39f/logs/overcloud-controller-1/var/log/containers/httpd/keystone/keystone_wsgi_error.log.txt.gz
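
When reading a keystone_wsgi_error.log like the one above, it can help to strip the Apache prefix and pull out only the final exception line. The following is a minimal sketch and not part of any TripleO tooling; the names PREFIX, EXC, strip_prefix and find_db_errors are hypothetical, and the regular expressions assume the line layout shown in the excerpt.

```python
import re

# Apache error-log prefix as it appears in the excerpt above (assumed layout):
# [timestamp] [module:level] [pid N] [remote host:port] message
PREFIX = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<module>[^\]]+)\] \[pid (?P<pid>\d+)\] "
    r"\[remote (?P<remote>[^\]]+)\] (?P<msg>.*)$"
)
# Final traceback line, e.g. pymysql.err.OperationalError: (1213, '...')
EXC = re.compile(
    r"^(?P<cls>[\w.]+): \((?P<errno>\d+), "
    r"(?P<quote>['\"])(?P<text>.*)(?P=quote)\)$"
)


def strip_prefix(line):
    """Return (timestamp, message) for a prefixed line, or None if it does not match."""
    m = PREFIX.match(line.rstrip("\n"))
    return (m.group("ts"), m.group("msg")) if m else None


def find_db_errors(lines):
    """Yield (timestamp, exception class, errno, text) for each exception line."""
    for line in lines:
        parsed = strip_prefix(line)
        if parsed is None:
            continue
        ts, msg = parsed
        m = EXC.match(msg.strip())
        if m:
            yield ts, m.group("cls"), int(m.group("errno")), m.group("text")


# Two lines copied from the log excerpt above.
sample = [
    "[Tue Jul 12 03:51:12.979748 2022] [wsgi:error] [pid 28] "
    "[remote 172.17.0.129:60904] raise errorclass(errno, errval)",
    "[Tue Jul 12 03:51:12.979762 2022] [wsgi:error] [pid 28] "
    "[remote 172.17.0.129:60904] pymysql.err.OperationalError: "
    "(1213, 'Deadlock: wsrep aborted transaction')",
]
errors = list(find_db_errors(sample))
for err in errors:
    print(err)
```

Only the last line of the traceback matches, so the output is one tuple carrying the timestamp, the exception class, the error number and the message.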

Log from another run:
https://logserver.rdoproject.org/29/37029/42/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-train/e019021/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

Douglas Viroel (dviroel) wrote :

Looking at the haproxy logs, we can see the mysql backend servers flapping between DOWN and UP:

https://logserver.rdoproject.org/32/40932/9/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-train/52ef39f/logs/overcloud-controller-1/var/log/containers/haproxy/haproxy.log.txt.gz

Jul 12 03:53:09 overcloud-controller-1 haproxy[12]: Backup Server mysql/overcloud-controller-1.internalapi.localdomain is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 15ms. 0 active and 2 backup servers left. Running on backup. 0 sessions active, 0 requeued, 0 remaining in queue.
Jul 12 03:53:29 overcloud-controller-1 haproxy[12]: Backup Server mysql/overcloud-controller-1.internalapi.localdomain is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 15ms. 0 active and 3 backup servers online. Running on backup. 0 sessions requeued, 0 total in queue.
Jul 12 03:54:10 overcloud-controller-1 haproxy[12]: Backup Server mysql/overcloud-controller-1.internalapi.localdomain is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 17ms. 0 active and 2 backup servers left. Running on backup. 0 sessions active, 0 requeued, 0 remaining in queue.
Jul 12 03:54:12 overcloud-controller-1 haproxy[12]: Backup Server mysql/overcloud-controller-1.internalapi.localdomain is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 16ms. 0 active and 3 backup servers online. Running on backup. 0 sessions requeued, 0 total in queue.
Jul 12 03:54:39 overcloud-controller-1 haproxy[12]: Backup Server mysql/overcloud-controller-1.internalapi.localdomain is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 16ms. 0 active and 2 backup servers left. Running on backup. 0 sessions active, 0 requeued, 0 remaining in queue.
Jul 12 03:54:42 overcloud-controller-1 haproxy[12]: Backup Server mysql/overcloud-controller-1.internalapi.localdomain is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 14ms. 0 active and 3 backup servers online. Running on backup. 0 sessions requeued, 0 total in queue.
Jul 12 03:55:41 overcloud-controller-1 haproxy[12]: Backup Server mysql/overcloud-controller-2.internalapi.localdomain is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 14ms. 0 active and 2 backup servers left. Running on backup. 0 sessions active, 0 requeued, 0 remaining in queue.
Jul 12 03:55:43 overcloud-controller-1 haproxy[12]: Backup Server mysql/overcloud-controller-2.internalapi.localdomain is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 14ms. 0 active and 3 backup servers online. Running on backup. 0 sessions requeued, 0 total in queue.
Jul 12 03:57:36 overcloud-controller-1 haproxy[12]: Backup Server mysql/overcloud-controller-2.internalapi.localdomain is DOWN, reason: Layer4 timeout, check duration: 1000ms. 0 active and 2 backup servers left. Running on backup. 0 sessions active, 0 requeued, 0 remaining in queue.
Jul 12 03:57:37 overcloud-controller-1 haproxy[12]: Backup Server mysql/overcloud-controller-2.internalapi.localdomain is UP, reason: Layer7 check passed, code: 20...
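
To quantify the flapping in a haproxy.log like this one, the DOWN/UP transitions can be counted per backend server. This is a rough sketch under the assumption that the lines follow the layout shown above; TRANSITION and summarize_flaps are hypothetical names, and the sample lines are abridged copies of the excerpt.

```python
import re
from collections import defaultdict

# Matches the health-check transition lines shown in the excerpt above.
TRANSITION = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ haproxy\[\d+\]: "
    r"(?:Backup )?Server (?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"is (?P<state>UP|DOWN), reason: (?P<reason>[^,]+)"
)


def summarize_flaps(lines):
    """Count UP/DOWN transitions and collect DOWN reasons per (backend, server)."""
    summary = defaultdict(lambda: {"down": 0, "up": 0, "reasons": set()})
    for line in lines:
        m = TRANSITION.match(line)
        if not m:
            continue
        entry = summary[(m.group("backend"), m.group("server"))]
        entry[m.group("state").lower()] += 1
        if m.group("state") == "DOWN":
            entry["reasons"].add(m.group("reason"))
    return dict(summary)


# Abridged from the excerpt above (trailing session counters dropped).
sample = [
    'Jul 12 03:53:09 overcloud-controller-1 haproxy[12]: Backup Server '
    'mysql/overcloud-controller-1.internalapi.localdomain is DOWN, '
    'reason: Layer7 wrong status, code: 503, info: "Service Unavailable"',
    'Jul 12 03:53:29 overcloud-controller-1 haproxy[12]: Backup Server '
    'mysql/overcloud-controller-1.internalapi.localdomain is UP, '
    'reason: Layer7 check passed, code: 200, info: "OK"',
    'Jul 12 03:57:36 overcloud-controller-1 haproxy[12]: Backup Server '
    'mysql/overcloud-controller-2.internalapi.localdomain is DOWN, '
    'reason: Layer4 timeout, check duration: 1000ms.',
]
flaps = summarize_flaps(sample)
for (backend, server), entry in sorted(flaps.items()):
    print(backend, server, entry["down"], entry["up"], sorted(entry["reasons"]))
```

Run against the full log, a high DOWN count for a single server within a few minutes would support the flapping seen here; the reasons set separates Layer7 503 responses from Layer4 timeouts.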


Douglas Viroel (dviroel)
summary:
- CentOS-8 fs001 train failing on overcloud deploy: Keystone 500 Internal Server Error
+ CS8/9 FS001/FS035 master/wallaby/train failing with DB Connection errors
summary:
- CS8/9 FS001/FS035 master/wallaby/train failing with DB Connection errors
+ CS8/9 OVB FS001/FS035 master/wallaby/train failing with DB Connection errors

Douglas Viroel (dviroel) wrote :

https://logserver.rdoproject.org/openstack-component-tripleo/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-tripleo-wallaby/f3c6d44/logs/overcloud-controller-0/var/log/extra/errors.txt.gz

We can see the following errors:

 ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call last):
 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/base.py", line 771, in _commit_impl
 ERROR oslo_db.sqlalchemy.exc_filters self.engine.dialect.do_commit(self.connection)
 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib64/python3.9/site-packages/sqlalchemy/dialects/mysql/base.py", line 2501, in do_commit
 ERROR oslo_db.sqlalchemy.exc_filters dbapi_connection.commit()
 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python3.9/site-packages/pymysql/connections.py", line 422, in commit
 ERROR oslo_db.sqlalchemy.exc_filters self._read_ok_packet()
 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python3.9/site-packages/pymysql/connections.py", line 396, in _read_ok_packet
 ERROR oslo_db.sqlalchemy.exc_filters pkt = self._read_packet()
 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python3.9/site-packages/pymysql/connections.py", line 676, in _read_packet
 ERROR oslo_db.sqlalchemy.exc_filters packet.raise_for_error()
 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python3.9/site-packages/pymysql/protocol.py", line 223, in raise_for_error
 ERROR oslo_db.sqlalchemy.exc_filters err.raise_mysql_exception(self._data)
 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python3.9/site-packages/pymysql/err.py", line 107, in raise_mysql_exception
 ERROR oslo_db.sqlalchemy.exc_filters raise errorclass(errno, errval)
 ERROR oslo_db.sqlalchemy.exc_filters pymysql.err.OperationalError: (1180, 'Got error 6 "No such device or address" during COMMIT')

Douglas Viroel (dviroel) wrote :

https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset035-master/f44e204/logs/overcloud-controller-0/var/log/extra/errors.txt.gz
...
ERROR /var/log/containers/nova/nova-api.log: 9 ERROR nova.api.openstack.wsgi pymysql.err.OperationalError: (1047, 'WSREP has not yet prepared node for application use')
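
The three pymysql errors quoted in this report carry different server error codes (1213, 1180, 1047). As a small sketch for triaging an errors.txt file, the codes can be extracted and mapped to their MySQL/MariaDB symbolic names. KNOWN_CODES, OPERATIONAL_ERROR and classify are hypothetical names, and the notes are a reading of the messages above rather than a root-cause analysis.

```python
import re
from collections import Counter

# Symbolic names of the server error codes seen in this report.
# The notes paraphrase the quoted messages; they are not a diagnosis.
KNOWN_CODES = {
    1047: ("ER_UNKNOWN_COM_ERROR", "WSREP node not yet ready for application use"),
    1180: ("ER_ERROR_DURING_COMMIT", "error returned during COMMIT"),
    1213: ("ER_LOCK_DEADLOCK", "wsrep aborted the transaction, reported as a deadlock"),
}

OPERATIONAL_ERROR = re.compile(r"OperationalError: \((\d+), ")


def classify(line):
    """Return (code, name, note) for a line containing an OperationalError, else None."""
    m = OPERATIONAL_ERROR.search(line)
    if m is None:
        return None
    code = int(m.group(1))
    name, note = KNOWN_CODES.get(code, ("UNKNOWN", "code not in this table"))
    return code, name, note


# Exception lines copied from the comments in this report.
lines = [
    "pymysql.err.OperationalError: (1213, 'Deadlock: wsrep aborted transaction')",
    "ERROR oslo_db.sqlalchemy.exc_filters pymysql.err.OperationalError: "
    "(1180, 'Got error 6 \"No such device or address\" during COMMIT')",
    "ERROR /var/log/containers/nova/nova-api.log: 9 ERROR nova.api.openstack.wsgi "
    "pymysql.err.OperationalError: (1047, 'WSREP has not yet prepared node for "
    "application use')",
]
results = [classify(line) for line in lines]
counts = Counter(r[1] for r in results if r is not None)
for code, name, note in results:
    print(code, name, "-", note)
```

All three codes come back from the database server through pymysql, so a tally like counts gives a quick view of which failure mode dominates a given job run.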

Marios Andreou (marios-b) wrote :
Changed in tripleo:
status: Triaged → Fix Released