Sure. Overcloud deployment fails with:

oslo_db.exception.DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '192.168.24.3' ([Errno 113] No route to host)")

Actually, there are other similar messages in other services which lost access to the DB:

2021-07-23 06:22:18.939 36 ERROR oslo_db.sqlalchemy.engines [-] Database connection was found disconnected; reconnecting: oslo_db.exception.DBConnectionError: (pymysql.err.OperationalError) (2006, "MySQL server has gone away (TimeoutError(110, 'Connection timed out'))")

https://logserver.rdoproject.org/46/34646/2/check/rdoinfo-tripleo-master-testing-centos-8-scenario001-standalone/81841c9/logs/undercloud/var/log/containers/aodh/aodh-evaluator.log.txt.gz

Note that the IP is not reachable. That IP is a pacemaker resource.

Looking at the pacemaker messages in:

https://logserver.rdoproject.org/46/34646/2/check/rdoinfo-tripleo-master-testing-centos-8-scenario001-standalone/81841c9/logs/undercloud/var/log/extra/journal.txt.gz

I see a bunch of pacemaker timeouts and errors (that's the part I found similar):

Jul 23 06:20:59 standalone.localdomain pacemaker-controld[492447]: error: Result of start operation for openstack-cinder-backup-podman-0 on standalone: Timed Out
Jul 23 06:20:59 standalone.localdomain pacemaker-controld[492447]: notice: Transition 45 action 149 (openstack-cinder-backup-podman-0_start_0 on standalone): expected 'ok' but got 'error'
Jul 23 06:21:22 standalone.localdomain pacemaker-controld[492447]: error: Result of monitor operation for haproxy-bundle-podman-0 on standalone: Timed Out
Jul 23 06:21:22 standalone.localdomain pacemaker-controld[492447]: notice: Transition 9 action 5 (haproxy-bundle-podman-0_monitor_60000 on standalone): expected 'ok' but got 'error'
Jul 23 06:22:10 standalone.localdomain pacemaker-execd[492444]: warning: rabbitmq-bundle-podman-0_stop_0 process (PID 633224) timed out
Jul 23 06:22:10 standalone.localdomain pacemaker-execd[492444]: warning: rabbitmq-bundle-podman-0_stop_0[633224] timed out after 20000ms
Jul 23 06:22:10 standalone.localdomain pacemaker-controld[492447]: error: Result of stop operation for rabbitmq-bundle-podman-0 on standalone: Timed Out

And the resources were finally stopped.

Note this job is gating a minor update on OVS, which I guess is unrelated. That said, the issue may be related to pacemaker, but a different one from the one reported in this LP.
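As a side note, the two errno values in the MySQL errors above point at different failure modes. A minimal sketch using Python's stdlib errno table (assuming a Linux host, as in this job):

```python
import errno
import os

# Errno 113 (EHOSTUNREACH on Linux): packets to 192.168.24.3 are rejected at
# the IP layer, i.e. the pacemaker-managed VIP is gone from the node entirely.
print(errno.errorcode[113], "->", os.strerror(113))

# Errno 110 (ETIMEDOUT on Linux): the address still routes, but the peer
# never answered in time, so the TCP connection/session timed out.
print(errno.errorcode[110], "->", os.strerror(110))
```

So the (2003)/113 error is consistent with the VIP itself disappearing after pacemaker stopped its resources, while the (2006)/110 "server has gone away" looks like the later symptom of already-established sessions dying.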