charm fails to query mysql, mysql running leader-settings-changed

Bug #1928959 reported by Alexander Balderson
Affects Status Importance Assigned to Milestone
Gnocchi Charm
New
Undecided
Unassigned
OpenStack Keystone Charm
New
Undecided
Unassigned

Bug Description

One of three gnocchi units went into an error state because it lost connection to mysql during a query. At the same time, the mysql-innodb-cluster is reporting that it is performing a server shutdown during the leader-settings-changed hook.

I'm not sure whether gnocchi or mysql-innodb-router should defer/retry the request until the hook has finished, or whether mysql-innodb-cluster should defer the hook until the requests have completed.
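For the defer/retry option, a minimal sketch of what a retry-with-backoff wrapper could look like (generic Python, not the charms' actual code; `retry_on_error` and the stand-in `flaky_query` are hypothetical, and in the real charm the caught exceptions would be `pymysql.err.OperationalError` / `sqlalchemy.exc.OperationalError` rather than `ConnectionError`):

```python
import time

def retry_on_error(func, exceptions, attempts=3, delay=0.1, backoff=2):
    """Call func(), retrying on the given exception types with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise so the hook still errors visibly
            time.sleep(delay)
            delay *= backoff

# Stand-in for a query that hits a restarting MySQL twice before it succeeds,
# mimicking pymysql error 2013 "Lost connection to MySQL server during query".
calls = {"n": 0}
def flaky_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("Lost connection to MySQL server during query")
    return "ok"

result = retry_on_error(flaky_query, (ConnectionError,))  # -> "ok" on the third try
```

A few retries like this would ride out the brief window where mysql-innodb-cluster restarts a server during leader-settings-changed, without changing the hook ordering.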

Gnocchi_0 upgrade log shows:
2021-05-19 13:31:59,256 [113103] CRITICAL root: Traceback (most recent call last):
....
  File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 657, in _read_packet
    packet_header = self._read_bytes(4)
  File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 706, in _read_bytes
    raise err.OperationalError(
sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')
[SQL: SHOW VARIABLES LIKE 'sql_mode']
(Background on this error at: http://sqlalche.me/e/e3q8)

and at the same time mysql-innodb-cluster_0 shows:
2021-05-19 13:31:58 DEBUG jujuc server.go:211 running hook tool "network-get" for mysql-innodb-cluster/0-leader-settings-changed-7576194136442783270
2021-05-19 13:31:59 DEBUG jujuc server.go:211 running hook tool "juju-log" for mysql-innodb-cluster/0-leader-settings-changed-7576194136442783270
2021-05-19 13:31:59 ERROR juju-log Failed checking cluster status: Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory
Traceback (most recent call last):
  File "<string>", line 3, in <module>
mysqlsh.DBError: MySQL Error (1053): Cluster.status: Failed to execute query on Metadata server 192.168.33.211:3306: Server shutdown in progress

2021-05-19 13:31:59 DEBUG jujuc server.go:211 running hook tool "status-set" for mysql-innodb-cluster/0-leader-settings-changed-7576194136442783270
2021-05-19 13:31:59 DEBUG jujuc server.go:211 running hook tool "is-leader" for mysql-innodb-cluster/0-leader-settings-changed-7576194136442783270
2021-05-19 13:31:59 DEBUG jujuc server.go:211 running hook tool "is-leader" for mysql-innodb-cluster/0-leader-settings-changed-7576194136442783270

Crashdump at https://oil-jenkins.canonical.com/artifacts/fddd8b11-9429-41e2-a98b-90b94abe1785/generated/generated/openstack/juju-crashdump-openstack-2021-05-19-13.36.28.tar.gz

and full test run at:
https://solutions.qa.canonical.com/testruns/testRun/fddd8b11-9429-41e2-a98b-90b94abe1785

Revision history for this message
Michael Skalka (mskalka) wrote :

It would appear that this is mostly brittleness in the upgrade script that the gnocchi charm runs. During cloud deployment we can't guarantee the availability of the mysql database (it seems to restart arbitrarily), but we could handle lost connections in a less disruptive way.
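One less disruptive way to handle dropped connections, assuming the script uses SQLAlchemy (as the traceback suggests), is the engine's `pool_pre_ping` option, which tests each pooled connection on checkout and transparently replaces stale ones. A minimal sketch against an in-memory SQLite database (the real charm would use a `mysql+pymysql://` URL):

```python
from sqlalchemy import create_engine, text

# pool_pre_ping issues a lightweight "ping" when a connection is checked
# out of the pool and discards dead connections instead of handing them
# to the query. SQLite stands in here for the real mysql+pymysql:// URL.
engine = create_engine("sqlite://", pool_pre_ping=True)

with engine.connect() as conn:
    result = conn.execute(text("SELECT 1")).scalar()
```

Note that pre-ping only guards connection checkout; a connection dropped mid-query (as in the traceback above) would still need an application-level retry.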

Revision history for this message
Alexander Balderson (asbalderson) wrote :

We bumped into this again on a keystone unit while running validation on the cloud. All of a sudden requests were failing, and it took a good minute for the cluster to recover and for tests to start passing again.

I'm attaching the crashdump for this run; the errors are in keystone_0's error log.
