after reboot all mysql-routers report error

Bug #2042351 reported by Marian Gasparovic
This bug affects 5 people
Affects: OpenStack Snap
Status: Triaged
Importance: Critical
Assigned to: Unassigned
Milestone: (none)

Bug Description

Single-node deployment of 2023.1/stable. After a reboot and a 15-minute wait, all mysql-router units are in error:

```
Unit Workload Agent Address Ports Message
certificate-authority/0* active idle 10.1.106.167
cinder-ceph-mysql-router/0* error idle 10.1.106.132 hook failed: "start"
cinder-ceph/0* active idle 10.1.106.129
cinder-mysql-router/0* error idle 10.1.106.177 hook failed: "start"
cinder-mysql/0* blocked idle 10.1.106.147 failed to recover cluster.
cinder/0* active idle 10.1.106.154
glance-mysql-router/0* active idle 10.1.106.137
glance-mysql/0* active idle 10.1.106.188 Primary
glance/0* active idle 10.1.106.130
heat-cfn-mysql-router/0* error idle 10.1.106.190 hook failed: "start"
heat-cfn/0* active idle 10.1.106.180
heat-mysql-router/0* error idle 10.1.106.169 hook failed: "start"
heat-mysql/0* blocked idle 10.1.106.135 failed to recover cluster.
heat/0* active idle 10.1.106.163
horizon-mysql-router/0* active idle 10.1.106.153
horizon-mysql/0* active idle 10.1.106.139 Primary
horizon/0* active idle 10.1.106.150
keystone-mysql-router/0* error idle 10.1.106.162 hook failed: "start"
keystone-mysql/0* blocked idle 10.1.106.152 failed to recover cluster.
keystone/0* waiting idle 10.1.106.149 (workload) Not all relations are ready
neutron-mysql-router/0* active idle 10.1.106.144
neutron-mysql/0* active idle 10.1.106.141 Primary
neutron/0* active idle 10.1.106.178
nova-api-mysql-router/0* error idle 10.1.106.148 hook failed: "start"
nova-cell-mysql-router/0* error idle 10.1.106.170 hook failed: "start"
nova-mysql-router/0* error idle 10.1.106.168 hook failed: "start"
nova-mysql/0* blocked idle 10.1.106.159 failed to recover cluster.
nova/0* blocked idle 10.1.106.191 (container:nova-api) healthcheck failed: online
ovn-central/0* active idle 10.1.106.186
ovn-relay/0* active idle 10.1.106.184
placement-mysql-router/0* error idle 10.1.106.138 hook failed: "start"
placement-mysql/0* blocked idle 10.1.106.143 failed to recover cluster.
placement/0* blocked idle 10.1.106.189 (container:placement-api) healthcheck failed: online
rabbitmq/0* active idle 10.1.106.176
traefik/0* active idle 10.1.106.187

```

Logs and artifacts - https://oil-jenkins.canonical.com/artifacts/a21d3c76-f49f-4922-84f3-1ecf0ad4eb8e/index.html

This is not a one-off issue; I can reproduce it every time.
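For reference, reproducing it is just a matter of rebooting the node; a minimal sketch, assuming a single-node Sunbeam install where the OpenStack model is named `openstack`:

```
# Reboot the host and give the units ~15 minutes to settle
sudo reboot
# After the node is back, check the unit states: the *-mysql-router units end up
# in 'hook failed: "start"' and several *-mysql units in "failed to recover cluster."
juju status -m openstack
```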

Tags: cdo-qa
Guillaume Boutry (gboutry) wrote:

From the logs, we can see that the issue is not with the MySQL routers, but with the MySQL instances themselves.

I added a comment about this on the related mysql-k8s-operator issue:
https://github.com/canonical/mysql-k8s-operator/issues/329

```
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/mysql/v0/mysql.py", line 1875, in reboot_from_complete_outage
    self._run_mysqlsh_script("\n".join(reboot_from_outage_command))
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/src/mysql_k8s_helpers.py", line 675, in _run_mysqlsh_script
    raise MySQLClientError(e.stderr)
charms.mysql.v0.mysql.MySQLClientError: Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory
verbose: 2023-10-31T16:23:20Z: Loading startup files...
verbose: 2023-10-31T16:23:20Z: Loading plugins...
verbose: 2023-10-31T16:23:20Z: Connecting to MySQL at: <email address hidden>
verbose: 2023-10-31T16:23:20Z: Shell.connect: tid=95: CONNECTED: nova-mysql-0.nova-mysql-endpoints
verbose: 2023-10-31T16:23:20Z: Connecting to MySQL at: mysql://<email address hidden>:3306?connect-timeout=5000
verbose: 2023-10-31T16:23:20Z: Dba.reboot_cluster_from_complete_outage: tid=96: CONNECTED: nova-mysql-0.nova-mysql-endpoints:3306
verbose: 2023-10-31T16:23:20Z: Connecting to MySQL at: mysql://<email address hidden>:3306?connect-timeout=5000
verbose: 2023-10-31T16:23:20Z: Dba.reboot_cluster_from_complete_outage: tid=97: CONNECTED: nova-mysql-0.nova-mysql-endpoints:3306
verbose: 2023-10-31T16:23:20Z: Group Replication 'group_name' value: 28668da3-7802-11ee-b520-061a84483353
verbose: 2023-10-31T16:23:20Z: Metadata 'group_name' value: 28668da3-7802-11ee-b520-061a84483353
verbose: 2023-10-31T16:23:20Z: Connecting to MySQL at: mysql://<email address hidden>:3306?connect-timeout=5000
verbose: 2023-10-31T16:23:20Z: Dba.reboot_cluster_from_complete_outage: tid=98: CONNECTED: nova-mysql-0.nova-mysql-endpoints.openstack.svc.cluster.local:3306
verbose: 2023-10-31T16:23:20Z: Connecting to MySQL at: mysql://<email address hidden>:3306?connect-timeout=5000
verbose: 2023-10-31T16:23:20Z: Dba.reboot_cluster_from_complete_outage: tid=99: CONNECTED: nova-mysql-0.nova-mysql-endpoints.openstack.svc.cluster.local:3306
No PRIMARY member found for cluster 'cluster-b520e0bc6c2593c08ac554766c08fe32'
verbose: 2023-10-31T16:23:20Z: ClusterSet info: member, primary, not primary_invalidated, not removed from set, primary status: UNKNOWN
Restoring the Cluster 'cluster-b520e0bc6c2593c08ac554766c08fe32' from complete outage...

ERROR: RuntimeError: The current session instance does not belong to the Cluster: 'cluster-b520e0bc6c2593c08ac554766c08fe32'.
Traceback (most recent call last):
  File "<string>", line 2, in <module>
RuntimeError: Dba.reboot_cluster_from_complete_outage: The current session instance does not belong to the Cluster: 'cluster-b520e0bc6c2593c08ac554766c08fe32'.
```
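The failure is in the charm's reboot_from_complete_outage path: mysqlsh connects to the local instance, but Dba.reboot_cluster_from_complete_outage() rejects the session because the instance is no longer recognised as a member of the cluster ("No PRIMARY member found"). A rough way to inspect the Group Replication state by hand is sketched below; the pod name is taken from the traceback, but the container name and the clusteradmin user are assumptions, and credentials will be prompted for:

```
# Dump the Group Replication membership as seen from the affected instance
kubectl -n openstack exec -it nova-mysql-0 -c mysql -- \
  mysqlsh --sql --uri clusteradmin@localhost:3306 \
  -e "SELECT member_host, member_state FROM performance_schema.replication_group_members;"
```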

Guillaume Boutry (gboutry) changed in snap-openstack:
importance: Undecided → Critical
status: New → Triaged
szymon roczniak (simon-4) wrote:

I also see this after each reboot; restarting mysql fixes it for me:

kubectl -n openstack rollout restart statefulset/mysql
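As a follow-up, a sketch of verifying the restart and clearing the failed hooks afterwards (unit names taken from the status output above; `juju resolved` retries the failed "start" hook):

```
# Wait for the restarted statefulset to come back up
kubectl -n openstack rollout status statefulset/mysql
# Then retry the failed "start" hook on each router unit still in error
juju resolved nova-mysql-router/0
juju resolved keystone-mysql-router/0
# ...and so on for the remaining *-mysql-router units
```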
