after reboot all mysql-routers report error

Bug #2042351 reported by Marian Gasparovic
This bug affects 5 people
Affects: OpenStack Snap
Status: Triaged
Importance: Critical
Assigned to: Unassigned
Milestone: (none)

Bug Description

Single-node deployment of 2023.1/stable. After a reboot and a 15-minute wait, all mysql-router units are in error:

```
Unit Workload Agent Address Ports Message
certificate-authority/0* active idle 10.1.106.167
cinder-ceph-mysql-router/0* error idle 10.1.106.132 hook failed: "start"
cinder-ceph/0* active idle 10.1.106.129
cinder-mysql-router/0* error idle 10.1.106.177 hook failed: "start"
cinder-mysql/0* blocked idle 10.1.106.147 failed to recover cluster.
cinder/0* active idle 10.1.106.154
glance-mysql-router/0* active idle 10.1.106.137
glance-mysql/0* active idle 10.1.106.188 Primary
glance/0* active idle 10.1.106.130
heat-cfn-mysql-router/0* error idle 10.1.106.190 hook failed: "start"
heat-cfn/0* active idle 10.1.106.180
heat-mysql-router/0* error idle 10.1.106.169 hook failed: "start"
heat-mysql/0* blocked idle 10.1.106.135 failed to recover cluster.
heat/0* active idle 10.1.106.163
horizon-mysql-router/0* active idle 10.1.106.153
horizon-mysql/0* active idle 10.1.106.139 Primary
horizon/0* active idle 10.1.106.150
keystone-mysql-router/0* error idle 10.1.106.162 hook failed: "start"
keystone-mysql/0* blocked idle 10.1.106.152 failed to recover cluster.
keystone/0* waiting idle 10.1.106.149 (workload) Not all relations are ready
neutron-mysql-router/0* active idle 10.1.106.144
neutron-mysql/0* active idle 10.1.106.141 Primary
neutron/0* active idle 10.1.106.178
nova-api-mysql-router/0* error idle 10.1.106.148 hook failed: "start"
nova-cell-mysql-router/0* error idle 10.1.106.170 hook failed: "start"
nova-mysql-router/0* error idle 10.1.106.168 hook failed: "start"
nova-mysql/0* blocked idle 10.1.106.159 failed to recover cluster.
nova/0* blocked idle 10.1.106.191 (container:nova-api) healthcheck failed: online
ovn-central/0* active idle 10.1.106.186
ovn-relay/0* active idle 10.1.106.184
placement-mysql-router/0* error idle 10.1.106.138 hook failed: "start"
placement-mysql/0* blocked idle 10.1.106.143 failed to recover cluster.
placement/0* blocked idle 10.1.106.189 (container:placement-api) healthcheck failed: online
rabbitmq/0* active idle 10.1.106.176
traefik/0* active idle 10.1.106.187

```

Logs and artifacts - https://oil-jenkins.canonical.com/artifacts/a21d3c76-f49f-4922-84f3-1ecf0ad4eb8e/index.html

This is not a one-off issue; I can reproduce it every time.
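For reference, reproducing it is just a matter of rebooting the node; a minimal sketch, assuming a single-node Sunbeam install where the OpenStack model is named `openstack`:

```
# Reboot the host and give the units ~15 minutes to settle
sudo reboot
# After the node is back, check the unit states: the *-mysql-router units end up
# in 'hook failed: "start"' and several *-mysql units in "failed to recover cluster."
juju status -m openstack
```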

Tags: cdo-qa
Guillaume Boutry (gboutry) wrote:

From the logs, we can see that the issue is not with the MySQL routers, but with the MySQL instances themselves.

I added a comment about this on the related mysql-k8s-operator issue:
https://github.com/canonical/mysql-k8s-operator/issues/329

```
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/mysql/v0/mysql.py", line 1875, in reboot_from_complete_outage
    self._run_mysqlsh_script("\n".join(reboot_from_outage_command))
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/src/mysql_k8s_helpers.py", line 675, in _run_mysqlsh_script
    raise MySQLClientError(e.stderr)
charms.mysql.v0.mysql.MySQLClientError: Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory
verbose: 2023-10-31T16:23:20Z: Loading startup files...
verbose: 2023-10-31T16:23:20Z: Loading plugins...
verbose: 2023-10-31T16:23:20Z: Connecting to MySQL at: <email address hidden>
verbose: 2023-10-31T16:23:20Z: Shell.connect: tid=95: CONNECTED: nova-mysql-0.nova-mysql-endpoints
verbose: 2023-10-31T16:23:20Z: Connecting to MySQL at: mysql://<email address hidden>:3306?connect-timeout=5000
verbose: 2023-10-31T16:23:20Z: Dba.reboot_cluster_from_complete_outage: tid=96: CONNECTED: nova-mysql-0.nova-mysql-endpoints:3306
verbose: 2023-10-31T16:23:20Z: Connecting to MySQL at: mysql://<email address hidden>:3306?connect-timeout=5000
verbose: 2023-10-31T16:23:20Z: Dba.reboot_cluster_from_complete_outage: tid=97: CONNECTED: nova-mysql-0.nova-mysql-endpoints:3306
verbose: 2023-10-31T16:23:20Z: Group Replication 'group_name' value: 28668da3-7802-11ee-b520-061a84483353
verbose: 2023-10-31T16:23:20Z: Metadata 'group_name' value: 28668da3-7802-11ee-b520-061a84483353
verbose: 2023-10-31T16:23:20Z: Connecting to MySQL at: mysql://<email address hidden>:3306?connect-timeout=5000
verbose: 2023-10-31T16:23:20Z: Dba.reboot_cluster_from_complete_outage: tid=98: CONNECTED: nova-mysql-0.nova-mysql-endpoints.openstack.svc.cluster.local:3306
verbose: 2023-10-31T16:23:20Z: Connecting to MySQL at: mysql://<email address hidden>:3306?connect-timeout=5000
verbose: 2023-10-31T16:23:20Z: Dba.reboot_cluster_from_complete_outage: tid=99: CONNECTED: nova-mysql-0.nova-mysql-endpoints.openstack.svc.cluster.local:3306
No PRIMARY member found for cluster 'cluster-b520e0bc6c2593c08ac554766c08fe32'
verbose: 2023-10-31T16:23:20Z: ClusterSet info: member, primary, not primary_invalidated, not removed from set, primary status: UNKNOWN
Restoring the Cluster 'cluster-b520e0bc6c2593c08ac554766c08fe32' from complete outage...

ERROR: RuntimeError: The current session instance does not belong to the Cluster: 'cluster-b520e0bc6c2593c08ac554766c08fe32'.
Traceback (most recent call last):
  File "<string>", line 2, in <module>
RuntimeError: Dba.reboot_cluster_from_complete_outage: The current session instance does not belong to the Cluster: 'cluster-b520e0bc6c2593c08ac554766c08fe32'.
```
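The failure is in the charm's reboot_from_complete_outage path: mysqlsh connects to the local instance, but Dba.reboot_cluster_from_complete_outage() rejects the session because the instance is no longer recognised as a member of the cluster ("No PRIMARY member found"). A rough way to inspect the Group Replication state by hand is sketched below; the pod name is taken from the traceback, but the container name and the clusteradmin user are assumptions, and credentials will be prompted for:

```
# Dump the Group Replication membership as seen from the affected instance
kubectl -n openstack exec -it nova-mysql-0 -c mysql -- \
  mysqlsh --sql --uri clusteradmin@localhost:3306 \
  -e "SELECT member_host, member_state FROM performance_schema.replication_group_members;"
```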

Guillaume Boutry (gboutry) changed in snap-openstack:
importance: Undecided → Critical
status: New → Triaged
szymon roczniak (simon-4) wrote:

I also see this after each reboot; restarting mysql fixes it for me:

kubectl -n openstack rollout restart statefulset/mysql
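As a follow-up, a sketch of verifying the restart and clearing the failed hooks afterwards (unit names taken from the status output above; `juju resolved` retries the failed "start" hook):

```
# Wait for the restarted statefulset to come back up
kubectl -n openstack rollout status statefulset/mysql
# Then retry the failed "start" hook on each router unit still in error
juju resolved nova-mysql-router/0
juju resolved keystone-mysql-router/0
# ...and so on for the remaining *-mysql-router units
```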
