mysql-innodb-cluster stuck in "waiting" state for Focal Ussuri deployment

Bug #1902869 reported by Sachin Kulkarni
This bug affects 4 people
Affects                      Status    Importance  Assigned to  Milestone
MySQL InnoDB Cluster Charm   Expired   Undecided   Unassigned

Bug Description

During deployment of Focal Ussuri, the mysql-innodb-cluster charm gets stuck in the "waiting" state indefinitely and the deployment never completes. The observed error is:
"SystemError: RuntimeError: Dba.create_cluster: Group Replication failed to start"

mysql-innodb-cluster - Stuck in "waiting" with message "Cluster jujuCluster not yet created by leader"
Ussuri Bundle - cs:bundle/openstack-base-70
mysql-innodb-cluster charms used:
  cs:mysql-innodb-cluster-1
  cs:~openstack-charmers-next/mysql-innodb-cluster-52

Debug logs -

charm: cs:mysql-innodb-cluster-1
================================

root@focalmaas:~# juju status mysql-innodb-cluster
Model Controller Cloud/Region Version SLA Timestamp
controller maas-controller mymaas/default 2.8.6 unsupported 10:56:10Z

App Version Status Scale Charm Store Rev OS Notes
mysql-innodb-cluster 8.0.22 waiting 3 mysql-innodb-cluster jujucharms 1 ubuntu

Unit Workload Agent Machine Public address Ports Message
mysql-innodb-cluster/0 waiting executing 1/lxd/3 172.172.1.9 'cluster' incomplete, Instance not yet configured for clustering
mysql-innodb-cluster/1 waiting executing 2/lxd/2 172.172.1.18 'cluster' incomplete, Instance not yet configured for clustering
mysql-innodb-cluster/2* waiting executing 3/lxd/2 172.172.1.11 Cluster jujuCluster not yet created by leader

Machine State DNS Inst id Series AZ Message
1 started 172.172.1.4 NODE2 focal default Deployed
1/lxd/3 started 172.172.1.9 juju-9981a4-1-lxd-3 focal default Container started
2 started 172.172.1.5 NODE3 focal default Deployed
2/lxd/2 started 172.172.1.18 juju-9981a4-2-lxd-2 focal default Container started
3 started 172.172.1.6 NODE4 focal default Deployed
3/lxd/2 started 172.172.1.11 juju-9981a4-3-lxd-2 focal default Container started

root@focalmaas:~#

charm: cs:~openstack-charmers-next/mysql-innodb-cluster-52 for series focal
==========================================================================

root@focalmaas:~# juju status mysql-innodb-cluster
Model Controller Cloud/Region Version SLA Timestamp
controller maas-controller mymaas/default 2.8.6 unsupported 11:21:57Z

App Version Status Scale Charm Store Rev OS Notes
mysql-innodb-cluster 8.0.22 waiting 3 mysql-innodb-cluster jujucharms 52 ubuntu

Unit Workload Agent Machine Public address Ports Message
mysql-innodb-cluster/3 waiting idle 3/lxd/6 172.172.1.18 Instance not yet configured for clustering
mysql-innodb-cluster/4* waiting executing 1/lxd/7 172.172.1.11 'cluster' incomplete, Cluster jujuCluster not yet created by leader
mysql-innodb-cluster/5 waiting idle 2/lxd/6 172.172.1.9 Instance not yet configured for clustering

Machine State DNS Inst id Series AZ Message
1 started 172.172.1.4 NODE2 focal default Deployed
1/lxd/7 started 172.172.1.11 juju-9981a4-1-lxd-7 focal default Container started
2 started 172.172.1.5 NODE3 focal default Deployed
2/lxd/6 started 172.172.1.9 juju-9981a4-2-lxd-6 focal default Container started
3 started 172.172.1.6 NODE4 focal default Deployed
3/lxd/6 started 172.172.1.18 juju-9981a4-3-lxd-6 focal default Container started

root@focalmaas:~#

unit-mysql-innodb-cluster-5: 11:08:26 INFO juju.worker.uniter.operation ran "cluster-relation-joined" hook (via explicit, bespoke hook script)
unit-mysql-innodb-cluster-4: 11:08:27 ERROR unit.mysql-innodb-cluster/4.juju-log Failed creating cluster: Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory
WARNING: The member will only proceed according to its exitStateAction if auto-rejoin fails (i.e. all retry attempts are exhausted).
Validating instance configuration at 172.172.1.11:3306...
This instance reports its own address as 172.172.1.11:3306
Instance configuration is suitable.
WARNING: The member will only proceed according to its exitStateAction if auto-rejoin fails (i.e. all retry attempts are exhausted).
NOTE: Group Replication will communicate with other members using '172.172.1.11:33061'. Use the localAddress option to override.

Creating InnoDB cluster 'jujuCluster' on '172.172.1.11:3306'...

Adding Seed Instance...
ERROR: Unable to start Group Replication for instance '172.172.1.11:3306'. Please check the MySQL server error log for more information.
Traceback (most recent call last):
  File "<string>", line 2, in <module>
SystemError: RuntimeError: Dba.create_cluster: Group Replication failed to start: MySQL Error 3092 (HY000): 172.172.1.11:3306: The server is not configured properly to be an active member of the group. Please see more details on error log.
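
For reference, the same create_cluster call can be driven by hand with MySQL Shell's AdminAPI to narrow down why Group Replication refuses to start. The snippet below is a diagnostic sketch only, assuming a cluster admin account on the seed unit (the user name "clusteruser" and its password are placeholders, not the charm's actual credentials); it runs the AdminAPI prerequisite check before retrying the create_cluster step that raised MySQL Error 3092 above.

# Paste into `mysqlsh --py` on the would-be seed unit (172.172.1.11).
# "clusteruser"/"clusterpass" are hypothetical credentials for illustration.
shell.connect('clusteruser@172.172.1.11:3306', 'clusterpass')

# Report any Group Replication prerequisites the server does not meet
# (server_id, GTID mode, binary log settings, etc.). Error 3092 usually
# traces back to one of these or to the group replication addresses.
print(dba.check_instance_configuration())

# Only if the check comes back clean, retry what the charm attempted.
try:
    cluster = dba.create_cluster('jujuCluster')
    print(cluster.status())
except Exception as err:
    # The authoritative detail is in the server error log inside the
    # unit's container, e.g. /var/log/mysql/error.log.
    print('create_cluster failed:', err)

If check_instance_configuration() reports problems, the server error log referenced by the Error 3092 message is the next place to look, since the charm only surfaces the shell-level failure.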

description: updated
Revision history for this message
Michael Skalka (mskalka) wrote:

I believe we may be seeing a similar issue here: https://solutions.qa.canonical.com/testruns/testRun/90f2cf18-088c-49ef-abbc-a73d45530cfe
Crashdump: https://oil-jenkins.canonical.com/artifacts/90f2cf18-088c-49ef-abbc-a73d45530cfe/generated/generated/kubernetes/juju-crashdump-kubernetes-2021-04-23-17.51.35.tar.gz

In this example mysql/0 stays waiting and does not join the cluster, causing the other two units to hang as well.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote:

Setting bug to Incomplete. A lot of work has gone into the clustering code in the charm, and it may be that this bug has been fixed. This bug will time out in about 2 months if it is not re-opened.

Changed in charm-mysql-innodb-cluster:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote:

[Expired for MySQL InnoDB Cluster Charm because there has been no activity for 60 days.]

Changed in charm-mysql-innodb-cluster:
status: Incomplete → Expired