mysql-innodb-cluster: "Instance not yet in the cluster" on a clean Yoga installation

Bug #2020408 reported by Nicolas FERRAO
This bug affects 1 person
Affects:     MySQL InnoDB Cluster Charm
Status:      Triaged
Importance:  Undecided
Assigned to: Unassigned

Bug Description

Hello,

I have an issue while deploying mysql-innodb-cluster on a fresh OpenStack Yoga installation, following this process: https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/yoga/install-openstack.html

Environment:
- 1 MAAS (Juju client installed here)
- 1 Juju controller
- 3 bare-metal nodes

Juju version: 2.9.42-ubuntu-amd64
MAAS version: 3.3.3
Deployments happen behind the MAAS proxy.

If I run:

juju deploy -n 3 --to lxd:0,lxd:1,lxd:2 --series jammy --channel 8.0/stable mysql-innodb-cluster

juju status --relations shows:

Model      Controller       Cloud/Region     Version  SLA          Timestamp
openstack  maas-controller  maas-dc/default  2.9.42   unsupported  05:49:13Z

App Version Status Scale Charm Channel Rev Exposed Message
ceph-osd 17.2.5 blocked 3 ceph-osd quincy/stable 559 no Missing relation: monitor
mysql-innodb-cluster 8.0.33 waiting 3 mysql-innodb-cluster 8.0/stable 43 no Instance not yet in the cluster
nova-compute 25.1.0 blocked 3 nova-compute yoga/stable 650 no Missing relations: image, messaging

Unit Workload Agent Machine Public address Ports Message
ceph-osd/0* blocked idle 0 172.16.22.2 Missing relation: monitor
ceph-osd/1 blocked idle 1 172.16.22.1 Missing relation: monitor
ceph-osd/2 blocked idle 2 172.16.22.3 Missing relation: monitor
mysql-innodb-cluster/9* waiting idle 0/lxd/1 172.16.22.6 Not all instances clustered
mysql-innodb-cluster/10 waiting idle 1/lxd/1 172.16.22.4 Instance not yet in the cluster
mysql-innodb-cluster/11 waiting idle 2/lxd/2 172.16.22.7 Not all instances clustered
nova-compute/0* blocked idle 0 172.16.22.2 Missing relations: image, messaging
nova-compute/1 blocked idle 1 172.16.22.1 Missing relations: messaging, image
nova-compute/2 blocked idle 2 172.16.22.3 Missing relations: image, messaging

Machine State Address Inst id Series AZ Message
0 started 172.16.22.2 BM-3 jammy HUB-DC-T Deployed
0/lxd/1 started 172.16.22.6 juju-e21d11-0-lxd-1 jammy HUB-DC-T Container started
1 started 172.16.22.1 BM-2 jammy HUB-DC-T Deployed
1/lxd/1 started 172.16.22.4 juju-e21d11-1-lxd-1 jammy HUB-DC-T Container started
2 started 172.16.22.3 BM-4 jammy HUB-DC-T Deployed
2/lxd/2 started 172.16.22.7 juju-e21d11-2-lxd-2 jammy HUB-DC-T Container started

Relation provider Requirer Interface Type Message
mysql-innodb-cluster:cluster mysql-innodb-cluster:cluster mysql-innodb-cluster peer
mysql-innodb-cluster:coordinator mysql-innodb-cluster:coordinator coordinator peer
nova-compute:compute-peer nova-compute:compute-peer nova peer

juju export-bundle:

series: jammy
applications:
  ceph-osd:
    charm: ceph-osd
    channel: quincy/stable
    revision: 559
    num_units: 3
    to:
    - "0"
    - "1"
    - "2"
    options:
      osd-devices: /dev/sda /dev/sdb /dev/sdc /dev/sdd
    constraints: arch=amd64 tags=compute
    storage:
      bluestore-db: loop,1024M
      bluestore-wal: loop,1024M
      cache-devices: loop,10240M
      osd-devices: loop,1024M
      osd-journals: loop,1024M
  mysql-innodb-cluster:
    charm: mysql-innodb-cluster
    channel: 8.0/stable
    revision: 43
    resources:
      mysql-shell: 0
    num_units: 3
    to:
    - lxd:1
    - lxd:2
    - lxd:0
    constraints: arch=amd64
  nova-compute:
    charm: nova-compute
    channel: yoga/stable
    revision: 650
    num_units: 3
    to:
    - "0"
    - "1"
    - "2"
    options:
      config-flags: default_ephemeral_format=ext4
      enable-live-migration: true
      enable-resize: true
      migration-auth-type: ssh
      openstack-origin: distro
      virt-type: qemu
    constraints: arch=amd64
    storage:
      ephemeral-device: loop,10240M
machines:
  "0":
    constraints: arch=amd64 tags=compute
  "1":
    constraints: arch=amd64 tags=compute
  "2":
    constraints: arch=amd64 tags=compute

When I check the unit logs on 1/lxd/1:
2023-05-23 05:45:05 INFO unit.mysql-innodb-cluster/10.juju-log server.go:316 Invoking reactive handler: reactive/layer_openstack.py:64:default_update_status
2023-05-23 05:45:05 INFO unit.mysql-innodb-cluster/10.juju-log server.go:316 Invoking reactive handler: reactive/layer_openstack.py:82:check_really_is_update_status
2023-05-23 05:45:05 INFO unit.mysql-innodb-cluster/10.juju-log server.go:316 Invoking reactive handler: reactive/layer_openstack.py:93:run_default_update_status
2023-05-23 05:45:05 INFO unit.mysql-innodb-cluster/10.juju-log server.go:316 Invoking reactive handler: reactive/mysql_innodb_cluster_handlers.py:138:check_quorum
2023-05-23 05:45:05 INFO unit.mysql-innodb-cluster/10.juju-log server.go:316 Invoking reactive handler: reactive/mysql_innodb_cluster_handlers.py:240:signal_clustered
2023-05-23 05:45:05 INFO unit.mysql-innodb-cluster/10.juju-log server.go:316 Invoking reactive handler: hooks/relations/mysql-innodb-cluster/peers.py:69:joined:cluster
2023-05-23 05:45:05 INFO unit.mysql-innodb-cluster/10.juju-log server.go:316 Invoking reactive handler: hooks/relations/mysql-innodb-cluster/peers.py:75:changed:cluster
2023-05-23 05:45:05 INFO unit.mysql-innodb-cluster/10.juju-log server.go:316 Invoking reactive handler: hooks/relations/tls-certificates/requires.py:109:broken:certificates
2023-05-23 05:45:05 INFO unit.mysql-innodb-cluster/10.juju-log server.go:316 coordinator.DelayedActionCoordinator Publishing state
2023-05-23 05:45:06 INFO juju.worker.uniter.operation runhook.go:159 ran "update-status" hook (via explicit, bespoke hook script)

Logs on the leader, 0/lxd/1:
2023-05-23 06:29:59 INFO unit.mysql-innodb-cluster/9.juju-log server.go:316 Coordinator: Initializing coordinator layer
2023-05-23 06:29:59 INFO unit.mysql-innodb-cluster/9.juju-log server.go:316 Initializing Snap Layer
2023-05-23 06:29:59 INFO unit.mysql-innodb-cluster/9.juju-log server.go:316 Initializing Leadership Layer (is leader)
2023-05-23 06:29:59 INFO unit.mysql-innodb-cluster/9.juju-log server.go:316 Invoking reactive handler: reactive/layer_openstack.py:64:default_update_status
2023-05-23 06:29:59 INFO unit.mysql-innodb-cluster/9.juju-log server.go:316 Invoking reactive handler: reactive/mysql_innodb_cluster_handlers.py:138:check_quorum
2023-05-23 06:29:59 INFO unit.mysql-innodb-cluster/9.juju-log server.go:316 Invoking reactive handler: reactive/mysql_innodb_cluster_handlers.py:208:add_instances_to_cluster
2023-05-23 06:30:00 INFO unit.mysql-innodb-cluster/9.juju-log server.go:316 Adding instance, 172.16.22.4, to the cluster.
2023-05-23 06:30:01 ERROR unit.mysql-innodb-cluster/9.juju-log server.go:316 Failed adding instance 172.16.22.4 to cluster: Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory
Traceback (most recent call last):
  File "<string>", line 3, in <module>
RuntimeError: Cluster.add_instance: The instance '172.16.22.4:3306' is already part of another Replication Group

2023-05-23 06:30:01 WARNING unit.mysql-innodb-cluster/9.juju-log server.go:316 Instance: 172.16.22.7, already clustered.
2023-05-23 06:30:01 INFO unit.mysql-innodb-cluster/9.juju-log server.go:316 Invoking reactive handler: reactive/layer_openstack.py:82:check_really_is_update_status
2023-05-23 06:30:01 INFO unit.mysql-innodb-cluster/9.juju-log server.go:316 Invoking reactive handler: reactive/layer_openstack.py:93:run_default_update_status
2023-05-23 06:30:01 INFO unit.mysql-innodb-cluster/9.juju-log server.go:316 Invoking reactive handler: hooks/relations/tls-certificates/requires.py:109:broken:certificates
2023-05-23 06:30:01 INFO unit.mysql-innodb-cluster/9.juju-log server.go:316 Invoking reactive handler: hooks/relations/mysql-innodb-cluster/peers.py:69:joined:cluster
2023-05-23 06:30:01 INFO unit.mysql-innodb-cluster/9.juju-log server.go:316 Invoking reactive handler: hooks/relations/mysql-innodb-cluster/peers.py:75:changed:cluster
2023-05-23 06:30:01 INFO unit.mysql-innodb-cluster/9.juju-log server.go:316 coordinator.DelayedActionCoordinator Leader handling coordinator requests
2023-05-23 06:30:01 INFO unit.mysql-innodb-cluster/9.juju-log server.go:316 coordinator.DelayedActionCoordinator Publishing state
2023-05-23 06:30:02 INFO juju.worker.uniter.operation runhook.go:159 ran "update-status" hook (via explicit, bespoke hook script)

It appears that 1/lxd/1 is declared as a member of another replication group, which is odd given that this is a fresh install.
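
For reference, one way to confirm which replication group each instance believes it belongs to is to query Group Replication's membership table directly on each unit. A minimal sketch; the clusteruser credentials come from the cluster peer relation data (visible via juju show-unit), and the table itself is standard MySQL 8.0:

# On the suspect unit's container (the password is prompted for):
juju ssh mysql-innodb-cluster/10
mysqlsh --sql clusteruser@127.0.0.1:3306 \
  -e "SELECT member_host, member_port, member_state, member_role
      FROM performance_schema.replication_group_members;"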

description: updated
Alex Kavanagh (ajkavanagh) wrote:

Unfortunately, there's not enough information here to determine what the problem may be. Please could you have a read of https://docs.openstack.org/charm-guide/zed/community/software-bug.html

In particular, the juju logs (including the machine logs) and the mysql error logs are very important to try to work out what is going on. Thanks.

Changed in charm-mysql-innodb-cluster:
status: New → Incomplete
Nicolas FERRAO (nicolasfrr) wrote:

juju debug-log --include mysql-innodb-cluster --replay | tee bug_2020408_mysql-innodb-cluster.log
https://pastebin.ubuntu.com/p/V7Y3VMtMSk/

cat /var/log/mysql/error.log on mysql-innodb-cluster/0*:
https://pastebin.ubuntu.com/p/J9YTH3zvDb/

cat /var/log/mysql/error.log on mysql-innodb-cluster/1:
https://pastebin.ubuntu.com/p/n57YmHtkZT/

cat /var/log/mysql/error.log on mysql-innodb-cluster/2:
https://pastebin.ubuntu.com/p/fbdRVZzg3J/

I did a fresh reinstall and I am getting the same error.

When you speak about machine logs, which do you mean:
- the Juju controller's?
- those of the LXD containers having trouble?
- the LXD hosts'?

Nicolas FERRAO (nicolasfrr) wrote:

Also, on the leader I get this:

mysql-py []> dba.get_cluster().status()
{
    "clusterName": "jujuCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "172.16.22.4:3306",
        "ssl": "REQUIRED",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures.",
        "topology": {
            "172.16.22.4:3306": {
                "address": "172.16.22.4:3306",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.33"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "172.16.22.4:3306"
}

Alex Kavanagh (ajkavanagh) wrote:

A crashdump using the crashdump plugin (https://github.com/juju/juju-crashdump) would be really useful; failing that, "juju show-unit" for each of the mysql-innodb-cluster units would also help, please, as it shows the world from the perspective of each Juju unit.
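
Something along these lines would do (a sketch; the PyPI package name for the plugin is assumed, and the loop simply captures each unit's view of the relations):

# Install the crashdump plugin and run it against the current model
pip install juju-crashdump
juju switch openstack
juju-crashdump

# Capture the relation data as seen by every mysql-innodb-cluster unit
for u in mysql-innodb-cluster/0 mysql-innodb-cluster/1 mysql-innodb-cluster/2; do
  juju show-unit "$u" > "show-unit-${u//\//-}.yaml"
done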

Nicolas FERRAO (nicolasfrr) wrote:

The crashdump is still generating. In the meanwhile:
maas-infra:~$ juju show-unit mysql-innodb-cluster/0
mysql-innodb-cluster/0:
  workload-version: 8.0.33
  machine: 0/lxd/0
  opened-ports: []
  public-address: 172.16.22.4
  charm: ch:amd64/jammy/mysql-innodb-cluster-43
  leader: true
  life: alive
  relation-info:
  - relation-id: 1
    endpoint: cluster
    related-endpoint: cluster
    application-data: {}
    local-unit:
      in-scope: true
      data:
        cluster-address: '"172.16.22.4"'
        cluster-password: '"LCjtM8BB8XmZPPGJJ7HPCY3Fgj9NrZPdYTr9MGg2J6"'
        cluster-user: '"clusteruser"'
        egress-subnets: 172.16.22.4/32
        ingress-address: 172.16.22.4
        private-address: 172.16.22.4
        unit-configure-ready: "true"
    related-units:
      mysql-innodb-cluster/1:
        in-scope: true
        data:
          cluster-address: '"172.16.22.5"'
          cluster-password: '"LCjtM8BB8XmZPPGJJ7HPCY3Fgj9NrZPdYTr9MGg2J6"'
          cluster-user: '"clusteruser"'
          egress-subnets: 172.16.22.5/32
          ingress-address: 172.16.22.5
          private-address: 172.16.22.5
          unit-clustered: "true"
          unit-configure-ready: "true"
      mysql-innodb-cluster/2:
        in-scope: true
        data:
          cluster-address: '"172.16.22.6"'
          cluster-password: '"LCjtM8BB8XmZPPGJJ7HPCY3Fgj9NrZPdYTr9MGg2J6"'
          cluster-user: '"clusteruser"'
          egress-subnets: 172.16.22.6/32
          ingress-address: 172.16.22.6
          private-address: 172.16.22.6
          unit-configure-ready: "true"
  - relation-id: 2
    endpoint: coordinator
    related-endpoint: coordinator
    application-data: {}
    local-unit:
      in-scope: true
      data:
        coordinator: '{}'
        egress-subnets: 172.16.22.4/32
        ingress-address: 172.16.22.4
        private-address: 172.16.22.4
    related-units:
      mysql-innodb-cluster/1:
        in-scope: true
        data:
          coordinator: '{}'
          egress-subnets: 172.16.22.5/32
          ingress-address: 172.16.22.5
          private-address: 172.16.22.5
      mysql-innodb-cluster/2:
        in-scope: true
        data:
          coordinator: '{}'
          egress-subnets: 172.16.22.6/32
          ingress-address: 172.16.22.6
          private-address: 172.16.22.6
maas-infra:~$ juju show-unit mysql-innodb-cluster/1
mysql-innodb-cluster/1:
  workload-version: 8.0.33
  machine: 1/lxd/0
  opened-ports: []
  public-address: 172.16.22.5
  charm: ch:amd64/jammy/mysql-innodb-cluster-43
  leader: false
  life: alive
  relation-info:
  - relation-id: 1
    endpoint: cluster
    related-endpoint: cluster
    application-data: {}
    local-unit:
      in-scope: true
      data:
        cluster-address: '"172.16.22.5"'
        cluster-password: '"LCjtM8BB8XmZPPGJJ7HPCY3Fgj9NrZPdYTr9MGg2J6"'
        cluster-user: '"clusteruser"'
        egress-subnets: 172.16.22.5/32
        ingress-address: 172.16.22.5
        private-address: 172.16.22.5
        unit-clustered: "true"
        unit-configure-ready: "true"
    related-units:
      mysql-innodb-cluster/0:
        in-scope: true
        data:
          cluster-address: '"172.16.22.4"'
 ...


Alex Kavanagh (ajkavanagh) wrote:

Thank you for the additional information. I think I've now worked out what is going on, and what the issue is. It's basically the same as https://bugs.launchpad.net/charm-mysql-innodb-cluster/+bug/2015774.

I think (but haven't verified) that the issue is that the MySQL cluster 'leader' and the Juju 'leader' are not the same unit; this means that the unit that isn't the Juju leader can't cluster in the other units, as they are denied access to the instance because they haven't been added by that unit. I'm going to mark this one as a duplicate of that one.
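
A quick way to check for such a mismatch (a sketch; the cluster-status action name should be verified against "juju actions mysql-innodb-cluster"):

# The unit marked with '*' in juju status is the Juju leader
juju status mysql-innodb-cluster

# Compare with the primary ("R/W") that the cluster itself reports (Juju 2.9 action syntax)
juju run-action mysql-innodb-cluster/leader cluster-status --wait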

---

You may be able to use the action "add-instance" to force the unit to join the cluster.
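
For example, something like this (a sketch, Juju 2.9 syntax; the "address" parameter name should be checked against the charm's action list):

# Run on the Juju leader, pointing at the instance that is stuck outside the cluster
juju run-action mysql-innodb-cluster/leader add-instance address=172.16.22.4 --wait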

Changed in charm-mysql-innodb-cluster:
status: Incomplete → Triaged