_assess_status: RuntimeError: Dba.get_cluster: This function is not available through a session to an instance belonging to an unmanaged replication group

Bug #1889792 reported by Dmitrii Shcherbakov
Affects                      Status        Importance  Assigned to
MySQL InnoDB Cluster Charm   Fix Released  Medium      Unassigned
OpenStack Charm Guide        Fix Released  Medium      Peter Matulis

Bug Description

Encountered here: https://review.opendev.org/#/c/743364/

A new unit has been added to the cluster after a removal of a different unit. Juju status:

mysql-innodb-cluster/1 active executing 2 172.17.110.3 Unit is ready: Mode: R/W
mysql-innodb-cluster/2* active executing 3 172.17.110.33 Unit is ready: Mode: R/O
mysql-innodb-cluster/4 blocked idle 6 172.17.110.30 MySQL InnoDB Cluster not healthy: None

https://openstack-ci-reports.ubuntu.com/artifacts/test_charm_pipeline_func_full/openstack/charm-mysql-innodb-cluster/743364/1/6528/index.html

tracer: -- dequeue handler reactive/mysql_innodb_cluster_handlers.py:329:configure_certificates
2020-07-28 18:55:25 DEBUG juju-log certificates:4: Running _assess_status()
2020-07-28 18:55:26 DEBUG certificates-relation-changed active
2020-07-28 18:55:27 DEBUG juju-log certificates:4: Opening db connection for root@localhost
2020-07-28 18:55:27 DEBUG juju-log certificates:4: Checking cluster status.
2020-07-28 18:55:53 ERROR juju-log certificates:4: Cluster is unavailable: Logger: Tried to log to an uninitialized logger.
Traceback (most recent call last):
  File "<string>", line 2, in <module>
SystemError: RuntimeError: Dba.get_cluster: This function is not available through a session to an instance belonging to an unmanaged replication group

There isn't a lot of information about this error (0 docs), but I managed to find some in the mysql-shell repo:

https://github.com/mysql/mysql-shell/blob/8.0.21/modules/adminapi/common/preconditions.cc#L858-L862
void check_preconditions(const std::string &function_name,
                         const Cluster_check_info &state,
                         FunctionAvailability *custom_func_avail) {
// ...
      case GRInstanceType::GroupReplication:
        error +=
            " to an instance belonging to an unmanaged replication "
            "group";
        break;

Unit tests also suggest that getCluster is not available for standalone instances:

https://github.com/mysql/mysql-shell/blob/8.0.21/unittest/scripts/auto/js_adminapi/validation/dba_preconditions.js#L31-L33
//@# Dba_preconditions_unmanaged_gr, get_cluster_fails
// getCluster is not allowed on standalone instances
||Dba.getCluster: This function is not available through a session to an instance belonging to an unmanaged replication group
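
In other words, mysql-shell refuses to hand back the cluster object when the target instance is running Group Replication outside of the InnoDB Cluster metadata (or is standalone). A quick way to see what state a node is actually in, independent of mysql-shell, is to query performance_schema directly. A minimal sketch, assuming mysql-connector-python and local root credentials (the charm itself drives mysql-shell instead):

# Minimal sketch, not the charm's code: inspect this node's Group Replication
# membership directly instead of relying on dba.get_cluster().
import mysql.connector  # assumption: mysql-connector-python is installed

def local_gr_members(user="root", password="***", host="localhost"):
    """Return (member_host, member_state) rows from performance_schema."""
    cnx = mysql.connector.connect(user=user, password=password, host=host)
    try:
        cur = cnx.cursor()
        cur.execute(
            "SELECT MEMBER_HOST, MEMBER_STATE "
            "FROM performance_schema.replication_group_members"
        )
        return cur.fetchall()
    finally:
        cnx.close()

# A node that has left the group (as mysql-innodb-cluster/4 did below) will
# show no rows, or its own row in an OFFLINE/ERROR state, which is a clearer
# signal than the RuntimeError later raised by Dba.get_cluster.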

Based on the error logs from multiple units, I can see that the 4th unit left the group at some point, which triggered the error since it was no longer part of the group at that point:

mysql-innodb-cluster/4

2020-07-28T17:57:11.305946Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.21-0ubuntu0.20.04.3' socket: '/var/run/mysqld/mysqld.sock' port: 3306 (Ubuntu).
2020-07-28T17:57:11.401793Z 10 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2020-07-28T17:57:14.710418Z 0 [Warning] [MY-013426] [Repl] Plugin group_replication reported: 'Member version is read compatible with the group.'
2020-07-28T17:57:14.710600Z 2 [System] [MY-011511] [Repl] Plugin group_replication reported: 'This server is working as secondary member with primary member address 172.17.110.3:3306.'
2020-07-28T17:57:15.711402Z 0 [ERROR] [MY-013467] [Repl] Plugin group_replication reported: 'No valid or ONLINE members exist to get the missing data from the group. For cloning check if donors of the same version and with clone plugin installed exist. For incremental recovery check if you have donors where the required data was not purged from the binary logs.'
2020-07-28T17:57:15.711510Z 0 [ERROR] [MY-011712] [Repl] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was detected.'
2020-07-28T17:57:16.712503Z 0 [System] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to 172.17.110.30:3306, 172.17.110.3:3306, 172.17.110.33:3306 on view 15959464471091338:11.'
2020-07-28T17:57:20.913218Z 0 [System] [MY-011504] [Repl] Plugin group_replication reported: 'Group membership changed: This member has left the group.'

The mysql-innodb-cluster/2 unit (R/O) also logged many invalid replication timestamp warnings; this is likely due to clock skew in an overcommitted infrastructure environment:

2020-07-28T16:24:13.300814Z 840 [Warning] [MY-010956] [Server] Invalid replication timestamps: original commit timestamp is more recent than the immediate commit timestamp. This may be an issue if delayed replication is active. Make sure that servers have their clocks set to the correct time. No further message will be emitted until after timestamps become valid again.
2020-07-28T16:24:13.571277Z 840 [Warning] [MY-010957] [Server] The replication timestamps have returned to normal values.

Either way, we need to handle status checks better in this regard, as the current workload status message is not very helpful:

mysql-innodb-cluster/4 blocked idle 6 172.17.110.30 MySQL InnoDB Cluster not healthy: None
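
One option would be for the status check to catch the failure from mysql-shell and surface the underlying reason in the workload status instead of "None". An illustrative sketch only, not the charm's actual code; the invocation and helper name are assumptions, and it presumes the check drives mysql-shell in Python mode:

# Illustrative sketch: report why the cluster object cannot be obtained
# instead of the opaque "not healthy: None". Credential handling is simplified.
import subprocess

def cluster_status_or_reason(uri="root@localhost"):
    """Return (status_text, None) on success or (None, reason) on failure."""
    script = "print(dba.get_cluster().status())\n"
    proc = subprocess.run(
        ["mysqlsh", "--python", "--uri", uri],
        input=script, capture_output=True, text=True,
    )
    if proc.returncode != 0:
        # e.g. "Dba.get_cluster: This function is not available through a
        # session to an instance belonging to an unmanaged replication group"
        return None, proc.stderr.strip() or "unknown mysql-shell error"
    return proc.stdout, None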

Tags: scaleback
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :
Ryan Beisner (1chb1n)
tags: added: scaleback
Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :

This still seems to be an issue. Hybrid512 is seeing it consistently on fresh deployments, both with rev. 3 (20.10) and rev. 5 (21.01) of charm-mysql-innodb-cluster.

Changed in charm-mysql-innodb-cluster:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :

To be clear, Hybrid512 is seeing one unit of the cluster as "ready R/W" and the two other units as "not healthy: None".

Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :
Revision history for this message
Przemyslaw Lal (przemeklal) wrote :

I hit a similar issue that might be related, but on an already deployed cluster after a couple of days; please have a look at lp:1917332.

Revision history for this message
David Ames (thedac) wrote :

TRIAGE:

*This bug* is a documentation bug. Create documentation on the proper removal of an instance from the cluster.

The original bug was caused by attempting to add a new instance with the same IP address as the instance that had been removed.

The smoking gun is the output of the `cluster-status` action on a healthy instance: it will show the IP address of the original (and now duplicated) instance as "MISSING".

juju run-action --wait mysql-innodb-cluster/leader cluster-status

DOCUMENTATION

Ideally, one should remove an instance using the `remove-instance` action before tearing down the machine running the instance. Obviously, that is not always possible, but it is still necessary to run the `remove-instance` action (even when the instance is gone) in order to update the cluster metadata.

While the instance is running:

juju run-action --wait mysql-innodb-cluster/leader remove-instance address=<INSTANCE IP TO REMOVE>

When the machine is down or gone:

juju run-action --wait mysql-innodb-cluster/leader remove-instance address=<INSTANCE IP TO REMOVE> force=True

Check cluster status:

juju run-action --wait mysql-innodb-cluster/leader cluster-status

The removed instance's IP should no longer be in the cluster status metadata output. It is now safe to add a new instance even with the original IP.
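
If this verification needs to be scripted, something along the following lines could work; a rough sketch, with the action invocation and output handling treated as assumptions rather than the charm's documented interface:

# Rough sketch: verify a removed instance's IP is gone from the cluster
# status before re-adding a unit with the same address.
import subprocess

def ip_gone_from_cluster(ip, app="mysql-innodb-cluster"):
    out = subprocess.run(
        ["juju", "run-action", "--wait", f"{app}/leader", "cluster-status"],
        capture_output=True, text=True, check=True,
    ).stdout
    # The action output embeds the cluster topology; a cleanly removed
    # instance's address should not appear in it at all.
    return ip not in out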

**********************
*Note*: Due to the generic workload status of "MySQL InnoDB Cluster not healthy: None", more than one bug has been filed against this bug number. I have filed LP Bug#1917337 [0] to make this clearer.

For all bugs other than removing and replacing an instance with the same IP address, please file a new bug with debug-logs for all the mysql-innodb-cluster units attached for further investigation. The logs, rather than the workload status, identify the bug.
**********************

[0] https://bugs.launchpad.net/charm-mysql-innodb-cluster/+bug/1917337

David Ames (thedac)
Changed in charm-mysql-innodb-cluster:
importance: High → Medium
Revision history for this message
David Ames (thedac) wrote :

Related to removing and replacing an instance with the same IP address is the issue of stale leadership flags which suggest the instance with that IP is already configured and clustered; this is handled in LP Bug#1922394 [0].

The charm bug will be handled in [0]. This bug will remain a documentation bug, as it is still necessary to remove an instance from the cluster metadata.

[0] https://bugs.launchpad.net/charm-mysql-innodb-cluster/+bug/1922394

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to charm-mysql-innodb-cluster (master)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to charm-mysql-innodb-cluster (master)

Reviewed: https://review.opendev.org/c/openstack/charm-mysql-innodb-cluster/+/786514
Committed: https://opendev.org/openstack/charm-mysql-innodb-cluster/commit/f22ca3b5b4dde7f92edb3a9b1e17835555590d1a
Submitter: "Zuul (22348)"
Branch: master

commit f22ca3b5b4dde7f92edb3a9b1e17835555590d1a
Author: David Ames <email address hidden>
Date: Wed Apr 14 11:42:18 2021 -0700

    Remove instance flags when instance removed

    Previously when an instance was removed the leadership settings and
    charms.reactive flags remained for that instance's IP address. If a new
    instance was subsequently added and happened to have the same IP address
    the charm would never add the new instance to the cluster because it
    believed the instance was already configured and clustered based on
    leader settings.

    Clear leader settings flags for instance cluster configured and
    clustered.

    Due to a bug in Juju the previous use of IP addresses with '.' were
    unable to be unset. Transform dotted flags to use '-' instead.

    func-test-pr: https://github.com/openstack-charmers/zaza-openstack-tests/pull/565

    Change-Id: If3ffa9e9191c057ac7e3d96bfcf84d8a3a2ad45a
    Closes-Bug: #1922394
    Related-Bug: #1889792
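
The flag transformation described in the commit message is roughly the following; an illustrative sketch, using a hypothetical helper and flag prefix rather than the charm's actual names:

# Illustrative sketch of the dotted-flag workaround described above; the
# helper and flag prefix are hypothetical, not the charm's actual identifiers.
def flag_for_ip(prefix, ip):
    """flag_for_ip('cluster-instance-clustered', '172.17.110.30')
    -> 'cluster-instance-clustered-172-17-110-30'
    Dots are replaced with dashes because dotted leader-settings keys could
    not be unset due to the Juju bug mentioned in the commit message."""
    return "{}-{}".format(prefix, ip.replace(".", "-"))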

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to charm-mysql-innodb-cluster (stable/21.04)

Related fix proposed to branch: stable/21.04
Review: https://review.opendev.org/c/openstack/charm-mysql-innodb-cluster/+/799953

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to charm-mysql-innodb-cluster (stable/21.04)

Reviewed: https://review.opendev.org/c/openstack/charm-mysql-innodb-cluster/+/799953
Committed: https://opendev.org/openstack/charm-mysql-innodb-cluster/commit/6d9a50774c4a1c7875bd2d90fe66b5cb5dafe63a
Submitter: "Zuul (22348)"
Branch: stable/21.04

commit 6d9a50774c4a1c7875bd2d90fe66b5cb5dafe63a
Author: David Ames <email address hidden>
Date: Wed Apr 14 11:42:18 2021 -0700

    Remove instance flags when instance removed

    Previously when an instance was removed the leadership settings and
    charms.reactive flags remained for that instance's IP address. If a new
    instance was subsequently added and happened to have the same IP address
    the charm would never add the new instance to the cluster because it
    believed the instance was already configured and clustered based on
    leader settings.

    Clear leader settings flags for instance cluster configured and
    clustered.

    Due to a bug in Juju the previous use of IP addresses with '.' were
    unable to be unset. Transform dotted flags to use '-' instead.

    Change-Id: If3ffa9e9191c057ac7e3d96bfcf84d8a3a2ad45a
    Closes-Bug: #1922394
    Related-Bug: #1889792
    (cherry picked from commit f22ca3b5b4dde7f92edb3a9b1e17835555590d1a)

Changed in charm-deployment-guide:
status: New → In Progress
Changed in charm-deployment-guide:
assignee: nobody → Peter Matulis (petermatulis)
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-deployment-guide (master)

Change abandoned by "James Page <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/charm-deployment-guide/+/810515
Reason: This review is > 12 weeks without comment, and failed testing the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

affects: charm-deployment-guide → charm-guide
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Note that a separate bug fixed the related charm issue: https://bugs.launchpad.net/charm-mysql-innodb-cluster/+bug/1922394

Changed in charm-mysql-innodb-cluster:
status: Triaged → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-guide (master)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-guide (master)

Reviewed: https://review.opendev.org/c/openstack/charm-guide/+/884853
Committed: https://opendev.org/openstack/charm-guide/commit/02d50f28159729180e4c525e21a1f327b60a701a
Submitter: "Zuul (22348)"
Branch: master

commit 02d50f28159729180e4c525e21a1f327b60a701a
Author: Alex Kavanagh <email address hidden>
Date: Wed May 31 10:38:26 2023 +0100

    Add operation: Remove a mysql8 database node

    Note originally this was a charm deployment guide review
    at the related ID.

    Change-Id: I47e4f727dc9868eabb943ab9f0f50df5ae8f78d7
    Related-ID: I07aa35937666aaf000f5c2bc5803bee672c1f126
    Closes-Bug: #1889792

Changed in charm-guide:
status: In Progress → Fix Released