[master] hook failed: "db-router-relation-departed" - during test_803_remove_fourth test

Bug #2020216 reported by Alex Kavanagh
This bug affects 1 person
Affects                     Status         Importance  Assigned to    Milestone
MySQL InnoDB Cluster Charm  Fix Committed  Undecided   Alex Kavanagh
Jammy                       Fix Released   Undecided   Unassigned

Bug Description

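Essentially, another race condition: the status assessment that runs at the end of the db-router-relation-departed hook looks up this unit's own address in the cluster topology, and the key is missing (see the KeyError in the log below).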

juju status:

Model              Controller               Cloud/Region             Version  SLA          Timestamp
zaza-e602f27ff35f  serverstack-serverstack  serverstack/serverstack  2.9.42   unsupported  13:58:12Z

App                    Version  Status  Scale  Charm                 Channel      Rev  Exposed  Message
keystone               21.0.0   active  1      keystone              yoga/stable  595  no       Application Ready
keystone-mysql-router  8.0.33   active  1      mysql-router          latest/edge  79   no       Unit is ready
mysql-innodb-cluster   8.0.33   error   4      mysql-innodb-cluster               0    no       hook failed: "db-router-relation-departed"
prometheus2                     active  1      prometheus2           stable       52   no       Ready
vault                  1.7.9    active  1      vault                 1.7/stable   107  no       Unit is ready (active: true, mlock: enabled)
vault-mysql-router     8.0.33   active  1      mysql-router          latest/edge  79   no       Unit is ready

Unit                        Workload  Agent  Machine  Public address  Ports               Message
keystone/0*                 active    idle   0        172.16.0.28     5000/tcp            Unit is ready
  keystone-mysql-router/0*  active    idle            172.16.0.28                         Unit is ready
mysql-innodb-cluster/0*     active    idle   1        172.16.0.228                        Unit is ready: Mode: R/W, Cluster is ONLINE and can tolerate up to ONE failure.
mysql-innodb-cluster/2      error     idle   3        172.16.0.246                        hook failed: "db-router-relation-departed"
mysql-innodb-cluster/3      active    idle   6        172.16.0.176                        Unit is ready: Mode: R/O, Cluster is ONLINE and can tolerate up to ONE failure.
mysql-innodb-cluster/4      active    idle   7        172.16.0.188                        Unit is ready: Mode: R/O, Cluster is ONLINE and can tolerate up to ONE failure.
prometheus2/0*              active    idle   4        172.16.0.217    9090/tcp,12321/tcp  Ready
vault/0*                    active    idle   5        172.16.0.41     8200/tcp            Unit is ready (active: true, mlock: enabled)
  vault-mysql-router/0*     active    idle            172.16.0.41                         Unit is ready

Machine  State    Address       Inst id                               Series  AZ    Message
0        started  172.16.0.28   39274ef4-d599-44a2-a746-db7608e2a483  jammy   nova  ACTIVE
1        started  172.16.0.228  704308d7-5882-499f-9771-5c06bf30a059  jammy   nova  ACTIVE
3        started  172.16.0.246  06b7e491-8d87-482f-9c26-d76140990423  jammy   nova  ACTIVE
4        started  172.16.0.217  f327528c-0de4-485b-8ed0-e001537f7bf5  focal   nova  ACTIVE
5        started  172.16.0.41   aac0bb0c-2f4f-4b4e-b7fb-23ac7896bf91  jammy   nova  ACTIVE
6        started  172.16.0.176  a2db224a-1e56-41b0-8d4e-64a95abcbba6  jammy   nova  ACTIVE
7        started  172.16.0.188  4e7e7443-d56c-46ce-bc54-6c0cbe92f33c  jammy   nova  ACTIVE

Error from log:

2023-05-19 13:06:22 WARNING unit.mysql-innodb-cluster/2.db-router-relation-departed logger.go:60 Traceback (most recent call last):
2023-05-19 13:06:22 WARNING unit.mysql-innodb-cluster/2.db-router-relation-departed logger.go:60 File "/var/lib/juju/agents/unit-mysql-innodb-cluster-2/charm/hooks/db-router-relation-departed", line 22, in <module>
2023-05-19 13:06:22 WARNING unit.mysql-innodb-cluster/2.db-router-relation-departed logger.go:60 main()
2023-05-19 13:06:22 WARNING unit.mysql-innodb-cluster/2.db-router-relation-departed logger.go:60 File "/var/lib/juju/agents/unit-mysql-innodb-cluster-2/.venv/lib/python3.10/site-packages/charms/reactive/__init__.py", line 84, in main
2023-05-19 13:06:22 WARNING unit.mysql-innodb-cluster/2.db-router-relation-departed logger.go:60 hookenv._run_atexit()
2023-05-19 13:06:22 WARNING unit.mysql-innodb-cluster/2.db-router-relation-departed logger.go:60 File "/var/lib/juju/agents/unit-mysql-innodb-cluster-2/.venv/lib/python3.10/site-packages/charmhelpers/core/hookenv.py", line 1357, in _run_atexit
2023-05-19 13:06:22 WARNING unit.mysql-innodb-cluster/2.db-router-relation-departed logger.go:60 callback(*args, **kwargs)
2023-05-19 13:06:22 WARNING unit.mysql-innodb-cluster/2.db-router-relation-departed logger.go:60 File "/var/lib/juju/agents/unit-mysql-innodb-cluster-2/.venv/lib/python3.10/site-packages/charms_openstack/charm/core.py", line 1378, in atexit_assess_status
2023-05-19 13:06:22 WARNING unit.mysql-innodb-cluster/2.db-router-relation-departed logger.go:60 self._assess_status()
2023-05-19 13:06:22 WARNING unit.mysql-innodb-cluster/2.db-router-relation-departed logger.go:60 File "/var/lib/juju/agents/unit-mysql-innodb-cluster-2/charm/lib/charm/openstack/mysql_innodb_cluster.py", line 1708, in _assess_status
2023-05-19 13:06:22 WARNING unit.mysql-innodb-cluster/2.db-router-relation-departed logger.go:60 .format(self.get_cluster_instance_mode(),
2023-05-19 13:06:22 WARNING unit.mysql-innodb-cluster/2.db-router-relation-departed logger.go:60 File "/var/lib/juju/agents/unit-mysql-innodb-cluster-2/charm/lib/charm/openstack/mysql_innodb_cluster.py", line 1358, in get_cluster_instance_mode
2023-05-19 13:06:22 WARNING unit.mysql-innodb-cluster/2.db-router-relation-departed logger.go:60 return (_status["defaultReplicaSet"]["topology"]
2023-05-19 13:06:22 WARNING unit.mysql-innodb-cluster/2.db-router-relation-departed logger.go:60 KeyError: '172.16.0.246:3306'
2023-05-19 13:06:22 ERROR juju.worker.uniter.operation runhook.go:153 hook "db-router-relation-departed" (via explicit, bespoke hook script) failed: exit status 1

Relevant code from src/lib/charm/openstack/mysql_innodb_cluster.py:

    def get_cluster_instance_mode(self, nocache=False):
        """Get cluster status mode

        Return cluster.status()["defaultReplicaSet"]["topology"]. This will be
        "R/W" or "R/O" depending on the mode of this instance in the cluster.
        If cached data exists and is not explicitly avoided with the nocache
        parameter, avoid the call to self.get_cluster_status.

        :param nocache: Do not return cached data
        :type nocache: Boolean
        :side effect: Calls self.get_cluster_status
        :returns: String mode. i.e. "R/W" or "R/O"
        :rtype: Union[None, str]
        """
        if self._cached_cluster_status and not nocache:
            _status = self._cached_cluster_status
        else:
            _status = self.get_cluster_status(nocache=nocache)
        if not _status:
            return
        return (_status["defaultReplicaSet"]["topology"]
                ["{}:{}".format(self.cluster_address, self.cluster_port)]
                ["mode"])
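
The failure can be reproduced outside the charm with a plain dictionary. This is only a sketch: the status contents below are invented for illustration (the shape is inferred from the lookup above, the addresses are the ones in the juju status output), and lookup_mode is a hypothetical stand-alone copy of the chained lookup, not a charm function.

```python
# Hypothetical cluster status: the shape is inferred from the lookup in
# get_cluster_instance_mode; the contents are invented for illustration.
# Note that 172.16.0.246:3306 (mysql-innodb-cluster/2) is absent.
status = {
    "defaultReplicaSet": {
        "topology": {
            "172.16.0.228:3306": {"mode": "R/W"},
            "172.16.0.176:3306": {"mode": "R/O"},
            "172.16.0.188:3306": {"mode": "R/O"},
        }
    }
}


def lookup_mode(status, address, port):
    """Stand-alone copy of the chained lookup, with no error handling."""
    return (status["defaultReplicaSet"]["topology"]
            ["{}:{}".format(address, port)]
            ["mode"])


missing_key = None
try:
    lookup_mode(status, "172.16.0.246", 3306)
except KeyError as exc:
    # The exception argument is the key that was not found.
    missing_key = exc.args[0]
    print("KeyError:", repr(missing_key))
```

Any unit whose address is still in the topology resolves normally; only the unit missing from it raises.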

Proposed fix:

Wrap the final return in a try/except and return None on KeyError. It still needs checking that propagating None to the callers won't cause problems.
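
A minimal sketch of that idea, written as a free function so it runs without the charm. It is not the merged patch: get_instance_mode, its parameters, and the sample status dictionary are all assumptions made for illustration.

```python
def get_instance_mode(status, address, port):
    """Return "R/W" or "R/O" for address:port, or None if unavailable.

    Hypothetical stand-alone version of the lookup; `status` is assumed
    to be a dict shaped like the cluster status used by the charm.
    """
    if not status:
        return None
    try:
        return (status["defaultReplicaSet"]["topology"]
                ["{}:{}".format(address, port)]
                ["mode"])
    except KeyError:
        # Any missing level (replica set, topology, instance, mode)
        # is treated as "mode not known".
        return None


# Invented sample data: the departed unit is no longer in the topology.
sample_status = {
    "defaultReplicaSet": {
        "topology": {
            "172.16.0.228:3306": {"mode": "R/W"},
        }
    }
}

print(get_instance_mode(sample_status, "172.16.0.228", 3306))  # R/W
print(get_instance_mode(sample_status, "172.16.0.246", 3306))  # None
```

With this shape, a caller that formats the mode into a status message has to be prepared for None, which is the propagation check mentioned above.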

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-mysql-innodb-cluster (master)
Changed in charm-mysql-innodb-cluster:
status: New → In Progress
Changed in charm-mysql-innodb-cluster:
assignee: nobody → Alex Kavanagh (ajkavanagh)
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-mysql-innodb-cluster (master)

Reviewed: https://review.opendev.org/c/openstack/charm-mysql-innodb-cluster/+/883698
Committed: https://opendev.org/openstack/charm-mysql-innodb-cluster/commit/2af9b1b4e933751b2ff794a035a6b603067ca0a9
Submitter: "Zuul (22348)"
Branch: master

commit 2af9b1b4e933751b2ff794a035a6b603067ca0a9
Author: Alex Kavanagh <email address hidden>
Date: Fri May 19 18:45:03 2023 +0100

    Fix status during db-router-relation-departed

    The functions get_cluster_instance_mode() and
    get_cluster_status_text() should be able to return None if they can't
    access the relevant status, but it's possible that keys may not exist in the
    status and this causes KeyErrors. This patch makes the functions more
    robust such that they return None if the items don't exist. This only
    affects reporting on the status line, rather than functionality in the
    charm.

    Change-Id: Ie63438cc8801224b0608203f7c935c430ef46045
    Closes-Bug: #2020216

Changed in charm-mysql-innodb-cluster:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-mysql-innodb-cluster (stable/jammy)
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-mysql-innodb-cluster (stable/jammy)

Reviewed: https://review.opendev.org/c/openstack/charm-mysql-innodb-cluster/+/885020
Committed: https://opendev.org/openstack/charm-mysql-innodb-cluster/commit/42539e2aca3dafb0436bf037ca42717060d59d45
Submitter: "Zuul (22348)"
Branch: stable/jammy

commit 42539e2aca3dafb0436bf037ca42717060d59d45
Author: Alex Kavanagh <email address hidden>
Date: Fri May 19 18:45:03 2023 +0100

    Fix status during db-router-relation-departed

    The functions get_cluster_instance_mode() and
    get_cluster_status_text() should be able to return None if they can't
    access the relevant status, but it's possible that keys may not exist in the
    status and this causes KeyErrors. This patch makes the functions more
    robust such that they return None if the items don't exist. This only
    affects reporting on the status line, rather than functionality in the
    charm.

    Change-Id: Ie63438cc8801224b0608203f7c935c430ef46045
    Closes-Bug: #2020216
    (cherry picked from commit 2af9b1b4e933751b2ff794a035a6b603067ca0a9)
