hook failed: "prometheus-relation-joined"

Bug #2009968 reported by Felipe Reyes
This bug affects 1 person
Affects                     Status  Importance  Assigned to  Milestone
MySQL InnoDB Cluster Charm  New     Undecided   Unassigned

Bug Description

Issue seen at https://openstack-ci-reports.ubuntu.com/artifacts/0d2/876740/2/check/jammy/0d23f79/

Unit Workload Agent Machine Public address Ports Message
keystone/0* active idle 0 172.16.0.207 5000/tcp Unit is ready
  keystone-mysql-router/0* active idle 172.16.0.207 Unit is ready
mysql-innodb-cluster/0 active idle 1 172.16.0.5 Unit is ready: Mode: R/W, Cluster is ONLINE and can tolerate up to ONE failure.
mysql-innodb-cluster/1* active idle 2 172.16.0.135 Unit is ready: Mode: R/O, Cluster is ONLINE and can tolerate up to ONE failure.
mysql-innodb-cluster/3 error idle 6 172.16.0.159 hook failed: "prometheus-relation-joined"
prometheus2/0* active idle 4 172.16.0.85 9090/tcp,12321/tcp Ready
vault/0* active idle 5 172.16.0.173 8200/tcp Unit is ready (active: true, mlock: enabled)
  vault-mysql-router/0* active idle 172.16.0.173 Unit is ready

2023-03-09 15:27:53 ERROR unit.mysql-innodb-cluster/3.juju-log server.go:316 prometheus:10: Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-innodb-cluster-3/.venv/lib/python3.10/site-packages/charms/reactive/__init__.py", line 74, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-mysql-innodb-cluster-3/.venv/lib/python3.10/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-mysql-innodb-cluster-3/.venv/lib/python3.10/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-mysql-innodb-cluster-3/.venv/lib/python3.10/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-mysql-innodb-cluster-3/charm/reactive/prometheus_mysql_exporter_handlers.py", line 53, in create_remote_prometheus_exporter_user
    if not instance.create_user(
  File "/var/lib/juju/agents/unit-mysql-innodb-cluster-3/charm/lib/charm/openstack/mysql_innodb_cluster.py", line 603, in create_user
    m_helper.connect(password=self.mysql_password)
  File "/var/lib/juju/agents/unit-mysql-innodb-cluster-3/.venv/lib/python3.10/site-packages/charmhelpers/contrib/database/mysql.py", line 107, in connect
    self.connection = MySQLdb.connect(**_connection_info)
  File "/var/lib/juju/agents/unit-mysql-innodb-cluster-3/.venv/lib/python3.10/site-packages/MySQLdb/__init__.py", line 123, in Connect
    return Connection(*args, **kwargs)
  File "/var/lib/juju/agents/unit-mysql-innodb-cluster-3/.venv/lib/python3.10/site-packages/MySQLdb/connections.py", line 185, in __init__
    super().__init__(*args, **kwargs2)
MySQLdb.OperationalError: (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 104")

Felipe Reyes (freyes)
tags: added: unstable-test
Alex Kavanagh (ajkavanagh) wrote:

I think this is a dup of "During scale-out of cluster (zaza-openstack-tests) the leader fails to join in the new instance when related to prometheus" (https://bugs.launchpad.net/charm-mysql-innodb-cluster/+bug/2015256), which is a race between the prometheus hook firing on the new unit and the leader joining that instance into the cluster (the two have to run on different units for the bug to occur).

The first test that failed is the add unit test, which is when it all goes to pot.

2023-03-09 15:21:55.490440 | focal-medium | 2023-03-09 15:21:55 [INFO] test_801_add_unit (zaza.openstack.charm_tests.mysql.tests.MySQLInnoDBClusterScaleTest)
2023-03-09 15:21:55.490501 | focal-medium | 2023-03-09 15:21:55 [INFO] Add mysql-innodb-cluster node.
2023-03-09 15:21:55.490520 | focal-medium | 2023-03-09 15:21:55 [INFO] ...
2023-03-09 15:21:55.490534 | focal-medium | 2023-03-09 15:21:55 [INFO] Wait till model is idle ...
2023-03-09 15:21:55.993575 | focal-medium | 2023-03-09 15:21:55 [INFO] Adding unit after removed unit ...
2023-03-09 15:21:56.325775 | focal-medium | 2023-03-09 15:21:56 [INFO] Wait until 3 units ...
2023-03-09 15:21:56.416060 | focal-medium | 2023-03-09 15:21:56 [INFO] Wait for application states ...
2023-03-09 15:21:56.416955 | focal-medium | 2023-03-09 15:21:56 [INFO] Waiting for application states to reach targeted states.
2023-03-09 15:21:56.421116 | focal-medium | 2023-03-09 15:21:56 [INFO] Waiting for an application to be present
2023-03-09 15:21:56.421804 | focal-medium | 2023-03-09 15:21:56 [INFO] Now checking workload status and status messages
2023-03-09 15:21:56.930799 | focal-medium | 2023-03-09 15:21:56 [INFO] Application prometheus2 is ready.
2023-03-09 15:21:56.936002 | focal-medium | 2023-03-09 15:21:56 [INFO] Application keystone is ready.
2023-03-09 15:21:56.939789 | focal-medium | 2023-03-09 15:21:56 [INFO] Application keystone-mysql-router is ready.
2023-03-09 15:21:56.953975 | focal-medium | 2023-03-09 15:21:56 [INFO] Application vault is ready.
2023-03-09 15:21:56.957646 | focal-medium | 2023-03-09 15:21:56 [INFO] Application vault-mysql-router is ready.

Marking as a duplicate for the moment; but please un-dup if you feel it is different.
