charm needs to handle database not being available (was: mysql-innodb-cluster:certificates in a bundle is known to cause a race condition)

Bug #1984048 reported by Nobuto Murata
This bug affects 1 person
Affects                      Status    Importance  Assigned to  Milestone
MySQL InnoDB Cluster Charm   Invalid   Undecided   Unassigned
OpenStack Bundles            Invalid   Undecided   Unassigned
OpenStack Octavia Charm      Triaged   High        Unassigned

Bug Description

The current openstack-base bundle has the following relation.

- - vault:certificates
  - mysql-innodb-cluster:certificates

https://github.com/openstack-charmers/openstack-bundles/blob/32f339112a44b1929bba068925ada3a83e738b84/development/openstack-base-focal-yoga/bundle.yaml#L136-L137

With this relation, mysql-innodb-cluster can restart mysqld as soon as Vault's CA is available. If that happens during init_db of an OpenStack API charm, the charm can end up in an unrecoverable error state. It may not happen with openstack-base out of the box, but adding octavia on top, for example, makes the error easy to trigger.

mysql-innodb-cluster:certificates might be too drastic to be in a bundle deployment.

unit-mysql-innodb-cluster-1: 06:14:56 INFO unit.mysql-innodb-cluster/1.juju-log coordinator:6: Invoking reactive handler: reactive/mysql_innodb_cluster_handlers.py:376:request_certificates

unit-octavia-0: 06:14:56 DEBUG unit.octavia/0.amqp-relation-changed 2022-08-09 06:14:56.139 38987 CRITICAL octavia-db-manage [-] Unhandled error: sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')

unit-octavia-0: 06:14:56 ERROR unit.octavia/0.juju-log amqp:84: Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-octavia-0/.venv/lib/python3.8/site-packages/charms/reactive/__init__.py", line 74, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-octavia-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-octavia-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-octavia-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-octavia-0/charm/reactive/octavia_handlers.py", line 279, in init_db
    octavia_charm.db_sync()
  File "/var/lib/juju/agents/unit-octavia-0/.venv/lib/python3.8/site-packages/charms_openstack/charm/core.py", line 845, in db_sync
    subprocess.check_call(self.sync_cmd)
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', 'octavia-db-manage', 'upgrade', 'head']' returned non-zero exit status 1.

Nobuto Murata (nobuto) wrote :

mysqld was started around 6:15.

$ juju run -a mysql-innodb-cluster -- 'ps aux | grep mysql[d]'
- Stdout: |
    mysql 81833 22.5 8.8 5639528 991860 ? Ssl 06:15 14:42 /usr/sbin/mysqld
  UnitId: mysql-innodb-cluster/0
- Stdout: |
    mysql 78404 3.9 6.0 4320160 678576 ? Ssl 06:16 2:26 /usr/sbin/mysqld
  UnitId: mysql-innodb-cluster/1
- Stdout: |
    mysql 80191 3.8 6.0 4254608 680776 ? Ssl 06:17 2:19 /usr/sbin/mysqld
  UnitId: mysql-innodb-cluster/2

The mysqld log also shows "Forcing close of thread 19995 user: 'octavia'." during the shutdown:

2022-08-09T06:14:52.758658Z 0 [System] [MY-013172] [Server] Received SHUTDOWN from user <via user signal>. Shutting down mysqld (Version: 8.0.30-0ubuntu0.20.04.2).
2022-08-09T06:14:55.992274Z 0 [System] [MY-011504] [Repl] Plugin group_replication reported: 'Group membership changed: This member has left the group.'
2022-08-09T06:14:58.001073Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 19995 user: 'octavia'.
2022-08-09T06:15:49.034626Z 0 [Warning] [MY-011630] [Repl] Plugin group_replication reported: 'Due to a plugin error, some transactions were unable to be certified and will now rollback.'
2022-08-09T06:15:49.034839Z 19995 [ERROR] [MY-011615] [Repl] Plugin group_replication reported: 'Error while waiting for conflict detection procedure to finish on session 19995'
2022-08-09T06:15:49.034866Z 19995 [ERROR] [MY-010207] [Repl] Run function 'before_commit' in plugin 'group_replication' failed
2022-08-09T06:15:49.038577Z 0 [System] [MY-011651] [Repl] Plugin group_replication reported: 'Plugin 'group_replication' has been stopped.'
2022-08-09T06:15:50.284934Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.30-0ubuntu0.20.04.2) (Ubuntu).
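
To make the overlap explicit, the timestamps quoted above can be compared directly. This is only a rough sketch: it assumes the octavia log timestamp (06:14:56.139) is in UTC like the mysqld log, which the logs themselves do not state, and the helper name parse_mysql_ts is made up for illustration.

```python
from datetime import datetime, timezone


def parse_mysql_ts(ts: str) -> datetime:
    """Parse a mysqld log timestamp such as 2022-08-09T06:14:52.758658Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


# Timestamps copied from the mysqld log lines MY-013172 and MY-010910.
shutdown_start = parse_mysql_ts("2022-08-09T06:14:52.758658Z")
shutdown_done = parse_mysql_ts("2022-08-09T06:15:50.284934Z")

# Timestamp copied from the octavia-db-manage CRITICAL line.
# Assumption: this is UTC as well.
octavia_error = datetime.strptime(
    "2022-08-09 06:14:56.139", "%Y-%m-%d %H:%M:%S.%f"
).replace(tzinfo=timezone.utc)

# Length of the mysqld shutdown, in seconds.
window = (shutdown_done - shutdown_start).total_seconds()

# True if the octavia failure happened while mysqld was shutting down.
in_window = shutdown_start <= octavia_error <= shutdown_done

print(f"shutdown took {window:.3f}s, octavia error inside window: {in_window}")
```

Under that assumption, the "Lost connection" error falls a few seconds after mysqld received SHUTDOWN and well before the shutdown completed.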

Alex Kavanagh (ajkavanagh) wrote :

I'm setting the bundles and mysql charm tasks to Invalid, but adding the octavia charm.

The database being unavailable is a possibility for a number of reasons, including in this case the mysql charm bootstrapping into TLS. The octavia charm should be updated to handle the database not being available and blocking until it is.
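
As a rough illustration of what "handle the database not being available" could look like, here is a minimal sketch, not the charm's actual fix. The function name run_db_sync, its parameters, and the retry counts are hypothetical; only subprocess.check_call and the sync command come from the traceback above. It also assumes that re-running 'octavia-db-manage upgrade head' after a failed attempt is safe, which would need to be confirmed.

```python
import subprocess
import time


def run_db_sync(cmd, attempts=5, delay=10.0,
                runner=subprocess.check_call, sleep=time.sleep):
    """Run the DB sync command, retrying while the database is unavailable.

    Returns True on success and False if every attempt failed, so the
    caller can report a blocked/waiting status instead of raising and
    leaving the hook in an error state.
    """
    for attempt in range(1, attempts + 1):
        try:
            runner(cmd)
            return True
        except subprocess.CalledProcessError:
            # The database may be restarting (e.g. switching to TLS);
            # wait before the next attempt, but not after the last one.
            if attempt < attempts:
                sleep(delay)
    return False


# Hypothetical usage inside a handler such as init_db:
#
#     if not run_db_sync(['sudo', 'octavia-db-manage', 'upgrade', 'head']):
#         # Set a blocked/waiting status here and return, so the handler
#         # is invoked again on a later hook once the database is back.
#         return
```

Returning a boolean rather than raising keeps the decision about unit status in the handler; the runner and sleep parameters exist only so the sketch can be exercised without a real database.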

Changed in charm-mysql-innodb-cluster:
status: New → Invalid
Changed in openstack-bundles:
status: New → Invalid
Changed in charm-octavia:
importance: Undecided → High
status: New → Triaged
summary: - mysql-innodb-cluster:certificates in a bundle is known to cause a race condition
+ charm needs to handle database not being available (was: mysql-innodb-cluster:certificates in a bundle is known to cause a race condition)