[focal][ussuri] Cinder volume service fails to start after authentication issue against innodb cluster
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Cinder Charm | Confirmed | Undecided | Unassigned |
Bug Description
As seen during this test run: https:/
Crashdump here: https:/
This was a Focal-Ussuri run using our current stable SKUs. The deployment went fairly normally; however, at some point cinder/0 got hung up due to a failed cinder-volume service:
cinder/0 blocked idle 1/lxd/2 10.244.40.244 8776/tcp Services not running that should be: cinder-volume
cinder-ceph/2 waiting idle 10.244.40.244 Incomplete relations: ceph
cinder-
hacluster-
logrotated/50 active idle 10.244.40.244 Unit is ready.
public-
cinder/1 active idle 3/lxd/2 10.244.41.4 8776/tcp Unit is ready
cinder-ceph/1 waiting idle 10.244.41.4 Incomplete relations: ceph
cinder-
hacluster-
logrotated/48 active idle 10.244.41.4 Unit is ready.
public-
cinder/2* active idle 5/lxd/2 10.244.40.253 8776/tcp Unit is ready
cinder-ceph/0* waiting idle 10.244.40.253 Incomplete relations: ceph
cinder-
hacluster-
logrotated/39 active idle 10.244.40.253 Unit is ready.
public-
Journalctl on that unit was fairly unhelpful; however, looking into the cinder-volume log, we can see the issue:
2020-12-04 15:17:15.244 64496 ERROR cinder.cmd.volume Traceback (most recent call last):
2020-12-04 15:17:15.244 64496 ERROR cinder.cmd.volume File "/usr/lib/
2020-12-04 15:17:15.244 64496 ERROR cinder.cmd.volume server = service.
...
...
2020-12-04 15:17:15.244 64496 ERROR cinder.cmd.volume sqlalchemy.
2020-12-04 15:17:15.244 64496 ERROR cinder.cmd.volume (Background on this error at: http://
2020-12-04 15:17:15.244 64496 ERROR cinder.cmd.volume
2020-12-04 15:17:15.255 64496 ERROR cinder.cmd.volume [req-e608b7f5-
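One quick way to tell whether this is purely an authentication/connectivity problem (rather than a cinder bug) would be to open a connection with the same SQLAlchemy URL that cinder-volume uses. A minimal sketch, assuming the real connection string is copied from the [database] section of /etc/cinder/cinder.conf on the failing unit; the URL below is only a placeholder:

```python
# Minimal sketch: try the same SQLAlchemy connection that cinder-volume uses.
# The URL is a placeholder; copy the real 'connection' value from the
# [database] section of /etc/cinder/cinder.conf on the failing unit.
import sys

from sqlalchemy import create_engine, text

DB_URL = "mysql+pymysql://cinder:PASSWORD@10.x.x.x/cinder"  # placeholder

try:
    engine = create_engine(DB_URL)
    with engine.connect() as conn:
        conn.execute(text("SELECT 1"))
    print("connection OK")
except Exception as exc:
    # An OperationalError here (e.g. access denied) would point at a
    # missing grant/ACL rather than a problem inside cinder itself.
    print(f"connection failed: {exc}", file=sys.stderr)
    sys.exit(1)
```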
The other two units have no such connection issues. The innodb cluster is also reporting healthy:
mysql-innodb-
mysql-innodb-
mysql-innodb-
Not sure what to make of this. My best guess is that either the unit was never granted access to the database, or it was never properly added to the ACL for that cluster.
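If that guess is right, it should be visible from the MySQL side. A minimal sketch of such a check, assuming cluster root credentials (the host and password below are placeholders), listing which user@host accounts actually exist for cinder:

```python
# Sketch: list MySQL accounts to see whether the failing unit's address was
# ever granted access. Host and credentials are placeholders; on a real
# deployment they would come from the mysql-innodb-cluster leader.
import pymysql

conn = pymysql.connect(
    host="10.x.x.x",           # placeholder: an innodb-cluster unit address
    user="root",
    password="ROOT_PASSWORD",  # placeholder
)
try:
    with conn.cursor() as cur:
        cur.execute("SELECT user, host FROM mysql.user WHERE user = 'cinder'")
        for user, host in cur.fetchall():
            print(f"{user}@{host}")
finally:
    conn.close()
```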
I saw this exact same behavior in a focal-ussuri deployment: two cinder units managed to establish a connection with mysql-innodb-cluster, but a third one failed, causing the cinder-volume failure.
The workaround found was to restart the cinder-volume service inside the failing unit.
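For reference, a minimal sketch of that workaround, assuming it is run as root on the failing unit (e.g. after 'juju ssh cinder/0'):

```python
# Sketch of the workaround: restart cinder-volume on the failing unit
# and confirm it comes back up. Run as root on the unit itself.
import subprocess

subprocess.run(["systemctl", "restart", "cinder-volume"], check=True)
subprocess.run(["systemctl", "is-active", "cinder-volume"], check=True)
```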