[focal][ussuri] Cinder volume service fails to start after authentication issue against innodb cluster

Bug #1907127 reported by Michael Skalka
Affects: OpenStack Cinder Charm
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

As seen during this test run: https://solutions.qa.canonical.com/testruns/testRun/01222b19-53cc-4a11-b532-790ec1041592
Crashdump here: https://oil-jenkins.canonical.com/artifacts/01222b19-53cc-4a11-b532-790ec1041592/generated/generated/openstack/juju-crashdump-openstack-2020-12-04-19.58.33.tar.gz

This was a Focal Ussuri run using our current stable SKUs. Deployment proceeded fairly normally; however, at some point cinder/0 became stuck due to a failed cinder-volume service:

cinder/0 blocked idle 1/lxd/2 10.244.40.244 8776/tcp Services not running that should be: cinder-volume
  cinder-ceph/2 waiting idle 10.244.40.244 Incomplete relations: ceph
  cinder-mysql-router/2 active idle 10.244.40.244 Unit is ready
  hacluster-cinder/2 active idle 10.244.40.244 Unit is ready and clustered
  logrotated/50 active idle 10.244.40.244 Unit is ready.
  public-policy-routing/35 active idle 10.244.40.244 Unit is ready
cinder/1 active idle 3/lxd/2 10.244.41.4 8776/tcp Unit is ready
  cinder-ceph/1 waiting idle 10.244.41.4 Incomplete relations: ceph
  cinder-mysql-router/1 active idle 10.244.41.4 Unit is ready
  hacluster-cinder/1 active idle 10.244.41.4 Unit is ready and clustered
  logrotated/48 active idle 10.244.41.4 Unit is ready.
  public-policy-routing/34 active idle 10.244.41.4 Unit is ready
cinder/2* active idle 5/lxd/2 10.244.40.253 8776/tcp Unit is ready
  cinder-ceph/0* waiting idle 10.244.40.253 Incomplete relations: ceph
  cinder-mysql-router/0* active idle 10.244.40.253 Unit is ready
  hacluster-cinder/0* active idle 10.244.40.253 Unit is ready and clustered
  logrotated/39 active idle 10.244.40.253 Unit is ready.
  public-policy-routing/25 active idle 10.244.40.253 Unit is ready

journalctl on that unit was fairly unhelpful; however, looking into the cinder-volume log we can see the issue:

2020-12-04 15:17:15.244 64496 ERROR cinder.cmd.volume Traceback (most recent call last):
2020-12-04 15:17:15.244 64496 ERROR cinder.cmd.volume File "/usr/lib/python3/dist-packages/cinder/cmd/volume.py", line 100, in _launch_service
2020-12-04 15:17:15.244 64496 ERROR cinder.cmd.volume server = service.Service.create(host=host,
...
...
2020-12-04 15:17:15.244 64496 ERROR cinder.cmd.volume sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (1045, "Access denied for user 'cinder'@'192.168.33.78' (using password: YES)")
2020-12-04 15:17:15.244 64496 ERROR cinder.cmd.volume (Background on this error at: http://sqlalche.me/e/e3q8)
2020-12-04 15:17:15.244 64496 ERROR cinder.cmd.volume
2020-12-04 15:17:15.255 64496 ERROR cinder.cmd.volume [req-e608b7f5-a06e-4aa2-97b8-3da861703ceb - - - - -] No volume service(s) started successfully, terminating.
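
The "Access denied" error can be checked by hand from the affected unit, which helps distinguish a bad credential written into cinder.conf from a missing grant on the cluster side. A minimal sketch, assuming the standard cinder package config path and that a mysql client is available on the unit (the host/port shown are illustrative, not taken from the crashdump):

  # On the failing cinder unit: show the DB connection string the charm wrote.
  sudo grep ^connection /etc/cinder/cinder.conf

  # Try the same credentials by hand through the local mysql-router endpoint.
  # Replace host/port with the values from the connection string above.
  mysql -u cinder -p -h 127.0.0.1 -P 3306 -e 'SELECT 1;'

If the manual login fails with the same 1045 error, the credentials on the unit do not match what the cluster has granted.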

The other two units have no such connection issues. The InnoDB cluster is also reporting healthy:

mysql-innodb-cluster/0 active idle 0/lxd/7 10.244.40.209 Unit is ready: Mode: R/O
mysql-innodb-cluster/1 active idle 2/lxd/7 10.244.40.255 Unit is ready: Mode: R/O
mysql-innodb-cluster/2* active idle 4/lxd/7 10.244.40.203 Unit is ready: Mode: R/W

Not sure what to make of this. My best guess is that either the unit was never granted access to the database, or it was never properly added to the ACL for that cluster.
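
If that guess is right, the grants on the R/W instance should show whether the cinder user/host pair was ever created. A hedged sketch, assuming shell access to the R/W mysql-innodb-cluster unit and valid root credentials (how those are obtained is charm-specific and assumed here):

  # On the R/W mysql-innodb-cluster unit, list the hosts the cinder user
  # is registered for, then inspect the grants for the failing unit's IP.
  mysql -u root -p -e "SELECT User, Host FROM mysql.user WHERE User='cinder';"
  mysql -u root -p -e "SHOW GRANTS FOR 'cinder'@'192.168.33.78';"

A missing row for 192.168.33.78 would point at the grant never having been created; a present row with a failing login would point at a password mismatch.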

Revision history for this message
Sérgio Manso (sergiomanso) wrote :

I saw this exact same behavior in a focal-ussuri deployment. Two cinder units managed to establish a connection with mysql-innodb-cluster, but a third one failed, causing the cinder-volume failure.
The workaround was to restart the cinder-volume service on the failing unit, as sketched below.
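
For reference, the restart can be done either directly on the unit or via the Juju client; a minimal sketch, with service and unit names taken from the report above:

  # Directly on the failing unit:
  sudo systemctl restart cinder-volume

  # Or from the Juju client:
  juju ssh cinder/0 'sudo systemctl restart cinder-volume'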

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :
summary: - Cinder volume service fails to start after authentication issue against
- innodb cluster
+ [focal][ussuri] Cinder volume service fails to start after
+ authentication issue against innodb cluster
Changed in charm-cinder:
status: New → Confirmed
Revision history for this message
Alexander Balderson (asbalderson) wrote :

We've seen this same thing with cinder-scheduler as well, and with a variety of other services here and there.

https://bugs.launchpad.net/charm-designate/+bug/1925233 for example.

It would make sense for this to be a duplicate of the charm-mysql-router bug.
