Upgrade from Queens to Rocky causes cinder-volume and cinder-scheduler service failures until the upgrade is complete across all units
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Cinder Charm | Triaged | Medium | Unassigned |
Bug Description
This was seen on the latest stable cinder charm, in an HA configuration using hacluster. Following an OpenStack release upgrade from Queens to Rocky, one unit went into a blocked state:
ubuntu@
Model Controller Cloud/Region Version SLA Timestamp
openstack foundations-maas maas_cloud 2.8.10 unsupported 13:58:05Z
App Version Status Scale Charm Store Rev OS Notes
cinder 13.0.9 blocked 3 cinder jujucharms 308 ubuntu
cinder-ceph 13.0.9 active 3 cinder-ceph jujucharms 260 ubuntu
hacluster-cinder active 3 hacluster jujucharms 74 ubuntu
nrpe-container active 3 nrpe jujucharms 70 ubuntu
public-
Unit Workload Agent Machine Public address Ports Message
cinder/0 active idle 0/lxd/0 10.244.49.42 8776/tcp Unit is ready
cinder-ceph/1 active idle 10.244.49.42 Unit is ready
hacluster-
nrpe-container/12 active idle 10.244.49.42 icmp,5666/tcp ready
public-
cinder/1 blocked idle 1/lxd/0 10.244.49.35 8776/tcp Services not running that should be: cinder-scheduler, cinder-volume
cinder-ceph/2 active idle 10.244.49.35 Unit is ready
hacluster-
nrpe-container/14 active idle 10.244.49.35 icmp,5666/tcp ready
public-
cinder/2* active idle 2/lxd/0 10.244.49.29 8776/tcp Unit is ready
cinder-ceph/0* active idle 10.244.49.29 Unit is ready
hacluster-
nrpe-container/13 active idle 10.244.49.29 icmp,5666/tcp ready
public-
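For context, the exact upgrade steps aren't captured above, but the usual action-managed payload upgrade for this topology would look roughly like the following (the cloud archive name is an assumption based on the versions shown):

$ juju config cinder action-managed-upgrade=true
$ juju config cinder openstack-origin=cloud:bionic-rocky
$ juju run-action --wait cinder/0 openstack-upgrade    # repeated per unit; here cinder/1 had not yet been upgraded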
Looking into the volume and scheduler logs shows an issue with versioned objects being capped:
2021-04-07 13:46:36.744 826223 INFO cinder.rpc [req-022c4cb1-
2021-04-07 13:46:36.745 826223 ERROR cinder.cmd.volume [req-022c4cb1-
new services and you're trying to start an older one. Use `cinder-manage service list` to check that and upgrade this service.
2021-04-07 13:46:36.745 826223 ERROR cinder.cmd.volume Traceback (most recent call last):
2021-04-07 13:46:36.745 826223 ERROR cinder.cmd.volume File "/usr/lib/
2021-04-07 13:46:36.745 826223 ERROR cinder.cmd.volume cluster=cluster)
2021-04-07 13:46:36.745 826223 ERROR cinder.cmd.volume File "/usr/lib/
2021-04-07 13:46:36.745 826223 ERROR cinder.cmd.volume cluster=cluster)
2021-04-07 13:46:36.745 826223 ERROR cinder.cmd.volume File "/usr/lib/
2021-04-07 13:46:36.745 826223 ERROR cinder.cmd.volume *args, **kwargs)
2021-04-07 13:46:36.745 826223 ERROR cinder.cmd.volume File "/usr/lib/
2021-04-07 13:46:36.745 826223 ERROR cinder.cmd.volume *args, **kwargs)
2021-04-07 13:46:36.745 826223 ERROR cinder.cmd.volume File "/usr/lib/
2021-04-07 13:46:36.745 826223 ERROR cinder.cmd.volume self.scheduler_
2021-04-07 13:46:36.745 826223 ERROR cinder.cmd.volume File "/usr/lib/
2021-04-07 13:46:36.745 826223 ERROR cinder.cmd.volume serializer = base.CinderObje
2021-04-07 13:46:36.745 826223 ERROR cinder.cmd.volume File "/usr/lib/
2021-04-07 13:46:36.745 826223 ERROR cinder.cmd.volume raise exception.
2021-04-07 13:46:36.745 826223 ERROR cinder.cmd.volume CappedVersionUn
2021-04-07 13:46:36.745 826223 ERROR cinder.cmd.volume
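The truncated exception at the end of the traceback looks like cinder's CappedVersionUnknown, and the error text above already points at the diagnostic: compare the versions each service has registered. On this deployment that check would be roughly (unit name taken from the status output; the exact columns printed may vary by release):

$ juju ssh cinder/1 sudo cinder-manage service list
# the listing should include the RPC/object versions registered by each
# cinder-scheduler and cinder-volume service, making the mismatch visible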
Completing the upgrade on this unit resolves the issue. This doesn't seem critical, but it will disrupt anyone attempting a live upgrade with no downtime to control-plane services.
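Assuming action-managed upgrades are in use, completing the upgrade on the lagging unit is just a matter of running the charm's upgrade action against it and confirming it returns to ready:

$ juju run-action --wait cinder/1 openstack-upgrade
$ juju status cinder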
tags: added: openstack-upgrade
I suspect that to fix this we would need to look at version pinning of the cinder (and other) services.