Object version cap per service binary can cause restart failures during upgrade

Bug #2055095 reported by Dylan McCulloch
Affects: Cinder
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

During the Cinder upgrade from Wallaby to Xena, the service object version is incremented from 1.38 to 1.39 [1]. This is the first increment since Train.

When a cinder service is started, rpcapi objects are instantiated for each cinder service binary type (cinder-volume, cinder-backup, cinder-scheduler) [2]. During that initialisation there is a check to determine the object version cap (minimum version) for each binary.
The cap is currently determined by listing the existing service hosts for the relevant binary type (in any state, e.g. enabled/disabled, up/down) and recording the minimum existing object version for that binary [3].
If the minimum object version for *any* one of those binaries is greater than the maximum compatible object version of the service that is being started (or restarted), the service will fail to start [4].
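
For illustration, a minimal sketch of that per-binary cap logic is below. The helper names, row layout and known-version table are assumptions for the sketch, not Cinder's exact identifiers:

  # Object versions this (old, wallaby) binary knows how to serialize.
  KNOWN_OBJ_VERSIONS = ('1.36', '1.37', '1.38')

  # Rows standing in for the service table: every host for a binary is
  # considered, whatever its enabled/disabled or up/down state.
  SERVICE_ROWS = [
      {'binary': 'cinder-scheduler', 'host': 'sched1',
       'object_current_version': '1.39'},  # already upgraded to xena
      {'binary': 'cinder-volume', 'host': 'vol1',
       'object_current_version': '1.38'},  # still wallaby
      {'binary': 'cinder-backup', 'host': 'bak1',
       'object_current_version': '1.38'},  # still wallaby
  ]

  class CappedVersionUnknown(Exception):
      pass

  def get_minimum_obj_version(binary):
      # Minimum object version across all service rows for one binary.
      versions = [row['object_current_version'] for row in SERVICE_ROWS
                  if row['binary'] == binary]
      return min(versions, key=lambda v: tuple(int(p) for p in v.split('.')))

  def init_rpcapi(binary):
      # Abort startup if the cap for this binary is newer than we know.
      cap = get_minimum_obj_version(binary)
      if cap not in KNOWN_OBJ_VERSIONS:
          raise CappedVersionUnknown('DB capped to unknown version %s' % cap)
      return cap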

For example, the recommended upgrade sequence is [5]:
  cinder-scheduler --> cinder-volume --> cinder-backup --> cinder-api
If we:
 - upgrade all wallaby cinder-scheduler hosts to xena
 and
 - remove the old wallaby cinder-scheduler services by running `cinder-manage service remove <binary> <host>`
then the minimum object version for the cinder-scheduler binary will be 1.39. If any of the other remaining wallaby cinder services (e.g. cinder-volume, cinder-api, etc.) are restarted before they are upgraded to xena, they will fail to start.
(wallaby cinder services are only compatible with object version <= 1.38)
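
Reusing the definitions from the sketch above: because a restarting service initialises rpcapis for every binary, a still-wallaby host trips over the cinder-scheduler cap even though its own binary's cap is fine:

  # The wallaby scheduler rows were removed, so the per-binary minimum
  # for cinder-scheduler is 1.39, which a wallaby binary (capped at
  # 1.38) does not recognise.
  for binary in ('cinder-volume', 'cinder-backup', 'cinder-scheduler'):
      init_rpcapi(binary)  # raises CappedVersionUnknown on cinder-scheduler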

Example error:
  cinder.exception.CappedVersionUnknown: Unrecoverable Error: Versioned Objects in DB are capped to unknown version 1.39. Most likely your environment contains only new services and you're trying to start an older one. Use `cinder-manage service list` to check that and upgrade this service.

Does the object version check really need to be against each set of hosts per binary (as it is currently), or should the check instead determine the minimum object version across the entire list of cinder service hosts?
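
For comparison, a hedged sketch of that alternative check, taking the minimum over every service host irrespective of binary (row layout again assumed for illustration):

  def get_minimum_obj_version_any_binary(service_rows):
      # Minimum object version across *all* cinder service hosts.
      return min((row['object_current_version'] for row in service_rows),
                 key=lambda v: tuple(int(p) for p in v.split('.')))

  # With one wallaby host still registered, the cap stays at 1.38 and a
  # restarted wallaby service would come up cleanly.
  rows = [
      {'binary': 'cinder-scheduler', 'object_current_version': '1.39'},
      {'binary': 'cinder-volume', 'object_current_version': '1.38'},
  ]
  assert get_minimum_obj_version_any_binary(rows) == '1.38'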

If a check across all hosts is not appropriate, perhaps the docs should at least be updated to advise that old versions of cinder services should not be removed from the service list with `cinder-manage service remove` until the entire upgrade (for all cinder service binaries) is complete?

[1] https://github.com/openstack/cinder/commit/94dfad99c2b39c594cbce2b9387d55a08594fa2b#diff-9f18846592e2ad896a6549155e209b46a08ba691d3cd826db3d54787b5d0f420
[2] https://github.com/openstack/cinder/blob/647fa0b10222c919dbeeeb19b761b5521fd01961/cinder/scheduler/manager.py#L106-L108
[3] https://github.com/openstack/cinder/blob/647fa0b10222c919dbeeeb19b761b5521fd01961/cinder/objects/service.py#L172-L194
[4] https://github.com/openstack/cinder/blob/647fa0b10222c919dbeeeb19b761b5521fd01961/cinder/objects/base.py#L540-L541
[5] https://docs.openstack.org/cinder/latest/admin/upgrades.html
