Object version cap per service binary can cause restart failures during upgrade
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Cinder | New | Undecided | Unassigned |
Bug Description
During the Cinder upgrade from Wallaby to Xena there is a service object version increment from 1.38 to 1.39 [1]. This is the first increment since Train.
When a cinder service is started, rpcapi objects are instantiated for each cinder service binary type (cinder-volume, cinder-backup, cinder-scheduler) [2]. During that initialisation there is a check to determine the object version cap (minimum version) for each binary.
This is currently determined by listing the existing service hosts (in any state, e.g. enabled/disabled, up/down) for the relevant binary type and recording the minimum existing object version for that binary [3].
If the minimum object version for *any* one of those binaries is greater than the maximum compatible object version of the service that is being started (or restarted) then the service will fail to start [4].
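The check described above can be sketched roughly as follows (illustrative Python only; the function names and data layout are assumptions for this report, not Cinder's actual implementation):

```python
# Sketch of the per-binary version cap check described above.
# All names and the data layout are assumptions, not Cinder's real code.

BINARIES = ("cinder-volume", "cinder-backup", "cinder-scheduler")

def parse(v):
    """'1.38' -> (1, 38), so versions compare numerically, not as strings."""
    return tuple(int(p) for p in v.split("."))

def binary_version_cap(services, binary):
    """Minimum recorded object version among all hosts running `binary`."""
    versions = [parse(s["version"]) for s in services if s["binary"] == binary]
    return min(versions) if versions else None

def can_start(services, my_max_version):
    """A (re)starting service fails if *any* binary's cap exceeds what it supports."""
    for binary in BINARIES:
        cap = binary_version_cap(services, binary)
        if cap is not None and cap > parse(my_max_version):
            return False
    return True

# Wallaby scheduler rows have been removed; only the upgraded xena row remains:
services = [
    {"binary": "cinder-scheduler", "version": "1.39"},  # xena
    {"binary": "cinder-volume",    "version": "1.38"},  # still wallaby
]

# A wallaby service (max object version 1.38) restarted now fails,
# because the cinder-scheduler cap is 1.39:
can_start(services, "1.38")  # False
```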
e.g. The recommended upgrade sequence is [5]:
cinder-scheduler --> cinder-volume --> cinder-backup --> cinder-api
If we:
- upgrade all wallaby cinder-scheduler hosts to xena, and
- remove the old wallaby cinder-scheduler services by running `cinder-manage service remove <binary> <host>`,

then the minimum object version for the cinder-scheduler binary will be 1.39. If any of the remaining wallaby cinder services (e.g. cinder-volume, cinder-api, etc.) is restarted before it is upgraded to xena, it will fail to start.
(wallaby cinder services are only compatible with object version <= 1.38)
Example error:
cinder.
Does the object version check really need to be against each set of hosts per binary (as it is currently), or should the check instead determine the minimum object version across the entire list of cinder service hosts?
If a check across all hosts is not appropriate, perhaps the docs should at least be updated to advise that old cinder service entries should not be removed with `cinder-manage` until the entire upgrade (covering all cinder service binaries) is complete?
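To illustrate the difference between the two approaches, here is a toy comparison (the version numbers are from this report; the data layout and function names are assumptions for illustration only):

```python
# Toy contrast between the current per-binary minimum and a minimum taken
# across all cinder service hosts. Data layout is assumed for illustration.

def parse(v):
    """'1.38' -> (1, 38) for numeric comparison."""
    return tuple(int(p) for p in v.split("."))

def min_per_binary(services, binary):
    """Current behaviour: minimum version among hosts of one binary."""
    versions = [parse(s["version"]) for s in services if s["binary"] == binary]
    return min(versions) if versions else None

def min_across_all(services):
    """Suggested alternative: minimum version across every service host."""
    return min(parse(s["version"]) for s in services)

# Old wallaby scheduler rows removed; a wallaby volume service remains:
services = [
    {"binary": "cinder-scheduler", "version": "1.39"},  # upgraded to xena
    {"binary": "cinder-volume",    "version": "1.38"},  # still wallaby
]

min_per_binary(services, "cinder-scheduler")  # (1, 39): blocks wallaby restarts
min_across_all(services)                      # (1, 38): wallaby restarts succeed
```

Under the global minimum, the remaining wallaby rows keep the cap at 1.38 until the whole fleet is upgraded, which would avoid the restart failure described above.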
[1] https:/
[2] https:/
[3] https:/
[4] https:/
[5] https:/