No mechanism to wait for computes to update service version before restarting to remove RPC version cap after upgrade

Bug #1833542 reported by Mark Goddard
This bug affects 3 people
Affects: OpenStack Compute (nova)
Status: Triaged
Importance: Wishlist
Assigned to: Unassigned

Bug Description

Description
===========

During an upgrade, nova services cap the RPC version they use when communicating with nova-compute at the version of the compute service with the lowest version. Once all computes are running the new version, the services can be restarted to remove this cap.

When deployment tools try to automate this procedure, there is no interface available to check, or wait for, all computes to be running the new version.

Steps to reproduce
==================

Perform a rolling upgrade of nova, following https://docs.openstack.org/nova/latest/user/upgrade.html.

Expected results
================

After starting up nova services with the new code, there is some way to check which compute services are running the latest RPC version (from the 'services' DB table). Ideally it would not be necessary for the caller to know the actual minimum RPC version.

Actual results
==============

We need to insert a manual step to check the service versions in the database, or a 'sleep' for long enough that we can be sure that all services are up and running the new version.
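For illustration, the fixed 'sleep' could in principle be replaced by a bounded polling loop. The following is only a minimal sketch: `wait_until` and its `check` callable are hypothetical names, and `check` stands in for whatever command or query ends up reporting that all computes are on the new version (no such interface exists today, which is the point of this report).

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool],
               timeout: float = 600.0,
               interval: float = 5.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll check() until it returns True or the timeout expires.

    Returns True if check() succeeded, False if the deadline passed first.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A deployment tool would call something like `wait_until(all_computes_upgraded, timeout=900)` and only restart the services if it returns True, where `all_computes_upgraded` is again a hypothetical check supplied by the tool.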

Tags: upgrade
Matt Riedemann (mriedem) wrote :

I have thought about adding a new microversion to the os-services API to include the service version for something like this in the past but never went through with it. I'm not sure you'd want something to be relying on the REST API during an upgrade anyway. I'm guessing we'd want something in the "nova-status upgrade" command area, either in the "check" subcommand (though that's more for pre- and post-upgrade validation, to which this could apply) or maybe some new subcommand.

I'm not sure what you mean by "Ideally it would not be necessary for the caller to know the actual minimum RPC version." Aren't you just looking for a command to say whether or not all services (does it need to be just nova-compute binaries or also things like nova-conductor, nova-scheduler, etc?) are running the current latest version for that release, i.e. this:

https://github.com/openstack/nova/blob/7ecdee01ed0efcecc4b448118a412d2c6554a27d/nova/objects/service.py#L34

This sounds like more of a specless blueprint than a bug so I'm going to mark this as wishlist but it sounds simple enough that we could add something to the nova-status upgrade command.

Changed in nova:
importance: Undecided → Wishlist
status: New → Triaged
Matt Riedemann (mriedem) wrote :

Maybe you want some options on the command to make it like a report, e.g. output the current version:

https://github.com/openstack/nova/blob/7ecdee01ed0efcecc4b448118a412d2c6554a27d/nova/objects/service.py#L34

And an option to output the minimum version for a given binary (e.g. nova-compute) or all services, and an option to say whether or not the minimum matches the current version (obviously a human can eyeball that to tell but you likely want something programmatic for that, like an exit code).

Matt Riedemann (mriedem) wrote :

It might also be worth having an option to dump all services whose versions are below the current version so you could debug why the deployment isn't fully upgraded yet.
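A rough sketch of what such report options might compute, assuming the service rows have already been read from the 'services' table. `ServiceRecord` and the three functions are hypothetical names for illustration only, not part of nova or nova-status, and the version numbers in the usage note are made up.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class ServiceRecord:
    """Hypothetical stand-in for one row of the 'services' table."""
    binary: str   # e.g. "nova-compute"
    host: str
    version: int  # service version reported by that service


def minimum_versions(services: Iterable[ServiceRecord],
                     binary: Optional[str] = None) -> Dict[str, int]:
    """Return the minimum version per binary, optionally for one binary only."""
    minimums: Dict[str, int] = {}
    for svc in services:
        if binary is not None and svc.binary != binary:
            continue
        current = minimums.get(svc.binary, svc.version)
        minimums[svc.binary] = min(current, svc.version)
    return minimums


def outdated_services(services: Iterable[ServiceRecord],
                      current_version: int,
                      binary: Optional[str] = None) -> List[ServiceRecord]:
    """List services whose version is below current_version (for debugging)."""
    return [svc for svc in services
            if (binary is None or svc.binary == binary)
            and svc.version < current_version]


def report_exit_code(services: Iterable[ServiceRecord],
                     current_version: int,
                     binary: Optional[str] = None) -> int:
    """Return 0 if no selected service is below current_version, else 1."""
    return 1 if outdated_services(services, current_version, binary) else 0
```

For example, with computes at made-up versions 40 and 37 and a current version of 40, `minimum_versions` would report 37 for nova-compute, `outdated_services` would list the host still at 37, and `report_exit_code` would return 1 so that a script can tell the deployment is not fully upgraded.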

Mark Goddard (mgoddard) wrote :

nova-status upgrade <something> seems like a reasonable place to put it.

I think we only need to wait for nova-compute services, since this seems to be the only one that gets capped.

A very simple implementation could be to replicate the logic here:

https://github.com/openstack/nova/blob/7ecdee01ed0efcecc4b448118a412d2c6554a27d/nova/compute/rpcapi.py#L412

It could compare this with the latest version:

https://github.com/openstack/nova/blob/7ecdee01ed0efcecc4b448118a412d2c6554a27d/nova/objects/service.py#L34

Then exit 0 if there is no cap required. The only argument necessary might be something to choose the branch of 'if CONF.api_database.connection:'.
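Roughly, that check might look like the sketch below. Everything here is hypothetical: the function names are invented for illustration, the two lookup callables stand in for whatever queries the minimum nova-compute version, and the sketch assumes the 'if CONF.api_database.connection:' branch chooses between a minimum taken across all cells and one taken from the local cell database only.

```python
from typing import Callable


def minimum_compute_version(use_api_db: bool,
                            min_across_cells: Callable[[], int],
                            min_in_local_cell: Callable[[], int]) -> int:
    """Pick the lookup according to the (assumed) api_database branch."""
    if use_api_db:
        return min_across_cells()
    return min_in_local_cell()


def cap_required(minimum: int, latest: int) -> bool:
    """A cap is still needed while any compute reports less than latest."""
    return minimum < latest


def check_exit_code(use_api_db: bool,
                    min_across_cells: Callable[[], int],
                    min_in_local_cell: Callable[[], int],
                    latest: int) -> int:
    """Return 0 if no cap is required, 1 otherwise."""
    minimum = minimum_compute_version(use_api_db, min_across_cells,
                                      min_in_local_cell)
    return 1 if cap_required(minimum, latest) else 0
```

The caller never passes the minimum version itself, only the choice of branch, which matches the wish in the description that the caller should not need to know the actual minimum RPC version.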

Your suggestion of a report output could be useful for debugging, particularly without an API to determine this information.
