[upgrade] versions N and N+1 are not compatible
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Nova Cloud Controller Charm | Fix Released | High | Chris MacNaughton |
OpenStack Nova Compute Charm | Fix Released | High | Chris MacNaughton |
Bug Description
According to guide [1], when upgrading OpenStack services, a service at a given version N should be compatible with version N+1 of the same service. This matters when the cloud cannot be upgraded all at once, or when several nodes run the same services.
I tested upgrades from Ocata to Pike and from Pike to Queens, following the upgrade order in [2], and encountered the following issues:
Ocata => Pike
=============
After neutron-gateway upgrade, nova-api-metadata could not talk to nova-cloud-
2019-04-11 19:04:39.560 7011 ERROR oslo_service.
When running "curl 169.254.169.254" the error message was:
500 Internal Server Error
Remote metadata server experienced an internal server error.
After nova-cloud-
Remote error: UnsupportedVersion Endpoint does not support RPC version 4.4. Attempted method: select_destinations
+ openstack server add volume ins2_old v2_old
Unexpected API Error. Please report this at http://
<class 'oslo_messaging
+ openstack server add volume ins1_old v1_old
Unexpected API Error. Please report this at http://
<class 'oslo_messaging
$ openstack server list
Unexpected API Error. Please report this at http://
<class 'oslo_versioned
Upgrading nova-compute restored functionality.
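The UnsupportedVersion errors above are what one would expect when a newer RPC client sends a message version that an older endpoint does not implement. The following is a minimal sketch of the usual compatibility rule (same major version, requested minor no higher than the endpoint's minor). It is a simplified illustration, not oslo.messaging's actual code, and the endpoint version "4.3" used in the example is a hypothetical value, since the log excerpts do not state what the older endpoint supported.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Split a 'major.minor' RPC version string into integers."""
    major, minor = version.split(".")
    return int(major), int(minor)


def endpoint_accepts(endpoint_version: str, requested_version: str) -> bool:
    """Return True if an endpoint can serve a request at requested_version.

    Assumed rule: the major versions must match, and the requested minor
    version must not exceed the minor version the endpoint implements.
    """
    ep_major, ep_minor = parse_version(endpoint_version)
    rq_major, rq_minor = parse_version(requested_version)
    if ep_major != rq_major:
        return False
    return rq_minor <= ep_minor


# Hypothetical older endpoint at 4.3 receiving a 4.4 request: rejected.
print(endpoint_accepts("4.3", "4.4"))
# Upgraded endpoint at 4.4 receiving an older 4.3 request: accepted.
print(endpoint_accepts("4.4", "4.3"))
```

Under this rule, an upgraded service that sends the newest message version fails against a not-yet-upgraded peer, while the reverse direction keeps working, which is consistent with functionality returning once nova-compute was upgraded.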
Pike => Queens
==============
Upgrading neutron-gateway did not cause nova-api-metadata problems when upgrading to Queens, but upgrading nova-cloud-
+ openstack server migrate ins2_old --live juju-335fd8-
Unexpected API Error. Please report this at http://
<class 'oslo_messaging
+ openstack server add volume ins2_old v2_old
Unexpected API Error. Please report this at http://
<class 'oslo_messaging
++ openstack server show ins1_new -c status -f value
Unexpected API Error. Please report this at http://
<class 'oslo_versioned
Remote error: UnsupportedVersion Endpoint does not support RPC version 5.0. Attempted method: check_can_live_migrate_
2019-04-16 20:09:15.485 30853 ERROR nova.api.
Again, upgrading nova-compute restored cloud functionality.
=======
Most of these problems seem related to the fact that the current charm implementations do not allow locking down the RPC version when performing upgrades, as suggested by [1].
[1] https:/
tags: | added: canonical-bootstack |
Changed in charm-nova-cloud-controller: | |
status: | New → Triaged |
Changed in charm-nova-compute: | |
status: | New → Triaged |
Changed in charm-nova-cloud-controller: | |
importance: | Undecided → High |
Changed in charm-nova-compute: | |
importance: | Undecided → High |
Changed in charm-nova-cloud-controller: | |
milestone: | 19.07 → 19.10 |
Changed in charm-nova-compute: | |
milestone: | 19.07 → 19.10 |
Changed in charm-nova-cloud-controller: | |
milestone: | 19.10 → 20.01 |
Changed in charm-nova-compute: | |
milestone: | 19.10 → 20.01 |
tags: | added: openstack-upgrade |
Changed in charm-nova-cloud-controller: | |
status: | Fix Committed → Fix Released |
Changed in charm-nova-compute: | |
status: | Fix Committed → Fix Released |
Marking field-high.
This bug causes every OpenStack cloud upgrade to experience an L2-severity outage of at least 2 hours on any production cloud we upgrade: the API services remain available, but the backend services fail to notify the API layer that changes have succeeded. This breaks any automation, CI/CD, or orchestration against the cloud and forces operational escalation of what should be routine upgrades.
There is a workaround noted in https://bugs.launchpad.net/nova/+bug/1799186, which describes setting
[upgrade_levels]
compute=auto
in nova.conf on the nova-cloud-controller units. This resolves the issues for nova-conductor and nova-api-os-compute.
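For anyone scripting the workaround above, here is a minimal sketch of adding the setting to the text of a nova.conf file with Python's standard configparser. The function name pin_compute_rpc and the sample file content are illustrative assumptions, not part of any charm or of nova. Note that configparser drops comments and does not handle options repeated within a section, so treat this as an illustration rather than a safe editor for a real nova.conf; on charm-managed units, hand edits may also be overwritten when the charm re-renders the file.

```python
import configparser
import io


def pin_compute_rpc(conf_text: str, level: str = "auto") -> str:
    """Return conf_text with [upgrade_levels] compute set to level.

    Hypothetical helper: parses INI-style text, adds the section if it
    is missing, and writes the result back out as a string.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section("upgrade_levels"):
        parser.add_section("upgrade_levels")
    parser.set("upgrade_levels", "compute", level)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Sample input is made up for the demonstration.
sample = "[DEFAULT]\ndebug = false\n"
print(pin_compute_rpc(sample))
```

The services that read nova.conf would then need a restart to pick up the change.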