openstack-upgrade bionic:stein -> train without placement and then adding placement results in a confusing state

Bug #1910276 reported by Alex Kavanagh
Affects                                Status       Importance  Assigned to
OpenStack Nova Cloud Controller Charm  In Progress  Medium      Alex Kavanagh
OpenStack Nova Compute Charm           Invalid      Undecided   Unassigned

Bug Description

See comment #4 below. Essentially, the charm can get into a state where the openstack-origin config option doesn't match the installed payload version, but because placement is related to it, there is no user-facing information that explains what the issue is.

Makes you want a "juju status-report" that includes the charm's own analysis of issues on the charm.

-- original below

Juju 2.7.8-bionic-amd64
Test run from func-target: bionic-rocky, and then manual upgrades to stein, then train (leaving rmq and percona-cluster on bionic-rocky).

Result from /var/log/nova/nova-compute.log:

2021-01-05 17:57:02.384 28521 ERROR oslo_service.service [req-26da2780-45c2-4a66-900d-4099d11c6d02 - - - - -] Error starting thread.: oslo_messaging.rpc.client.RemoteError: Remote error: IncompatibleObjectVersion Version 2.6 of InstanceList is not supported, supported version is 2.4
['Traceback (most recent call last):\n', ' File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming\n res = self.dispatcher.dispatch(message)\n', ' File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch\n return self._do_dispatch(endpoint, method, ctxt, args)\n', ' File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch\n result = func(ctxt, **new_args)\n', ' File "/usr/lib/python3/dist-packages/nova/conductor/manager.py", line 141, in object_class_action_versions\n objname, object_versions[objname])\n', ' File "/usr/lib/python3/dist-packages/oslo_versionedobjects/base.py", line 387, in obj_class_from_name\n supported=latest_ver)\n', 'oslo_versionedobjects.exception.IncompatibleObjectVersion: Version 2.6 of InstanceList is not supported, supported version is 2.4\n'].
2021-01-05 17:57:02.384 28521 ERROR oslo_service.service Traceback (most recent call last):
2021-01-05 17:57:02.384 28521 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/oslo_service/service.py", line 810, in run_service
2021-01-05 17:57:02.384 28521 ERROR oslo_service.service service.start()
2021-01-05 17:57:02.384 28521 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/service.py", line 174, in start
2021-01-05 17:57:02.384 28521 ERROR oslo_service.service self.manager.init_host()
2021-01-05 17:57:02.384 28521 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1337, in init_host
2021-01-05 17:57:02.384 28521 ERROR oslo_service.service expected_attrs=['info_cache', 'metadata', 'numa_topology'])
2021-01-05 17:57:02.384 28521 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/oslo_versionedobjects/base.py", line 177, in wrapper
2021-01-05 17:57:02.384 28521 ERROR oslo_service.service args, kwargs)
2021-01-05 17:57:02.384 28521 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/conductor/rpcapi.py", line 241, in object_class_action_versions
2021-01-05 17:57:02.384 28521 ERROR oslo_service.service args=args, kwargs=kwargs)
2021-01-05 17:57:02.384 28521 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 178, in call
2021-01-05 17:57:02.384 28521 ERROR oslo_service.service retry=self.retry)
2021-01-05 17:57:02.384 28521 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 127, in _send
2021-01-05 17:57:02.384 28521 ERROR oslo_service.service retry=retry)
2021-01-05 17:57:02.384 28521 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 644, in send
2021-01-05 17:57:02.384 28521 ERROR oslo_service.service call_monitor_timeout, retry=retry)
2021-01-05 17:57:02.384 28521 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 635, in _send
2021-01-05 17:57:02.384 28521 ERROR oslo_service.service raise result
2021-01-05 17:57:02.384 28521 ERROR oslo_service.service oslo_messaging.rpc.client.RemoteError: Remote error: IncompatibleObjectVersion Version 2.6 of InstanceList is not supported, supported version is 2.4
2021-01-05 17:57:02.384 28521 ERROR oslo_service.service ['Traceback (most recent call last):\n', ' File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming\n res = self.dispatcher.dispatch(message)\n', ' File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch\n return self._do_dispatch(endpoint, method, ctxt, args)\n', ' File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch\n result = func(ctxt, **new_args)\n', ' File "/usr/lib/python3/dist-packages/nova/conductor/manager.py", line 141, in object_class_action_versions\n objname, object_versions[objname])\n', ' File "/usr/lib/python3/dist-packages/oslo_versionedobjects/base.py", line 387, in obj_class_from_name\n supported=latest_ver)\n', 'oslo_versionedobjects.exception.IncompatibleObjectVersion: Version 2.6 of InstanceList is not supported, supported version is 2.4\n'].
2021-01-05 17:57:02.384 28521 ERROR oslo_service.service

tags: added: openstack-upgrade
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

This was user error; I forgot to add placement to the cloud, but the main part of the error was that I didn't upgrade nova-cloud-controller from bionic-rocky. The cloud then worked after that.

Changed in charm-nova-compute:
status: New → Invalid
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Okay, it's back open. But it's actually due to nova-cloud-controller not actually getting to "train" during the upgrade.

juju status for a "proper" bionic-train:

Every 5.0s: timeout 4 juju status -m zaza-7264deb88117 --color Thu Jan 7 12:22:58 2021

Model Controller Cloud/Region Version SLA Timestamp
zaza-7264deb88117 tinwood2-serverstack serverstack/serverstack 2.8.7 unsupported 12:22:59Z

App Version Status Scale Charm Store Rev OS Notes
..
nova-cloud-controller 20.4.0 active 1 nova-cloud-controller jujucharms 506 ubuntu
nova-compute 20.4.0 active 2 nova-compute jujucharms 518 ubuntu

juju status for the bionic-rocky -> stein -> train:

Every 5.0s: timeout 4 juju status -m zaza-208432b92193 --color tinwood-bastion: Thu Jan 7 12:23:56 2021

Model Controller Cloud/Region Version SLA Timestamp
zaza-208432b92193 tinwood-serverstack serverstack/serverstack 2.7.8 unsupported 12:23:57Z

App Version Status Scale Charm Store Rev OS Notes
..
nova-cloud-controller 19.3.1 active 1 nova-cloud-controller jujucharms 506 ubuntu
nova-compute 20.4.0 blocked 2 nova-compute jujucharms 518 ubuntu

Note that nova-cloud-controller is still on 19.3.1.

I'll dig more to work out why.

Changed in charm-nova-compute:
status: Invalid → New
status: New → Invalid
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Further evidence:

$ juju config nova-cloud-controller openstack-origin
cloud:bionic-train

juju config nova-cloud-controller action-managed-upgrade
false

On the nova-cloud-controller unit:

ubuntu@juju-330bd2-zaza-208432b92193-6:~$ apt search nova | grep installed

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

nova-api-os-compute/bionic-updates,now 2:19.3.1-0ubuntu1~cloud0 all [installed]
nova-common/bionic-updates,now 2:19.3.1-0ubuntu1~cloud0 all [installed,automatic]
nova-conductor/bionic-updates,now 2:19.3.1-0ubuntu1~cloud0 all [installed]
nova-placement-api/bionic-updates,now 2:19.3.1-0ubuntu1~cloud0 all [installed]
nova-scheduler/bionic-updates,now 2:19.3.1-0ubuntu1~cloud0 all [installed]
python3-cinderclient/bionic-updates,now 1:4.1.0-0ubuntu1~cloud0 all [installed,automatic]
python3-glanceclient/bionic-updates,now 1:2.16.0-0ubuntu1~cloud0 all [installed,automatic]
python3-neutronclient/bionic-updates,now 1:6.11.0-0ubuntu1~cloud0 all [installed,automatic]
python3-nova/bionic-updates,now 2:19.3.1-0ubuntu1~cloud0 all [installed]
python3-os-client-config/bionic,now 1.29.0-0ubuntu1 all [installed,automatic]
python3-os-win/bionic-updates,now 4.2.0-0ubuntu1~cloud0 all [installed,automatic]

(note that nova is on 19.3.1 and not 20.4.0).

ubuntu@juju-330bd2-zaza-208432b92193-6:/etc/apt/sources.list.d$ cat cloud-archive.list
# Ubuntu Cloud Archive
deb http://ubuntu-cloud.archive.canonical.com/ubuntu bionic-updates/stein main
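The mismatch above can be confirmed mechanically. A minimal sketch (mine, not charm code), assuming the standard single-line UCA sources format shown above:

```python
# Sketch: parse the UCA pocket out of a cloud-archive.list deb line and
# compare it with the release named in openstack-origin.
# Assumes the standard single-line format shown above.
def uca_release(deb_line: str) -> str:
    """Extract e.g. 'stein' from a 'deb ... bionic-updates/stein main' line."""
    for token in deb_line.split():
        if "/" in token and token.split("/")[0].endswith("-updates"):
            return token.split("/")[1]
    return "unknown"


def origin_release(openstack_origin: str) -> str:
    """Extract the release name from an origin like 'cloud:bionic-train'."""
    return openstack_origin.split("-")[-1]


line = "deb http://ubuntu-cloud.archive.canonical.com/ubuntu bionic-updates/stein main"
print(uca_release(line))                      # what the unit actually has: stein
print(origin_release("cloud:bionic-train"))   # what the config asks for: train
```

Run against the unit in this bug, the two values disagree (stein vs train), which is exactly the silent state being reported.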

So basically, the cloud archive has not been updated despite openstack-origin being set to cloud:bionic-train and action-managed-upgrade being false.

Essentially, the charm didn't upgrade to bionic-train.

Triggered the openstack upgrade by forcing a new config-changed (in this case by setting debug=true).

The nova-compute units then need their nova-compute systemd services restarted with:

juju run --app nova-compute "systemctl restart nova-compute"

After this the cloud came back to normal.
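The payload versions in the two status outputs above map directly onto OpenStack releases, which makes the mismatch easy to check. A quick sketch, under the assumption (true for these releases) that nova's major version tracks the release (nova 19.x is Stein, 20.x is Train):

```python
# Hypothetical helper, not from the charm: map a nova package version to
# its OpenStack release so a mismatch with openstack-origin can be spotted.
# The mapping covers only the releases involved in this bug.
NOVA_MAJOR_TO_RELEASE = {
    18: "rocky",
    19: "stein",
    20: "train",
}


def release_for_nova_version(version: str) -> str:
    """Return the OpenStack release for a nova version like '19.3.1'."""
    major = int(version.split(".")[0])
    return NOVA_MAJOR_TO_RELEASE.get(major, "unknown")


# The state in this bug: nova-cloud-controller's payload is still stein
# while the config (and nova-compute's payload) say train.
print(release_for_nova_version("19.3.1"))  # stein
print(release_for_nova_version("20.4.0"))  # train
```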

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

So what is really happening is this (and it's not "necessarily" a bug).

The nova-cloud-controller charm AT stein will not upgrade unless placement is related to the unit; if it is not, the upgrade bails out. At this stage config-changed has already completed, i.e. openstack-origin is now at "cloud:bionic-train".

If the placement charm is then related to the unit, the blocked state is removed, but the cloud still won't actually work: nova-cloud-controller is still at stein while placement is (probably) at train. If nova-compute is then upgraded to train, the error above occurs.

Possible changes are:

* Detect in update-status that the openstack-origin config doesn't match the installed version, and show a blocked message.
* On relating the placement charm, perform the upgrade (assuming action-managed-upgrade is false) to ensure the payload matches the version that openstack-origin says it should be.

I think the first option is useful *anyway* as config not matching payload status IS a blocking or error condition.
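The first option could look something like the sketch below. The helper names and messages are hypothetical, not the charm's actual API; the point is only the shape of the check:

```python
# Sketch of the proposed update-status check (hypothetical, not the real
# charm-helpers API): if the configured origin release does not match the
# installed payload release, surface a blocked status in juju status.
def check_origin_vs_payload(configured_release: str, installed_release: str):
    """Return a (status, message) pair for the unit's workload status."""
    if configured_release != installed_release:
        return (
            "blocked",
            "openstack-origin is {} but installed payload is {}; "
            "trigger the upgrade to resolve".format(
                configured_release, installed_release
            ),
        )
    return ("active", "Unit is ready")


# The state described in this bug report: config says train, payload is stein.
print(check_origin_vs_payload("train", "stein"))
```

With a check like this, the confusing silent state in this bug would instead show up as a blocked unit with an explanatory message.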

I'll update the title to reflect this issue.

summary: - openstack-upgrade bionic:rocky -> stein -> train (starting clean)
- results in nova-compute not running
+ openstack-upgrade bionic:stein -> train without placement and then
+ adding placement results in a confusing state
description: updated
Changed in charm-nova-cloud-controller:
importance: Undecided → Medium
Changed in charm-nova-cloud-controller:
assignee: nobody → Alex Kavanagh (ajkavanagh)
status: New → In Progress