Upgrade from Ocata to Pike caused scheduler to fail

Bug #1809210 reported by Xav Paice
This bug affects 1 person
Affects: OpenStack Cinder Charm
Status: Expired
Importance: High
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Charm cs:cinder-276, xenial

When I upgraded the cloud from Ocata to Pike, the cinder-scheduler service failed to start with:

2018-12-20 05:08:29.901 759617 INFO cinder.service [-] Starting cinder-scheduler node (version 11.1.1)
2018-12-20 05:08:29.905 759617 INFO cinder.manager [req-f2a0c7b7-88d8-48d3-8a45-c112e930254b - - - - -] Initiating service 13 cleanup
2018-12-20 05:08:29.908 759617 INFO cinder.manager [req-f2a0c7b7-88d8-48d3-8a45-c112e930254b - - - - -] Service 13 cleanup completed.
2018-12-20 05:08:29.909 759617 DEBUG cinder.service [req-f2a0c7b7-88d8-48d3-8a45-c112e930254b - - - - -] Creating RPC server for service cinder-scheduler start /usr/lib/python2.7/dist-packages/cinder/service.py:219
2018-12-20 05:08:29.915 759617 DEBUG cinder.service [req-f2a0c7b7-88d8-48d3-8a45-c112e930254b - - - - -] Pinning object versions for RPC server serializer to 1.3 start /usr/lib/python2.7/dist-packages/cinder/service.py:226
2018-12-20 05:08:29.959 759617 ERROR oslo_service.service [req-f2a0c7b7-88d8-48d3-8a45-c112e930254b - - - - -] Error starting thread.: RPCVersionCapError: Requested message version, 3.0 is incompatible. It needs to be equal in major version and less than or equal in minor version as the specified version cap 2.0.
2018-12-20 05:08:29.959 759617 ERROR oslo_service.service Traceback (most recent call last):
2018-12-20 05:08:29.959 759617 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 721, in run_service
2018-12-20 05:08:29.959 759617 ERROR oslo_service.service service.start()
2018-12-20 05:08:29.959 759617 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 259, in start
2018-12-20 05:08:29.959 759617 ERROR oslo_service.service self.manager.init_host_with_rpc()
2018-12-20 05:08:29.959 759617 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/cinder/scheduler/manager.py", line 87, in init_host_with_rpc
2018-12-20 05:08:29.959 759617 ERROR oslo_service.service self.request_service_capabilities(ctxt)
2018-12-20 05:08:29.959 759617 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/cinder/scheduler/manager.py", line 205, in request_service_capabilities
2018-12-20 05:08:29.959 759617 ERROR oslo_service.service volume_rpcapi.VolumeAPI().publish_service_capabilities(context)
2018-12-20 05:08:29.959 759617 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/cinder/volume/rpcapi.py", line 239, in publish_service_capabilities
2018-12-20 05:08:29.959 759617 ERROR oslo_service.service cctxt.cast(ctxt, 'publish_service_capabilities')
2018-12-20 05:08:29.959 759617 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 144, in cast
2018-12-20 05:08:29.959 759617 ERROR oslo_service.service self._check_version_cap(msg.get('version'))
2018-12-20 05:08:29.959 759617 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 121, in _check_version_cap
2018-12-20 05:08:29.959 759617 ERROR oslo_service.service version_cap=self.version_cap)
2018-12-20 05:08:29.959 759617 ERROR oslo_service.service RPCVersionCapError: Requested message version, 3.0 is incompatible. It needs to be equal in major version and less than or equal in minor version as the specified version cap 2.0.
2018-12-20 05:08:29.959 759617 ERROR oslo_service.service
2018-12-20 05:08:29.977 759617 DEBUG oslo_concurrency.lockutils [req-ebb18cc4-69ec-45f4-99d6-95958322449e - - - - -] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:215
2018-12-20 05:08:29.978 759617 DEBUG oslo_concurrency.lockutils [req-ebb18cc4-69ec-45f4-99d6-95958322449e - - - - -] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:228
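
The compatibility rule quoted in the RPCVersionCapError message can be sketched in a few lines of Python. This only illustrates the rule as the message states it and is not oslo.messaging's actual code; the helper names `parse_version` and `allowed_under_cap` are hypothetical.

```python
def parse_version(version):
    """Split a 'major.minor' string into a tuple of ints."""
    major, minor = version.split(".")
    return int(major), int(minor)


def allowed_under_cap(requested, cap):
    """Return True if a message at `requested` may be sent under `cap`.

    Rule taken from the error text: the major versions must be equal and
    the requested minor version must not exceed the cap's minor version.
    """
    req_major, req_minor = parse_version(requested)
    cap_major, cap_minor = parse_version(cap)
    return req_major == cap_major and req_minor <= cap_minor


# The failing case from the log: message version 3.0 against cap 2.0.
print(allowed_under_cap("3.0", "2.0"))  # False
print(allowed_under_cap("3.0", "3.0"))  # True
```

Because the major versions differ (3 against 2), the request is rejected regardless of the minor version.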

Revision history for this message
Xav Paice (xavpaice) wrote :

The 'services' table in the Cinder db contained some old entries that should have been cleaned up, and these were blocking the start of cinder-scheduler. Deleting them with cinder-manage service remove cinder-volume cinder@LVM (and likewise for the other stale entries) fixed the issue.

https://pastebin.canonical.com/p/hYXqFwttk2/

Revision history for this message
Xav Paice (xavpaice) wrote :

This actually affected the Cinder volume and API services as well: they started OK but didn't function until the db records had been cleaned up and the services restarted.

Revision history for this message
James Page (james-page) wrote :

As suggested, this looks similar to the nova compute service problem: old, crufty service entries cause issues on upgrade.
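
A possible way to picture how stale rows could produce the version cap seen in the log: assuming the cap is pinned to the lowest RPC version recorded among the 'services' rows for a binary (an assumption about Cinder's behaviour, not something confirmed in this report), one leftover row is enough to hold the cap at 2.0. The rows, the `rpc_version` field, the host name `current-host@CEPH`, and the version values below are all invented for illustration.

```python
def version_key(version):
    """Turn 'major.minor' into a tuple so versions compare numerically."""
    major, minor = version.split(".")
    return int(major), int(minor)


def pinned_cap(rows, binary):
    """Return the lowest recorded version for `binary`, or None if no rows."""
    versions = [row["rpc_version"] for row in rows if row["binary"] == binary]
    return min(versions, key=version_key) if versions else None


# Hypothetical table contents: one stale row and one current row.
rows = [
    {"binary": "cinder-volume", "host": "cinder@LVM", "rpc_version": "2.0"},
    {"binary": "cinder-volume", "host": "current-host@CEPH", "rpc_version": "3.0"},
]

print(pinned_cap(rows, "cinder-volume"))  # 2.0

# With the stale row removed, the cap follows the remaining service.
live_rows = [row for row in rows if row["host"] != "cinder@LVM"]
print(pinned_cap(live_rows, "cinder-volume"))  # 3.0
```

Under that assumption, removing the stale rows and restarting the services would lift the cap, which matches what was observed here.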

Changed in charm-cinder:
status: New → Triaged
importance: Undecided → High
Changed in charm-deployment-guide:
status: New → Triaged
importance: Undecided → High
tags: added: openstack-upgrade
removed: upgrade
Revision history for this message
Xav Paice (xavpaice) wrote :

For clarity, the commands used to remove the services (via ssh on one of the Cinder units):

For each service that isn't relevant any longer:
cinder-manage service remove cinder-volume <service-name>

In this case, the unwanted services:
cinder@LVM
juju-958f87-5-lxd-11@CEPH
juju-958f87-22-lxd-17@CEPH
juju-958f87-13-lxd-9@CEPH
juju-958f87-13-lxd-11@CEPH
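
The per-service removal above could be scripted along these lines. This is a dry-run sketch that only prints the commands; the function name `removal_commands` is hypothetical, and the host list is the one given in this comment.

```python
import shlex

# Unwanted service entries listed in this bug report.
STALE_HOSTS = [
    "cinder@LVM",
    "juju-958f87-5-lxd-11@CEPH",
    "juju-958f87-22-lxd-17@CEPH",
    "juju-958f87-13-lxd-9@CEPH",
    "juju-958f87-13-lxd-11@CEPH",
]


def removal_commands(hosts, binary="cinder-volume"):
    """Build one 'cinder-manage service remove' argument list per host."""
    return [["cinder-manage", "service", "remove", binary, host] for host in hosts]


# Dry run: print each command instead of executing it.
for cmd in removal_commands(STALE_HOSTS):
    print(" ".join(shlex.quote(part) for part in cmd))
```

To actually run the commands on a Cinder unit, each argument list could be passed to `subprocess.run(cmd, check=True)`. Review the printed output first, since removing an entry for a service that is still in use would be disruptive.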

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Hi Xav

I've tried to reproduce this, but I think I'm missing some key information about the history of the unit(s). If possible, could you indicate the initial install (Ubuntu and OpenStack releases) and the upgrades that have happened since then? (I'm wondering whether it was trusty to begin with.)

Thanks.

Revision history for this message
Peter Matulis (petermatulis) wrote :

Please re-add the deploy guide to this bug if, and when, a software issue has been identified.

no longer affects: charm-deployment-guide
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Setting to incomplete as we don't know how to reproduce this issue.

Changed in charm-cinder:
status: Triaged → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack cinder charm because there has been no activity for 60 days.]

Changed in charm-cinder:
status: Incomplete → Expired