During the upgrade from Kilo to Mitaka through Liberty, live migration does not work for old instances

Bug #1624341 reported by Ilya Kharin on 2016-09-16
Affects: Fuel for OpenStack | Status: Fix Released | Importance: Critical | Assigned to: Ilya Kharin | Milestone: 9.1

Bug Description

We hit an issue during the upgrade of an entire environment from Kilo to Mitaka in the combined scenario (Mitaka controllers with Liberty upgrade levels plus Liberty compute nodes): live migration responds with HTTP 500 for old instances that were created on Kilo.

Steps to reproduce:
 1. 7.0 environment:
     3xController nodes
     3xCeph-OSD nodes
     2xCompute nodes
 2. Upgrade the Fuel Master node to 8.0 and then to 9.0
 3. Upgrade Controllers:
     octane upgrade-env
     octane upgrade-node --isolated
     octane upgrade-db
     octane upgrade-ceph
     octane upgrade-control
 4. Upgrade one Compute node:
     octane upgrade-node

Expected results:
 All instances are live-migrated off the compute node, and the node is re-provisioned and re-deployed in Mitaka

Actual results:
 Live migration of the first instance fails with the following error:
    root@node-5:~# nova live-migration 90156720-1b78-4eb8-8445-7eddda59fb80 node-1.test.domain.local
    ERROR (ClientException): Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
    <class 'MigrationError_Remote'> (HTTP 500) (Request-ID: req-ea641df6-f956-4f15-9af9-ca65105669cf)
 Action list for the instance:
   root@node-5:~# nova instance-action 90156720-1b78-4eb8-8445-7eddda59fb80 req-d5970b1a-02b5-4f3d-a292-00fb18d58658
    +---------------+----------------------------------------------------------------------------------------------------------------------+
    | Property | Value |
    +---------------+----------------------------------------------------------------------------------------------------------------------+
    | action | live-migration |
    | events | [{u'event': u'compute_check_can_live_migrate_source', |
    | | u'finish_time': u'2016-09-15T19:44:58.000000', |
    | | u'result': u'Error', |
    | | u'start_time': u'2016-09-15T19:42:58.000000', |
    | | u'traceback': u' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 400, in decorated_function |
    | | return function(self, context, *args, **kwargs) |
    | | File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 378, in decorated_function |
    | | kwargs[\'instance\'], e, sys.exc_info()) |
    | | File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 195, in __exit__ |
    | | six.reraise(self.type_, self.value, self.tb) |
    | | File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 366, in decorated_function |
    | | return function(self, context, *args, **kwargs) |
    | | File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 5079, in check_can_live_migrate_source |
    | | ctxt, instance, refresh_conn_info=True) |
    | | File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1834, in _get_instance_block_device_info |
    | | context, instance, self.volume_api, self.driver) |
    | | File "/usr/lib/python2.7/dist-packages/nova/virt/block_device.py", line 502, in refresh_conn_infos |
    | | block_device_mapping) |
    | | File "/usr/lib/python2.7/dist-packages/nova/virt/block_device.py", line 52, in wrapped |
    | | ret_val = method(obj, context, *args, **kwargs) |
    | | File "/usr/lib/python2.7/dist-packages/nova/virt/block_device.py", line 315, in refresh_connection_info |
    | | connector) |
    | | File "/usr/lib/python2.7/dist-packages/nova/volume/cinder.py", line 225, in wrapper |
    | | six.reraise(exc_value, None, exc_trace) |
    | | File "/usr/lib/python2.7/dist-packages/nova/volume/cinder.py", line 214, in wrapper |
    | | res = method(self, ctx, volume_id, *args, **kwargs) |
    | | File "/usr/lib/python2.7/dist-packages/nova/volume/cinder.py", line 370, in initialize_connection |
    | | exc.code if hasattr(exc, \'code\') else None)}) |
    | | File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 195, in __exit__ |
    | | six.reraise(self.type_, self.value, self.tb) |
    | | File "/usr/lib/python2.7/dist-packages/nova/volume/cinder.py", line 344, in initialize_connection |
    | | context).volumes.initialize_connection(volume_id, connector) |
    | | File "/usr/lib/python2.7/dist-packages/cinderclient/v1/volumes.py", line 333, in initialize_connection |
    | | {\'connector\': connector})[1][\'connection_info\'] |
    | | File "/usr/lib/python2.7/dist-packages/cinderclient/v1/volumes.py", line 259, in _action |
    | | return self.api.client.post(url, body=body) |
    | | File "/usr/lib/python2.7/dist-packages/cinderclient/client.py", line 104, in post |
    | | return self._cs_request(url, \'POST\', **kwargs) |
    | | File "/usr/lib/python2.7/dist-packages/cinderclient/client.py", line 98, in _cs_request |
    | | return self.request(url, method, **kwargs) |
    | | File "/usr/lib/python2.7/dist-packages/cinderclient/client.py", line 91, in request |
    | | raise exceptions.from_response(resp, body) |
    | | '}, |
    | | {u'event': u'compute_check_can_live_migrate_destination', |
    | | u'finish_time': u'2016-09-15T19:43:58.000000', |
    | | u'result': u'Error', |
     | | u'start_time': u'2016-09-15T19:42:57.000000', |
     | | u'traceback': u' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 400, in decorated_function |
    | | return function(self, context, *args, **kwargs) |
    | | File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 378, in decorated_function |
    | | kwargs[\'instance\'], e, sys.exc_info()) |
    | | File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 195, in __exit__ |
    | | six.reraise(self.type_, self.value, self.tb) |
    | | File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 366, in decorated_function |
    | | return function(self, context, *args, **kwargs) |
    | | File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 5053, in check_can_live_migrate_destination |
    | | dest_check_data) |
    | | File "/usr/lib/python2.7/dist-packages/nova/compute/rpcapi.py", line 395, in check_can_live_migrate_source |
    | | dest_check_data=dest_check_data) |
    | | File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call |
    | | retry=self.retry) |
    | | File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send |
    | | timeout=timeout, retry=retry) |
    | | File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 533, in send |
    | | retry=retry) |
    | | File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 522, in _send |
    | | result = self._waiter.wait(msg_id, timeout) |
    | | File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 397, in wait |
    | | message = self.waiters.get(msg_id, timeout=timeout) |
    | | File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 296, in get |
    | | \'to message ID %s\' % msg_id) |
    | | '}] |
    | instance_uuid | 90156720-1b78-4eb8-8445-7eddda59fb80 |
    | message | Error |
    | project_id | c9e7f2d26f0c4e7ba3c9f8aa573114cc |
    | request_id | req-d5970b1a-02b5-4f3d-a292-00fb18d58658 |
    | start_time | 2016-09-15T19:42:56.000000 |
    | user_id | caf71692b020490e86f701d8f5656e5b |
    +---------------+----------------------------------------------------------------------------------------------------------------------+

   root@node-5:~# cinder service-list
   +------------------+--------------------------+------+---------+-------+----------------------------+-----------------+
   | Binary | Host | Zone | Status | State | Updated_at | Disabled Reason |
   +------------------+--------------------------+------+---------+-------+----------------------------+-----------------+
   | cinder-backup | node-2.test.domain.local | nova | enabled | up | 2016-09-16T11:18:50.000000 | - |
   | cinder-backup | node-4.test.domain.local | nova | enabled | up | 2016-09-16T11:18:49.000000 | - |
   | cinder-backup | node-5.test.domain.local | nova | enabled | up | 2016-09-16T11:18:41.000000 | - |
   | cinder-backup | rbd:volumes | nova | enabled | down | 2016-09-15T18:50:10.000000 | - |
   | cinder-scheduler | node-2.test.domain.local | nova | enabled | up | 2016-09-16T11:18:42.000000 | - |
   | cinder-scheduler | node-4.test.domain.local | nova | enabled | up | 2016-09-16T11:18:49.000000 | - |
   | cinder-scheduler | node-5.test.domain.local | nova | enabled | up | 2016-09-16T11:18:48.000000 | - |
   | cinder-scheduler | rbd:volumes | nova | enabled | down | 2016-09-15T18:50:11.000000 | - |
   | cinder-volume | rbd:volumes | nova | enabled | down | 2016-09-15T18:50:11.000000 | - |
   | cinder-volume | rbd:volumes@RBD-backend | nova | enabled | up | 2016-09-16T11:18:49.000000 | - |
   +------------------+--------------------------+------+---------+-------+----------------------------+-----------------+
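
The service list above shows the mismatch that strands old volumes: Kilo-era volumes store the bare host `rbd:volumes`, which is now down, while the live Mitaka service registers under the backend-qualified name `rbd:volumes@RBD-backend`. A minimal sketch of cinder's `host@backend[#pool]` host-string convention, using the host names from this environment:

```shell
# Cinder addresses a volume service as "host@backend[#pool]".
# Old volumes carry only the bare host part, so after the backend name
# changed they no longer match any running cinder-volume service.
full='rbd:volumes@RBD-backend'
echo "host:    ${full%%@*}"     # -> rbd:volumes
echo "backend: ${full#*@}"      # -> RBD-backend
```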

    ./node-5.test.domain.local/cinder-volume.log:2016-09-15T19:31:51.236319+00:00 debug: 2016-09-15 19:31:51.235461 7fc3e46a3740 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15T19:31:51.238625+00:00 err: 2016-09-15 19:31:51.236 2397 ERROR cinder.volume.drivers.rbd [req-9cef4ea8-866b-456d-9bf7-8a1df1c37372 - - - - -] Error connecting to ceph cluster.
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:31:51.236 2397 ERROR cinder.volume.drivers.rbd Traceback (most recent call last):
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:31:51.236 2397 ERROR cinder.volume.drivers.rbd File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py", line 338, in _connect_to_rados
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:31:51.236 2397 ERROR cinder.volume.drivers.rbd client.connect()
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:31:51.236 2397 ERROR cinder.volume.drivers.rbd File "/usr/lib/python2.7/dist-packages/rados.py", line 429, in connect
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:31:51.236 2397 ERROR cinder.volume.drivers.rbd raise make_ex(ret, "error connecting to the cluster")
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:31:51.236 2397 ERROR cinder.volume.drivers.rbd ObjectNotFound: error connecting to the cluster
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:31:51.236 2397 ERROR cinder.volume.drivers.rbd~
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15T19:32:16.318801+00:00 debug: 2016-09-15 19:32:16.317900 7fc3e46a3740 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15T19:32:16.329091+00:00 err: 2016-09-15 19:32:16.325 2397 ERROR cinder.volume.drivers.rbd [req-9cef4ea8-866b-456d-9bf7-8a1df1c37372 - - - - -] Error connecting to ceph cluster.
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:32:16.325 2397 ERROR cinder.volume.drivers.rbd Traceback (most recent call last):
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:32:16.325 2397 ERROR cinder.volume.drivers.rbd File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py", line 338, in _connect_to_rados
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:32:16.325 2397 ERROR cinder.volume.drivers.rbd client.connect()
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:32:16.325 2397 ERROR cinder.volume.drivers.rbd File "/usr/lib/python2.7/dist-packages/rados.py", line 429, in connect
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:32:16.325 2397 ERROR cinder.volume.drivers.rbd raise make_ex(ret, "error connecting to the cluster")
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:32:16.325 2397 ERROR cinder.volume.drivers.rbd ObjectNotFound: error connecting to the cluster
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:32:16.325 2397 ERROR cinder.volume.drivers.rbd~
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15T19:32:16.378303+00:00 err: 2016-09-15 19:32:16.345 2397 ERROR cinder.volume.manager [req-9cef4ea8-866b-456d-9bf7-8a1df1c37372 - - - - -] Failed to initialize driver.
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:32:16.345 2397 ERROR cinder.volume.manager Traceback (most recent call last):
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:32:16.345 2397 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 427, in init_host
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:32:16.345 2397 ERROR cinder.volume.manager self.driver.check_for_setup_error()
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:32:16.345 2397 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py", line 301, in check_for_setup_error
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:32:16.345 2397 ERROR cinder.volume.manager with RADOSClient(self):
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:32:16.345 2397 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py", line 248, in __init__
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:32:16.345 2397 ERROR cinder.volume.manager self.cluster, self.ioctx = driver._connect_to_rados(pool)
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:32:16.345 2397 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/utils.py", line 822, in _wrapper
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:32:16.345 2397 ERROR cinder.volume.manager return r.call(f, *args, **kwargs)
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:32:16.345 2397 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/retrying.py", line 212, in call
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:32:16.345 2397 ERROR cinder.volume.manager raise attempt.get()
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:32:16.345 2397 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/retrying.py", line 247, in get
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:32:16.345 2397 ERROR cinder.volume.manager six.reraise(self.value[0], self.value[1], self.value[2])
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:32:16.345 2397 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/retrying.py", line 200, in call
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:32:16.345 2397 ERROR cinder.volume.manager attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
    ./node-5.test.domain.local/cinder-volume.log:2016-09-15 19:32:16.345 2397 ERROR cinder.volume.manager File "/

Ilya Kharin (akscram) on 2016-09-16
Changed in fuel:
status: New → Confirmed
importance: Undecided → Critical
milestone: none → 9.1
assignee: nobody → Ilya Kharin (akscram)
Ilya Kharin (akscram) on 2016-09-16
description: updated
Ilya Kharin (akscram) on 2016-09-16
Changed in fuel:
status: Confirmed → In Progress
Ilya Kharin (akscram) wrote:

We have to update the host attribute of old volumes using `cinder-manage volume update_host`.
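
A minimal sketch of that command, assuming the host names seen in the `cinder service-list` output above (verify the flags with `cinder-manage volume update_host --help` on your controllers):

```shell
# Hypothetical invocation: re-point Kilo-era volumes from the old bare
# host name to the backend-qualified name the Mitaka service uses.
# Run on a controller node; host names are taken from this environment.
OLD_HOST='rbd:volumes'
NEW_HOST='rbd:volumes@RBD-backend'
cinder-manage volume update_host --currenthost "$OLD_HOST" --newhost "$NEW_HOST"
```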

Reviewed: https://review.openstack.org/372725
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=3c558aaeecce1c7b081261247218835ef69a08df
Submitter: Jenkins
Branch: stable/mitaka

commit 3c558aaeecce1c7b081261247218835ef69a08df
Author: Ilya Kharin <email address hidden>
Date: Mon Sep 19 22:27:58 2016 +0300

    Restart radosgw during upgrade-ceph

    The radosgw have to be restarted during the upgrade of Ceph otherwise
    the object store will be unavailable on the primary controller.

    Change-Id: If949b24a6176c3c65ff4c26b089b4997bd991cb5
    Related-Bug: #1624341
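
The manual equivalent of the change above is roughly the following; the init service name `radosgw` is an assumption based on Ubuntu ceph packaging, so verify it on the node before relying on it:

```shell
# Hypothetical sketch: after the Ceph packages are upgraded on the primary
# controller, restart the Rados Gateway so the object store comes back up.
# "radosgw" is the assumed service name; check `service --status-all`.
svc=radosgw
if service "$svc" status >/dev/null 2>&1; then
    service "$svc" restart
else
    echo "service $svc not found on this node" >&2
fi
```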

tags: added: in-stable-mitaka

Reviewed: https://review.openstack.org/372724
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=4b2114f83570ee3df2644ed56523509e57b046df
Submitter: Jenkins
Branch: master

commit 4b2114f83570ee3df2644ed56523509e57b046df
Author: Ilya Kharin <email address hidden>
Date: Mon Sep 19 22:27:58 2016 +0300

    Restart radosgw during upgrade-ceph

    The radosgw have to be restarted during the upgrade of Ceph otherwise
    the object store will be unavailable on the primary controller.

    Change-Id: If949b24a6176c3c65ff4c26b089b4997bd991cb5
    Related-Bug: #1624341

Ilya Kharin (akscram) on 2016-09-20
tags: added: team-upgrades

Reviewed: https://review.openstack.org/371936
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=27076fe832ee976a6427f0e4945f570f7566d309
Submitter: Jenkins
Branch: stable/mitaka

commit 27076fe832ee976a6427f0e4945f570f7566d309
Author: Ilya Kharin <email address hidden>
Date: Sat Sep 17 17:04:45 2016 +0300

    Perform cinder-manage volume update_host

    The backend name for volumes was changed since Kilo. To make these
    volumes back manageable in Mitaka their hostnames have to be changed.
    The `cinder-manage volume update_host` command is suitable to do that.

    Change-Id: Ieeadd81c714186d58a3e3dfe5fd1223c40d16996
    Partial-Bug: #1624341

Reviewed: https://review.openstack.org/371935
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=33a8b5abb2a7b929c33e9dda535aa5d80c5f6f8a
Submitter: Jenkins
Branch: master

commit 33a8b5abb2a7b929c33e9dda535aa5d80c5f6f8a
Author: Ilya Kharin <email address hidden>
Date: Sat Sep 17 17:04:45 2016 +0300

    Perform cinder-manage volume update_host

    The backend name for volumes was changed since Kilo. To make these
    volumes back manageable in Mitaka their hostnames have to be changed.
    The `cinder-manage volume update_host` command is suitable to do that.

    Change-Id: Ieeadd81c714186d58a3e3dfe5fd1223c40d16996
    Partial-Bug: #1624341

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/373568

Reviewed: https://review.openstack.org/373560
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=b12ba87d91ce4212923202ec014d8a17e8f6126c
Submitter: Jenkins
Branch: stable/mitaka

commit b12ba87d91ce4212923202ec014d8a17e8f6126c
Author: Ilya Kharin <email address hidden>
Date: Wed Sep 21 03:10:28 2016 +0300

    Set upgrade level after each upgrade-node

    The upgrade_levels:compute parameter have to be set after each deploy
    changes operation on controllers otherwise live-migration of instances
    will not work properly.

    Change-Id: I98a89b51102371f7a50d8a182dc61d671dbf7151
    Partial-Bug: #1624341
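
Concretely, this amounts to keeping the compute RPC version pinned on the Mitaka controllers so they can still talk to the not-yet-upgraded Liberty computes; a deploy-changes run clears the setting. A hypothetical nova.conf fragment (the exact level depends on which computes remain on the old release):

```ini
# On Mitaka controllers, pin compute RPC compatibility to Liberty until the
# last compute node is upgraded; re-apply after every deploy-changes,
# because redeployment resets it.
[upgrade_levels]
compute = liberty
```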

Reviewed: https://review.openstack.org/373860
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=b71daaec85fcf6c391a6c5a4ed8e9adc24fc90ef
Submitter: Jenkins
Branch: master

commit b71daaec85fcf6c391a6c5a4ed8e9adc24fc90ef
Author: Ilya Kharin <email address hidden>
Date: Wed Sep 21 03:10:28 2016 +0300

    Set upgrade level after each upgrade-node

    The upgrade_levels:compute parameter have to be set after each deploy
    changes operation on controllers otherwise live-migration of instances
    will not work properly.

    Change-Id: I98a89b51102371f7a50d8a182dc61d671dbf7151
    Partial-Bug: #1624341

Reviewed: https://review.openstack.org/374216
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=eebdfd9e0256f0a8aa98e9144a0a62c0f66e7d00
Submitter: Jenkins
Branch: stable/mitaka

commit eebdfd9e0256f0a8aa98e9144a0a62c0f66e7d00
Author: Ilya Kharin <email address hidden>
Date: Wed Sep 21 17:43:00 2016 +0300

    Set forcefully upgrade_levels after deploy-changes

    Appropriate upgrade_levels have to be set after each deploy-changes for
    controller and compute nodes because they are cleared. It have to be
    solved using advanced configuration feature.

    Change-Id: Idbe13e58d2ae0787620211ba0300150ed4e82f39
    Partial-Bug: #1624341

Reviewed: https://review.openstack.org/374248
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=14be38cd7fe72c765eb74d45098ca84afd946b9e
Submitter: Jenkins
Branch: master

commit 14be38cd7fe72c765eb74d45098ca84afd946b9e
Author: Ilya Kharin <email address hidden>
Date: Wed Sep 21 17:43:00 2016 +0300

    Set forcefully upgrade_levels after deploy-changes

    Appropriate upgrade_levels have to be set after each deploy-changes for
    controller and compute nodes because they are cleared. It have to be
    solved using advanced configuration feature.

    Change-Id: Idbe13e58d2ae0787620211ba0300150ed4e82f39
    Partial-Bug: #1624341

Ilya Kharin (akscram) on 2016-09-21
Changed in fuel:
status: In Progress → Fix Committed

Change abandoned by Ilya Kharin (<email address hidden>) on branch: stable/mitaka
Review: https://review.openstack.org/373568
Reason: It was a mistake, it's not needed.

Vladimir Khlyunev (vkhlyunev) wrote:

315 - verified

Changed in fuel:
status: Fix Committed → Fix Released