Block migration fails because destination compute node refuses ssh connection

Bug #1406167 reported by Yang Luo on 2014-12-29
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Low
Assigned to: jichenjc
Milestone: 2015.1.0

Bug Description

Summary:
When block-migrating a VM between two compute nodes, if the destination node has no SSH daemon or refuses the SSH connection, the block migration fails and damages the VM: the VM is left in an error state for some time.

Scenario:
Block-migrating a vm between two compute nodes

Example:
Compute Node 1: ly-compute1 (10.0.0.31)
Compute Node 2: ly-compute2 (10.0.0.32) with VM: test11

I followed the tutorial below to install OpenStack. It does not mention installing SSH support on compute nodes, so I did not install SSH on either compute node.
OpenStack Installation Guide for Ubuntu 12.04/14.04 (LTS)
http://docs.openstack.org/icehouse/install-guide/install/apt/content/

The error occurs when I run "nova migrate --poll test11" to migrate the VM "test11" from ly-compute2 to ly-compute1.
Both the error message and the log say that the SSH connection to 10.0.0.31 failed (because the SSH daemon is not even installed), and the VM "test11" enters vm_state error.
*********************************************************************
Trying to migrate:
C:\Windows\system32>nova migrate --poll test11

Server migrating... 0% complete
Error migrating server
ERROR (InstanceInErrorState): Unexpected error while running command.
Command: ssh 10.0.0.31 mkdir -p /var/lib/nova/instances/30c4dac1-f3bc-4e6a-8a38-ee49671eee6a
Exit code: 255
Stdout: u''
Stderr: u'ssh: connect to host 10.0.0.31 port 22: Connection refused\r\n'
*********************************************************************
dashboard log message:
Unexpected error while running command.
Command: ssh 10.0.0.31 mkdir -p /var/lib/nova/instances/30c4dac1-f3bc-4e6a-8a38-ee49671eee6a
Exit code: 255
Stdout: u''
Stderr: u'ssh: connect to host 10.0.0.31 port 22: Connection refused\r\n'
Code: 500
Details:
File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 290, in decorated_function
    return function(self, context, *args, **kwargs)
File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 3472, in resize_instance
    block_device_info)
File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4954, in migrate_disk_and_power_off
    utils.execute('ssh', dest, 'mkdir', '-p', inst_base)
File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 165, in execute
    return processutils.execute(*cmd, **kwargs)
File "/usr/lib/python2.7/dist-packages/nova/openstack/common/processutils.py", line 195, in execute
    cmd=sanitized_cmd)
*********************************************************************
Second time trying to migrate:
C:\Windows\system32>nova migrate --poll test11

ERROR (Conflict): Cannot 'migrate' while instance is in vm_state error (HTTP 409) (Request-ID: req-ba3ca8e1-0753-40ac-9e2e-2f7c319ec691)
*********************************************************************

Request:
This fault can be dangerous because it damages a user's VM. The migrate_disk_and_power_off function in nova/virt/libvirt/driver.py should check that the SSH daemon is running on the destination node before starting the actual block migration.
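
The kind of pre-check requested here could look roughly like the sketch below. It is not nova code: the helper name ssh_port_reachable and its parameters are hypothetical, and it only tests that something accepts TCP connections on the SSH port. It says nothing about whether key-based login for the nova user would succeed.

```python
import socket


def ssh_port_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper, not part of nova. A refused or timed-out
    connection (as in the "Connection refused" error above) yields False.
    """
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    dest = "10.0.0.31"  # destination compute node from this report
    if not ssh_port_reachable(dest):
        print("Refusing to start migration: no SSH listener on %s" % dest)
```

A caller would run this before touching the instance and abort early on False, so the VM never leaves its current state.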

Yang Luo (hsluoyz) on 2015-01-01
description: updated
Changed in nova:
status: New → Confirmed
tags: added: documentation
removed: nova-manage
jichenjc (jichenjc) wrote:

Yes, I think that makes sense.
If we know it will raise an exception, we should check for it before execution.

Changed in nova:
assignee: nobody → jichenjc (jichenjc)

Fix proposed to branch: master
Review: https://review.openstack.org/154696

Changed in nova:
status: Confirmed → In Progress
Changed in nova:
importance: Undecided → Low

Reviewed: https://review.openstack.org/154696
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a7bd611eb293e24c5518f97e18701930c8f1ae61
Submitter: Jenkins
Branch: master

commit a7bd611eb293e24c5518f97e18701930c8f1ae61
Author: jichenjc <email address hidden>
Date: Sun Feb 8 02:01:05 2015 +0800

    Keep instance state if ssh failed during migration

    When block-migrating a vm between two compute nodes,
    if the destination node lacks ssh daemon or refuses
    ssh connection, the block-migration would fail and
    cause damage to the vm, vm will be in error state
    for a certain time.

    Actually, the VM itself is not changed, the resize
    action failed at preparation stage, so don't need to
    set it to 'ERROR' state.

    Change-Id: I9907fcd9b235b6a9697416b93f561d831eebee6c
    Closes-Bug: #1406167
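
The idea in this commit message, that a failure in the preparation stage should not push the instance into the error state, can be illustrated with the simplified sketch below. It is not the actual nova change: the instance is modelled as a plain dict, and PreparationFailed, migrate_with_state_guard, prepare, and transfer are all hypothetical names chosen for the example.

```python
class PreparationFailed(Exception):
    """Hypothetical error: a pre-migration step failed before the VM changed."""


def migrate_with_state_guard(instance, prepare, transfer):
    """Run prepare() then transfer(), marking the VM as errored only when needed.

    instance is a dict with a "vm_state" key (an assumption for this sketch).
    prepare stands in for steps such as "ssh <dest> mkdir -p <inst_base>".
    """
    original_state = instance["vm_state"]
    try:
        prepare()
    except Exception as exc:
        # Nothing on the VM has been modified yet, so keep its state.
        instance["vm_state"] = original_state
        raise PreparationFailed(str(exc)) from exc
    try:
        transfer()
    except Exception:
        # The VM may be half-moved at this point; flag it for the operator.
        instance["vm_state"] = "error"
        raise
```

With this split, a refused SSH connection during preparation surfaces as an error to the caller while the instance stays in its original state, so a later migrate request is not rejected with a 409 Conflict.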

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2015-03-20
Changed in nova:
milestone: none → kilo-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2015-04-30
Changed in nova:
milestone: kilo-3 → 2015.1.0