Block migration fails because destination compute node refuses ssh connection

Bug #1406167 reported by Yang Luo
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Low
Assigned to: jichenjc

Bug Description

Summary:
When block-migrating a VM between two compute nodes, if the destination node has no SSH daemon or refuses the SSH connection, the block migration fails and damages the VM: the VM is left in an error state for some time.

Scenario:
Block-migrating a vm between two compute nodes

Example:
Compute Node 1: ly-compute1 (10.0.0.31)
Compute Node 2: ly-compute2 (10.0.0.32) with VM: test11

I followed the guide below to install OpenStack. It does not mention installing SSH support on compute nodes, so I did not install SSH on either compute node.
OpenStack Installation Guide for Ubuntu 12.04/14.04 (LTS)
http://docs.openstack.org/icehouse/install-guide/install/apt/content/

The error occurs when I run "nova migrate --poll test11" to migrate the VM "test11" from ly-compute2 to ly-compute1.
Both the error message and the log say that the SSH connection to 10.0.0.31 failed (because the SSH daemon is not even installed), and the VM "test11" enters vm_state error.
*********************************************************************
Trying to migrate:
C:\Windows\system32>nova migrate --poll test11

Server migrating... 0% complete
Error migrating server
ERROR (InstanceInErrorState): Unexpected error while running command.
Command: ssh 10.0.0.31 mkdir -p /var/lib/nova/instances/30c4dac1-f3bc-4e6a-8a38-ee49671eee6a
Exit code: 255
Stdout: u''
Stderr: u'ssh: connect to host 10.0.0.31 port 22: Connection refused\r\n'
*********************************************************************
dashboard log message:
Unexpected error while running command. Command: ssh 10.0.0.31 mkdir -p /var/lib/nova/instances/30c4dac1-f3bc-4e6a-8a38-ee49671eee6a Exit code: 255 Stdout: u'' Stderr: u'ssh: connect to host 10.0.0.31 port 22: Connection refused\r\n'
Code: 500
Details:
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 290, in decorated_function
    return function(self, context, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 3472, in resize_instance
    block_device_info)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4954, in migrate_disk_and_power_off
    utils.execute('ssh', dest, 'mkdir', '-p', inst_base)
  File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 165, in execute
    return processutils.execute(*cmd, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/openstack/common/processutils.py", line 195, in execute
    cmd=sanitized_cmd)
*********************************************************************
Second attempt to migrate:
C:\Windows\system32>nova migrate --poll test11

ERROR (Conflict): Cannot 'migrate' while instance is in vm_state error (HTTP 409) (Request-ID: req-ba3ca8e1-0753-40ac-9e2e-2f7c319ec691)
*********************************************************************

Request:
This fault can be dangerous because it damages a user's VM. The migrate_disk_and_power_off function in nova/virt/libvirt/driver.py should check whether the SSH daemon is running on the destination node before starting the actual block migration.
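
To illustrate the kind of pre-check requested above, here is a minimal sketch that only tests whether a TCP connection to the destination's SSH port can be opened. The function name ssh_port_reachable and its parameters are made up for this example and are not part of nova. It is written for Python 3, whereas the traceback above comes from Python 2.7.

```python
import socket


def ssh_port_reachable(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it is not a nova function.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


if __name__ == "__main__":
    dest = "10.0.0.31"  # destination compute node from the report
    if not ssh_port_reachable(dest):
        print("ssh port on %s is not reachable, do not start migration" % dest)
```

An open port only shows that something is listening. It does not prove that key-based login works for the nova user, so a real pre-check would probably need more than this.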

Yang Luo (hsluoyz)
description: updated
Changed in nova:
status: New → Confirmed
tags: added: documentation
removed: nova-manage
jichenjc (jichenjc) wrote :

Yes, I think this makes sense.
If we know it will raise an exception, we should check for that before execution.

Changed in nova:
assignee: nobody → jichenjc (jichenjc)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/154696

Changed in nova:
status: Confirmed → In Progress
Changed in nova:
importance: Undecided → Low
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/154696
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a7bd611eb293e24c5518f97e18701930c8f1ae61
Submitter: Jenkins
Branch: master

commit a7bd611eb293e24c5518f97e18701930c8f1ae61
Author: jichenjc <email address hidden>
Date: Sun Feb 8 02:01:05 2015 +0800

    Keep instance state if ssh failed during migration

    When block-migrating a vm between two compute nodes,
    if the destination node lacks ssh daemon or refuses
    ssh connection, the block-migration would fail and
    cause damage to the vm, vm will be in error state
    for a certain time.

    Actually, the VM itself is not changed, the resize
    action failed at preparation stage, so don't need to
    set it to 'ERROR' state.

    Change-Id: I9907fcd9b235b6a9697416b93f561d831eebee6c
    Closes-Bug: #1406167
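
The idea in the commit message, that a failure during preparation should not put the instance into 'ERROR', can be sketched roughly as follows. This is a simplified model and not the merged nova change: PreparationError, Instance, prepare_destination and migrate are all hypothetical names, and the instance record is a plain object. Only the ssh command mirrors the one in the traceback.

```python
import subprocess


class PreparationError(Exception):
    """Hypothetical error: a step failed before the VM was modified."""


class Instance:
    """Hypothetical stand-in for an instance record; not nova's class."""

    def __init__(self, name, vm_state="active"):
        self.name = name
        self.vm_state = vm_state


def prepare_destination(dest, inst_base, run=subprocess.run):
    """Create the instance directory on the destination over ssh."""
    try:
        run(["ssh", dest, "mkdir", "-p", inst_base],
            check=True, capture_output=True)
    except (subprocess.CalledProcessError, OSError) as exc:
        raise PreparationError(
            "cannot prepare %s on %s: %s" % (inst_base, dest, exc)) from exc


def migrate(instance, dest, inst_base, run=subprocess.run):
    """Migrate an instance, keeping its state if preparation fails."""
    try:
        prepare_destination(dest, inst_base, run=run)
        # Disk copy and power-off would follow here; omitted in this sketch.
    except PreparationError:
        # Nothing on the VM has changed yet, so leave vm_state untouched.
        raise
    except Exception:
        # Failures past the preparation stage may leave the VM inconsistent.
        instance.vm_state = "error"
        raise
```

With this split, a refused SSH connection still surfaces as an error to the caller, but the instance keeps its previous state and a later migration attempt is not blocked by vm_state error.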

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → kilo-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: kilo-3 → 2015.1.0