Rebuild an instance with attached volume fails

Bug #1440762 reported by Roman Podoliaka
This bug affects 7 people
Affects                   Status        Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released  High        melanie witt
Juno                      Fix Released  High        Matt Riedemann
Kilo                      Fix Released  High        Matt Riedemann

Bug Description

When trying to rebuild an instance with an attached volume, the rebuild fails with the following errors:

2015-02-04 08:41:27.477 22000 TRACE oslo.messaging.rpc.dispatcher libvirtError: Failed to terminate process 22913 with SIGKILL: Device or resource busy
2015-02-04 08:41:27.477 22000 TRACE oslo.messaging.rpc.dispatcher
<180>Feb 4 08:43:12 node-2 nova-compute Periodic task is updating the host stats, it is trying to get disk info for instance-00000003, but the backing volume block device was removed by concurrent operations such as resize. Error: No volume Block Device Mapping at path: /dev/disk/by-path/ip-192.168.0.4:3260-iscsi-iqn.2010-10.org.openstack:volume-82ba5653-3e07-4f0f-b44d-a946f4dedde9-lun-1
<182>Feb 4 08:43:13 node-2 nova-compute VM Stopped (Lifecycle Event)

The full log of the rebuild process is here: http://paste.openstack.org/show/166892/

Changed in nova:
assignee: nobody → Roman Podoliaka (rpodolyaka)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/170907

Changed in nova:
status: New → In Progress
Jay Pipes (jaypipes)
Changed in nova:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/170907
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=337471bc71cbdabe6b492379c81470abac8040d2
Submitter: Jenkins
Branch: master

commit 337471bc71cbdabe6b492379c81470abac8040d2
Author: Roman Podoliaka <email address hidden>
Date: Mon Apr 6 17:53:51 2015 +0300

    Fix rebuild of an instance with a volume attached

    When detaching block devices on rebuild we only notify Cinder it's
    safe to detach a volume, but don't actually tell the driver to do
    that first.

    Closes-Bug: #1440762

    Change-Id: I017bf749f426717dc76cf99a387102848fb1c541
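
The ordering this commit message describes can be illustrated with a minimal sketch. This is a toy model, not nova code: FakeDriver, FakeVolumeAPI and detach_for_rebuild are hypothetical names standing in for the virt driver, the Cinder client and the rebuild path. It only shows the idea of releasing the device on the host before telling Cinder the volume is detached.

```python
# Toy model, NOT nova code: every class and function name here is hypothetical.


class FakeDriver:
    """Stands in for the hypervisor driver that holds the block device."""

    def __init__(self):
        self.attached = set()

    def detach_volume(self, volume_id):
        # Release the device from the guest/host.
        self.attached.discard(volume_id)


class FakeVolumeAPI:
    """Stands in for the Cinder client, which only tracks volume state."""

    def __init__(self):
        self.in_use = set()

    def detach(self, volume_id):
        # Cinder is merely told that the volume is no longer in use.
        self.in_use.discard(volume_id)


def detach_for_rebuild(driver, volume_api, volume_ids):
    """Detach in the driver first, then notify the volume service."""
    for volume_id in volume_ids:
        driver.detach_volume(volume_id)
        volume_api.detach(volume_id)


driver = FakeDriver()
volume_api = FakeVolumeAPI()
driver.attached.add("vol-1")
volume_api.in_use.add("vol-1")

detach_for_rebuild(driver, volume_api, ["vol-1"])
print(driver.attached, volume_api.in_use)
```

Skipping the driver.detach_volume() call in this model would leave "vol-1" in driver.attached while Cinder already considers it free, which is the mismatch the commit message points at.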

Changed in nova:
status: In Progress → Fix Committed
Roman Podoliaka (rpodolyaka) wrote :

Oops, apparently the fix ^ wasn't enough. We'll debug the traceback and upload a new patch soon.

Changed in nova:
status: Fix Committed → In Progress
Scott DAngelo (scott-dangelo) wrote :

Please note that a previously filed bug has been closed as a duplicate of this one, and there may be some valuable information there that might help:
https://bugs.launchpad.net/nova/+bug/1423690

David McNally (dave-mcnally) wrote :

I have re-opened the change that I made to fix https://bugs.launchpad.net/nova/+bug/1423690 as I believe it may still be needed.

David McNally (dave-mcnally) wrote :
Changed in nova:
assignee: Roman Podoliaka (rpodolyaka) → David McNally (dave-mcnally)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/176891

Changed in nova:
assignee: David McNally (dave-mcnally) → Roman Podoliaka (rpodolyaka)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/176892

Abhijeet Malawade (abhijeet-malawade) wrote :

When I try to evacuate an instance booted from a volume, I get the error below on the compute node (evacuate uses the rebuild API):

http://paste.openstack.org/show/281945

Cinder volume driver: lvm driver

Note: I am able to evacuate a volume-backed instance successfully after applying the patch: https://review.openstack.org/176891

David (david-alfano) wrote :

This bug has also been impacting openstack-ansible. We have been tracking it here: https://bugs.launchpad.net/openstack-ansible/+bug/1400881

Changed in nova:
assignee: Roman Podoliaka (rpodolyaka) → melanie witt (melwitt)
Matt Riedemann (mriedem)
tags: added: juno-backport-potential kilo-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/203236

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/203253

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/176891
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=25f15b0bc3bd1971fd29062a7a001f8007485636
Submitter: Jenkins
Branch: master

commit 25f15b0bc3bd1971fd29062a7a001f8007485636
Author: Roman Podoliaka <email address hidden>
Date: Tue Apr 21 14:41:28 2015 +0300

    rebuild: fix rebuild of server with volume attached

    This was meant to be fixed by I017bf749f426717dc76cf99a387102848fb1c541 ,
    but it didn't take into account that BDM entry was destroyed, which
    caused the rebuild to fail when spawning the instance.

    Add a new parameter to detach_volume() to bypass destroying of BDM,
    as we just want to detach a volume first and then re-attach it again.

    A Tempest test is added in I50557c69b54003d3409c8e977966f5332f4fe690
    to make sure this is actually tested in the gate.

    Closes-Bug: #1440762

    Co-Authored-By: melanie witt <email address hidden>

    Change-Id: I9134fbf5ce72c32cca91de90001c09e00b4e19e8
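
A minimal sketch of the idea in this commit message, assuming the new parameter is a boolean flag. This is a toy model, not nova code: FakeCompute and the flag name destroy_bdm are illustrative assumptions. It shows why the block device mapping (BDM) record has to survive the detach if the volume is going to be re-attached when the instance is spawned again.

```python
# Toy model, NOT nova code: FakeCompute and the destroy_bdm flag name are
# illustrative assumptions.


class FakeCompute:
    def __init__(self):
        self.bdms = {}         # volume_id -> device name (the BDM records)
        self.attached = set()  # volumes attached at the hypervisor level

    def attach_volume(self, volume_id, device):
        self.bdms[volume_id] = device
        self.attached.add(volume_id)

    def detach_volume(self, volume_id, destroy_bdm=True):
        self.attached.discard(volume_id)
        if destroy_bdm:
            # A normal user-requested detach also forgets the mapping.
            del self.bdms[volume_id]

    def rebuild(self):
        # Detach everything, but keep the BDM records around.
        for volume_id in list(self.bdms):
            self.detach_volume(volume_id, destroy_bdm=False)
        # "Spawn": re-attach whatever the surviving BDM records describe.
        for volume_id in self.bdms:
            self.attached.add(volume_id)


compute = FakeCompute()
compute.attach_volume("vol-1", "/dev/vdb")
compute.rebuild()
print(compute.bdms, compute.attached)
```

If rebuild() called detach_volume() with the default destroy_bdm=True in this model, self.bdms would be empty by the time the spawn step runs and nothing would be re-attached, which mirrors the failure described above.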

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → liberty-2
status: Fix Committed → Fix Released
no longer affects: openstack-ansible
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/kilo)

Reviewed: https://review.openstack.org/203236
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=eb3b1c80c1a8c2bcaf3df21663eba54bae0cacb7
Submitter: Jenkins
Branch: stable/kilo

commit eb3b1c80c1a8c2bcaf3df21663eba54bae0cacb7
Author: Roman Podoliaka <email address hidden>
Date: Mon Apr 6 17:53:51 2015 +0300

    Fix rebuild of an instance with a volume attached

    When detaching block devices on rebuild we only notify Cinder it's
    safe to detach a volume, but don't actually tell the driver to do
    that first.

    Closes-Bug: #1440762

    Change-Id: I017bf749f426717dc76cf99a387102848fb1c541
    (cherry picked from commit 337471bc71cbdabe6b492379c81470abac8040d2)

    ---------------------------------------------------------------------
    squashed with another change that fixes a bug introduced in the first
    ---------------------------------------------------------------------

    rebuild: fix rebuild of server with volume attached

    This was meant to be fixed by I017bf749f426717dc76cf99a387102848fb1c541 ,
    but it didn't take into account that BDM entry was destroyed, which
    caused the rebuild to fail when spawning the instance.

    Add a new parameter to detach_volume() to bypass destroying of BDM,
    as we just want to detach a volume first and then re-attach it again.

    A Tempest test is added in I50557c69b54003d3409c8e977966f5332f4fe690
    to make sure this is actually tested in the gate.

    Closes-Bug: #1440762

    Co-Authored-By: melanie witt <email address hidden>

    Conflicts:
            nova/compute/manager.py

    NOTE(mriedem): In Kilo the detach_volume method was still using the
    @object_compat decorator.

    Change-Id: I9134fbf5ce72c32cca91de90001c09e00b4e19e8
    (cherry picked from commit 25f15b0bc3bd1971fd29062a7a001f8007485636)

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by dave-mcnally (<email address hidden>) on branch: master
Review: https://review.openstack.org/172951
Reason: I've been away a while, looks like the bug this addresses is fixed now - abandoned.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/176892
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7fe20e96f2604d03ec6a7fa563485ffdcb5cc519
Submitter: Jenkins
Branch: master

commit 7fe20e96f2604d03ec6a7fa563485ffdcb5cc519
Author: Roman Podoliaka <email address hidden>
Date: Tue Apr 21 12:47:24 2015 +0300

    rebuild: make sure server is shut down before volumes are detached

    Currently, we detach block devices before an instance is shut down,
    which means all the data, which hasn't been fsynced yet, will possibly
    be lost.

    A tempest test is added in I1158719cb906309a29ea83460e7e35d753ad1081

    Closes-Bug: #1471216
    Related-Bug: #1440762

    Change-Id: I4846418c4dbdae5b1ac1c08e8b9ac8cea5cb2990
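
The ordering problem in this commit message can be sketched with a toy model. This is not nova code: Guest and both rebuild functions are hypothetical, and the model assumes that shutting the guest down flushes pending writes only while the volume is still attached.

```python
# Toy model, NOT nova code: Guest and both rebuild_* helpers are hypothetical.
# Assumption: a shutdown flushes pending writes only if the volume is attached.


class Guest:
    def __init__(self):
        self.dirty = ["block-1"]  # writes that have not been fsynced yet
        self.volume = []          # data that actually reached the volume
        self.attached = True

    def shutdown(self):
        if self.attached:
            self.volume.extend(self.dirty)
        self.dirty.clear()

    def detach(self):
        self.attached = False


def rebuild_detach_first(guest):
    """Old order: the volume is gone before the guest can flush."""
    guest.detach()
    guest.shutdown()
    return guest.volume


def rebuild_shutdown_first(guest):
    """New order: shut the guest down, then detach the volume."""
    guest.shutdown()
    guest.detach()
    return guest.volume


print(rebuild_detach_first(Guest()))
print(rebuild_shutdown_first(Guest()))
```

In this model only the shutdown-first order leaves the pending write on the volume; the detach-first order drops it.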

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/juno)

Reviewed: https://review.openstack.org/203253
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5cf20f377118963ad1e89d9fa889dcb454f030fd
Submitter: Jenkins
Branch: stable/juno

commit 5cf20f377118963ad1e89d9fa889dcb454f030fd
Author: Roman Podoliaka <email address hidden>
Date: Mon Apr 6 17:53:51 2015 +0300

    Fix rebuild of an instance with a volume attached

    When detaching block devices on rebuild we only notify Cinder it's
    safe to detach a volume, but don't actually tell the driver to do
    that first.

    Closes-Bug: #1440762

    Change-Id: I017bf749f426717dc76cf99a387102848fb1c541
    (cherry picked from commit 337471bc71cbdabe6b492379c81470abac8040d2)

    ---------------------------------------------------------------------
    squashed with another change that fixes a bug introduced in the first
    ---------------------------------------------------------------------

    rebuild: fix rebuild of server with volume attached

    This was meant to be fixed by I017bf749f426717dc76cf99a387102848fb1c541 ,
    but it didn't take into account that BDM entry was destroyed, which
    caused the rebuild to fail when spawning the instance.

    Add a new parameter to detach_volume() to bypass destroying of BDM,
    as we just want to detach a volume first and then re-attach it again.

    A Tempest test is added in I50557c69b54003d3409c8e977966f5332f4fe690
    to make sure this is actually tested in the gate.

    Closes-Bug: #1440762

    Co-Authored-By: melanie witt <email address hidden>

    Conflicts:
            nova/tests/unit/compute/test_compute.py
            nova/tests/unit/compute/test_compute_mgr.py

    NOTE(mriedem): The tests were moved under nova/tests/unit in Kilo.

    Change-Id: I9134fbf5ce72c32cca91de90001c09e00b4e19e8
    (cherry picked from commit 25f15b0bc3bd1971fd29062a7a001f8007485636)
    (cherry picked from commit eb3b1c80c1a8c2bcaf3df21663eba54bae0cacb7)

Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-2 → 12.0.0