Undeletable temporary volume object is left in DB if volume migration fails

Bug #1403916 reported by Yuriy Taraday
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Cinder
Fix Released
High
Mitsuhiro Tanino
Juno
Fix Released
High
Mitsuhiro Tanino

Bug Description

When some error happens during volume migration (e.g. https://bugs.launchpad.net/cinder/+bug/1403912 during volume copying), cinder-volume deletes LVM volume but doesn't delete volume from DB thus leaving behind volume that cannot be deleted (API returns 400 error with no explanation in logs)

Steps to reproduce:
Having https://bugs.launchpad.net/cinder/+bug/1403912 actual,
$ cinder create --name vol1 1
$ cinder migrate vol1 somehost@lvmdriver-1#lvm-driver
Verify that it failed as described in https://bugs.launchpad.net/cinder/+bug/1403912

Expected result:
No new volumes are left in db or they can be deleted.

Actual result:
Temporary volume is left in DB, cannot be deleted through API.
Real shell log: http://paste.openstack.org/show/xhGXBRS8tltk13jcdqgY/

Revision history for this message
krishna (leburu-reddy) wrote :

The status of the volumes in the database is changed from available to deleted ,when the migration failed volume is deleted.
The id is present in the database but in deleted state.

Revision history for this message
krishna (leburu-reddy) wrote :

In juno version of the openstack the status in the database is getting updated to deleted state after trying to migrate the volume

Changed in cinder:
assignee: nobody → krishna (leburu-reddy)
krishna (leburu-reddy)
Changed in cinder:
assignee: krishna (leburu-reddy) → nobody
Revision history for this message
Avishay Traeger (avishay-il) wrote :

What is the status of this bug? Is the DB entry changed to 'deleted' as expected, or not?

Jay Bryant (jsbryant)
Changed in cinder:
importance: Undecided → High
Jay Bryant (jsbryant)
tags: added: lvm migration
Revision history for this message
Mitsuhiro Tanino (mitsuhiro-tanino) wrote :

I tried to reproduce this problem and I think this is still happen.

http://paste.openstack.org/show/170019/

After applying commit https://review.openstack.org/#/c/148373/ (this is not merged yet), I saw
the root cause of this deletion error. The os-vol-mig-status-attr:migstat flag is remaining in the
new volume, therefore this volume can't be deleted.
This flag should be removed when volume migration is failed.

$ cinder delete 643b8e20-c378-46ce-9d0e-899a27f14060
Delete for volume 643b8e20-c378-46ce-9d0e-899a27f14060 failed: Invalid volume: Volume cannot be deleted while migrating (HTTP 400) (Request-ID: req-a221ced0-6e43-4176-9156-7faa2dbe43e1)
ERROR: Unable to delete any of the specified volumes.

>>Expected result:
>>No new volumes are left in db or they can be deleted.

I will propose a solution for the latter case.

Changed in cinder:
assignee: nobody → Mitsuhiro Tanino (mitsuhiro-tanino)
Revision history for this message
Jay Bryant (jsbryant) wrote :

Mitsuhiro,

Thanks for the update and the analysis! Look forward to seeing the patch.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/154739

Changed in cinder:
status: New → In Progress
Revision history for this message
Mitsuhiro Tanino (mitsuhiro-tanino) wrote :

Hi Avishay,

Here are full logs from my reproduction.
To reproduce simulated error, I stopped tgtd daemon on the source host before volume migration.

<<<c-vol log of source host>>>
http://paste.openstack.org/show/171470/

<<<c-vol log of destination host>>>
http://paste.openstack.org/show/171471/

<<<Command and DB logs>>>
http://paste.openstack.org/show/171477/

After volume migration, source LV is still remaining on the source host.
A destination LV was created temporarily but it was deleted during error handling of _migrate_volume_generic().
However, as for DB entry of destination volume, deleted flag was set to 0 and migration_status was set to "target:e49a91f7-a791-40ee-9378-20fbad0ea400". Therefore, we can see the destination volume in the result of "cinder list", but we can't delete this volume due to the migration status.
Also as for DB entry of source volume, migration_status was set to "migrating", therefore this volume also can't be deleted anymore.

-----
$ cinder force-delete e49a91f7-a791-40ee-9378-20fbad0ea400
Delete for volume e49a91f7-a791-40ee-9378-20fbad0ea400 failed: Invalid volume: Volume cannot be deleted while migrating (HTTP 400) (Request-ID: req-37fc6aba-e45b-4506-bde5-aad104fbb2bd)
ERROR: Unable to force delete any of the specified volumes.

$ cinder force-delete 48b8793e-c818-4829-af9f-d26d9afc6340
Delete for volume 48b8793e-c818-4829-af9f-d26d9afc6340 failed: Invalid volume: Volume cannot be deleted while migrating (HTTP 400) (Request-ID: req-2ad29a16-29b9-4e07-b48f-bf4f947b462b)
ERROR: Unable to force delete any of the specified volumes.

Revision history for this message
Avishay Traeger (avishay-il) wrote :

OK I see the bug in delete_volume(), in manager.py. The problem is in this code:

        # If deleting the source volume in a migration, we want to skip quotas
        # and other database updates.
        if volume_ref['migration_status']:
            return True

Note that the comment and the code don't match - the check in the code will be true for both the source and destination.

This is what the logic should be:
1. If volume['migration_status'] == 'migrating' (i.e., source volume) skip over the remainder of the function. However, we should probably do it more explicitly than "return True" - i.e., we should do "if volume_ref['migration_status'] != 'migrating'"
2. If volume['migration_status'].startswith('target') , we should skip over the quota update code, but still we should call these lines:
        self.db.volume_glance_metadata_delete_by_volume(context, volume_id)
        self.db.volume_destroy(context, volume_id)

Hope that makes sense.

Revision history for this message
Mitsuhiro Tanino (mitsuhiro-tanino) wrote :

Hi Avishay,

Thank you for letting me know your investigation.
I agree the delete_volume have to handle both source and destination volume properly.

I think using "if volume_ref['migration_status'] != 'migrating'" has a little bit problem
since source volume deletion calls delete_volume with 'completing' status.
In this case, I think we also want to skip quota and other database updates in delete_volume.

I posted new patch. Please continue to discuss based on the new patch.

Regards,
Mitsuhiro Tanino

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/154739
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=d96c40c0ab85090a83cb532090cc879c27d630d5
Submitter: Jenkins
Branch: master

commit d96c40c0ab85090a83cb532090cc879c27d630d5
Author: Mitsuhiro Tanino <email address hidden>
Date: Mon Feb 9 17:18:31 2015 -0500

    Clear migration_status from a destination volume if migration fails

    When some error happens during volume migration, cinder-volume
    keeps remaining a destination volume because the volume copy may be
    in the completing phase and a source volume may be deleted already.

    In this case, the migration_status of the destination volume in
    data base should be cleared. As a result, the volume can be handled
    properly even if volume migration is failed. If the migration is
    in the migrating phase, the volume should be deleted and an entry
    of database is also destroyed because the source volume is remaining.

    Also there is an another problem that a source volume is remaining
    as migrating status if create_export raises exception. In this case,
    source volume can't be deleted even if it will be unnecessary.

    This change fixes these two problems.

    Closes-Bug: #1403916
    Change-Id: Ie4199bc1e804e5cbbb793f448a99487f9823f1a9

Changed in cinder:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/161328

Thierry Carrez (ttx)
Changed in cinder:
milestone: none → kilo-3
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/juno)

Reviewed: https://review.openstack.org/161328
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=8940579083bf5b572f88bce7d64a848ed823f950
Submitter: Jenkins
Branch: stable/juno

commit 8940579083bf5b572f88bce7d64a848ed823f950
Author: Mitsuhiro Tanino <email address hidden>
Date: Mon Feb 9 17:18:31 2015 -0500

    Clear migration_status from a destination volume if migration fails

    When some error happens during volume migration, cinder-volume
    keeps remaining a destination volume because the volume copy may be
    in the completing phase and a source volume may be deleted already.

    In this case, the migration_status of the destination volume in
    data base should be cleared. As a result, the volume can be handled
    properly even if volume migration is failed. If the migration is
    in the migrating phase, the volume should be deleted and an entry
    of database is also destroyed because the source volume is remaining.

    Also there is an another problem that a source volume is remaining
    as migrating status if create_export raises exception. In this case,
    source volume can't be deleted even if it will be unnecessary.

    This change fixes these two problems.

    NOTE:
    In order to backport test cases, we need to use contextlib.nested
    instead of using multiple variables in "with" statement because
    it is not supported at python2.6.

    Closes-Bug: #1403916
    Change-Id: Ie4199bc1e804e5cbbb793f448a99487f9823f1a9
    (cherry picked from commit d96c40c0ab85090a83cb532090cc879c27d630d5)

tags: added: in-stable-juno
Thierry Carrez (ttx)
Changed in cinder:
milestone: kilo-3 → 2015.1.0
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.