Nova fails to delete Instance having iscsi volume

Bug #1503676 reported by Eugene Nikanorov
This bug affects 5 people
Affects             Status        Importance  Assigned to      Milestone
Mirantis OpenStack  Fix Released  High        MOS Nova
6.0.x               Won't Fix     High        MOS Maintenance
6.1.x               Won't Fix     High        MOS Maintenance
7.0.x               Won't Fix     High        MOS Maintenance
8.0.x               Fix Released  High        MOS Nova
9.x                 Invalid       High        MOS Nova

Bug Description

Upstream bug: https://bugs.launchpad.net/nova/+bug/1436561

MOS 6.0

Nova fails to delete an instance that has an iSCSI volume attached.
The failure is caused by cinder-volume, which cannot detach the iSCSI target.

Relevant log message in cinder-volume:
tgtadm: this target is still active

This is presumably a result of nova-compute failing to detach the volume from the instance.
Logs seen on nova-compute:

2015-10-05 12:27:43.158 12186 DEBUG nova.openstack.common.processutils [req-ca14ccb2-62b5-4602-b4f0-c99514bd5a64 None] Running cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf systool -c fc_host -v execute /usr/lib/python2.6/site-packages/nova/openstack/common/processutils.py:161
2015-10-05 12:27:43.282 12186 DEBUG nova.openstack.common.processutils [req-ca14ccb2-62b5-4602-b4f0-c99514bd5a64 None] Result was 1 execute /usr/lib/python2.6/site-packages/nova/openstack/common/processutils.py:195
2015-10-05 12:27:43.283 12186 DEBUG nova.virt.libvirt.driver [req-ca14ccb2-62b5-4602-b4f0-c99514bd5a64 None] [instance: f99c547b-aea0-4dfc-9216-b74204c8866c] Could not determine fibre channel world wide node names get_volume_connector /usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py:1279
2015-10-05 12:27:43.284 12186 DEBUG nova.openstack.common.processutils [req-ca14ccb2-62b5-4602-b4f0-c99514bd5a64 None] Running cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf systool -c fc_host -v execute /usr/lib/python2.6/site-packages/nova/openstack/common/processutils.py:161
2015-10-05 12:27:43.406 12186 DEBUG nova.openstack.common.processutils [req-ca14ccb2-62b5-4602-b4f0-c99514bd5a64 None] Result was 1 execute /usr/lib/python2.6/site-packages/nova/openstack/common/processutils.py:195
2015-10-05 12:27:43.406 12186 DEBUG nova.virt.libvirt.driver [req-ca14ccb2-62b5-4602-b4f0-c99514bd5a64 None] [instance: f99c547b-aea0-4dfc-9216-b74204c8866c] Could not determine fibre channel world wide port names get_volume_connector /usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py:1286
2015-10-05 12:27:43.407 12186 DEBUG nova.volume.cinder [req-ca14ccb2-62b5-4602-b4f0-c99514bd5a64 None] Cinderclient connection created using URL: http://192.168.0.18:8776/v1/68e3744676c743faad8a657de9740e8e get_cinder_client_version /usr/lib/python2.6/site-packages/nova/volume/cinder.py:255
2015-10-05 12:27:44.707 12186 DEBUG nova.volume.cinder [req-ca14ccb2-62b5-4602-b4f0-c99514bd5a64 None] Cinderclient connection created using URL: http://192.168.0.18:8776/v1/68e3744676c743faad8a657de9740e8e get_cinder_client_version /usr/lib/python2.6/site-packages/nova/volume/cinder.py:255
2015-10-05 12:27:45.488 12186 DEBUG cinderclient.client [req-ca14ccb2-62b5-4602-b4f0-c99514bd5a64 ] Failed attempt(1 of 3), retrying in 1 seconds _cs_request /usr/lib/python2.6/site-packages/cinderclient/client.py:297
2015-10-05 12:27:48.447 12186 DEBUG cinderclient.client [req-ca14ccb2-62b5-4602-b4f0-c99514bd5a64 ] Failed attempt(2 of 3), retrying in 2 seconds _cs_request /usr/lib/python2.6/site-packages/cinderclient/client.py:297
2015-10-05 12:27:52.337 12186 DEBUG cinderclient.client [req-ca14ccb2-62b5-4602-b4f0-c99514bd5a64 ] Failed attempt(3 of 3), retrying in 4 seconds _cs_request /usr/lib/python2.6/site-packages/cinderclient/client.py:297
2015-10-05 12:27:57.144 12186 ERROR nova.compute.manager [req-ca14ccb2-62b5-4602-b4f0-c99514bd5a64 None] [instance: f99c547b-aea0-4dfc-9216-b74204c8866c] Setting instance vm_state to ERROR
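
A quick way to confirm the state described above (an illustrative sketch, not part of the original report) is to check whether the iSCSI initiator session on the compute node and the tgt target on the cinder-volume node are still active, using the standard open-iscsi and tgt CLIs:

# Illustrative diagnostic sketch, not part of the original report: it only
# shells out to the standard open-iscsi and tgt command-line tools, which
# are assumed to be installed on the respective nodes.
import subprocess

def run(cmd):
    # Return the command output even on a non-zero exit code
    # (e.g. iscsiadm exits non-zero when there are no sessions).
    try:
        return subprocess.check_output(cmd, stderr=subprocess.STDOUT,
                                       universal_newlines=True)
    except subprocess.CalledProcessError as exc:
        return exc.output

# On the compute node: list active initiator sessions.
print(run(["iscsiadm", "-m", "session"]))

# On the cinder-volume node: show targets and their connections; a target
# with an open connection is what tgtadm reports as "still active".
print(run(["tgtadm", "--lld", "iscsi", "--mode", "target", "--op", "show"]))

If the initiator session is still present after the instance delete, the "this target is still active" error on the cinder-volume side is expected.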

Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

cinder-volume log

description: updated
Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

nova-compute log

Changed in mos:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → MOS Nova (mos-nova)
summary: - Nova fails to delete Instance hanving iscsi volume
+ Nova fails to delete Instance having iscsi volume
description: updated
Revision history for this message
Roman Rufanov (rrufanov) wrote :

A customer found this on 6.0.

tags: added: support
Revision history for this message
Jay Pipes (jaypipes) wrote :

Looks like https://review.openstack.org/#/c/227851/ should fix this...

tags: added: mos-nova
Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Fixed in Mitaka, backports proposed for Liberty and Kilo - https://review.openstack.org/#/q/I983f80822a5c210929f33e1aa348a0fef91e890b,n,z

tags: added: area-nova
removed: mos-nova
Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

Eugene, could you please provide steps to reproduce this bug?

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (openstack-ci/fuel-8.0/liberty)

Fix proposed to branch: openstack-ci/fuel-8.0/liberty
Change author: Ryan McNair <email address hidden>
Review: https://review.fuel-infra.org/16127

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

I have set the bug status for MOS 6.0, 6.1 and 7.0 to Incomplete since I couldn't reproduce it in our lab. Eugene, could you please provide steps to reproduce this issue?

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/nova (openstack-ci/fuel-8.0/liberty)

Reviewed: https://review.fuel-infra.org/16127
Submitter: Pkgs Jenkins <email address hidden>
Branch: openstack-ci/fuel-8.0/liberty

Commit: 5f4aed4eaa84258ca87b1954da67a4851c84b407
Author: Ryan McNair <email address hidden>
Date: Thu Jan 14 15:31:48 2016

Add retry logic for detaching device using LibVirt

Add retry logic for removing a disk device from the LibVirt
guest domain XML. This is needed because immediately after a guest
reboot, libvirtmod.virDomainDetachDeviceFlags() will silently fail
to remove the mapping from the guest domain. The async retry
behavior is done in Guest and is generic so it can be re-used by any other
detaches which hit this same race condition.

Change-Id: I983f80822a5c210929f33e1aa348a0fef91e890b
Closes-Bug: #1503676
(cherry picked from commit 3a3fb3cfb2c41ad182545e47649ff12a4f3a743e)
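
As an illustration of the retry pattern described in the commit message above, here is a simplified sketch using the libvirt-python bindings directly; it is not the actual Nova change (which, as noted, puts the async retry behavior into the Guest wrapper), and the function and parameter names are illustrative:

# Simplified sketch of the retry pattern described above -- not the Nova
# patch itself. The device is considered detached once its target dev name
# disappears from the live domain XML.
import time
import libvirt

def detach_device_with_retry(dom, device_xml, target_dev, attempts=8, delay=2):
    for _ in range(attempts):
        try:
            # Ask libvirt to drop the device from the running guest.
            dom.detachDeviceFlags(device_xml, libvirt.VIR_DOMAIN_AFFECT_LIVE)
        except libvirt.libvirtError:
            # The device may already be gone; the XML check below decides.
            pass
        if target_dev not in dom.XMLDesc(0):
            return True  # mapping removed from the guest domain
        # Right after a guest reboot the detach call can silently do nothing,
        # so wait and retry instead of failing immediately.
        time.sleep(delay)
    return False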

tags: added: nova
Revision history for this message
Evgeny Sikachev (esikachev) wrote :

Verified on ISO #478.

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Alexey, you can use this test (you will need to run the *whole* test_run_actions test suite) to reproduce this race condition.

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :
Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

Roman, thanks! I will check those steps next week.

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Release notes:

Fixed a race condition when detaching a volume, which caused rebuild/delete operations to fail in corner cases for instances with volumes attached.

tags: added: release-notes
tags: added: 8.0 release-notes-done
removed: release-notes
Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

Roman: I have run the tests against a 7.0 environment and there were no failures.

(.venv) developer@nailgun:mos-tempest-runner$ run_tests tempest.api.compute.servers.test_server_actions.test_rebuild_server_with_volume_attached
Running Tempest tests
Tempest commit ID is c5bb7663b618a91b15d379fb5b2550e238566ce6
Tempest config file already exists!

....
OMITTED
....

Ran 0 tests in 2.837s

OK

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

Moving the bug back to the Incomplete state, since there are no steps to reproduce.

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Alexey, you didn't actually run any tests:

Ran 0 tests in 2.837s

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

OK, I will try to re-run the tests and find out what went wrong. I have to point out that clear steps to reproduce are still missing here: I followed the exact instructions from your message and https://github.com/Mirantis/mos-tempest-runner and had no success.

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

My bad: during the previous tests I typed an incorrect command. Below you can see the steps to reproduce and the test results.

Steps to reproduce (according to @rpodolyaka's message):
1. Install mos-tempest-runner following the instructions from https://github.com/Mirantis/mos-tempest-runner
2. Apply the patch from https://review.openstack.org/#/c/175949/ to /home/developer/mos-tempest-runner/tempest/tempest/api/compute/servers/test_server_actions.py
3. Run the tests: "for i in {1..10}; do run_tests tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON 2>&1 | grep test_rebuild_server_with_volume_attached; done"

Result:
for i in {1..10}; do run_tests tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON 2>&1 | grep test_rebuild_server_with_volume_attached; done
    test_rebuild_server_with_volume_attached[id-b68bd8d6-855d-4212-b59b-2e704044dace,volume]OK 94.29
    test_rebuild_server_with_volume_attached[id-b68bd8d6-855d-4212-b59b-2e704044dace,volume]OK 85.59
    test_rebuild_server_with_volume_attached[id-b68bd8d6-855d-4212-b59b-2e704044dace,volume]OK 84.08
    test_rebuild_server_with_volume_attached[id-b68bd8d6-855d-4212-b59b-2e704044dace,volume]OK 83.34
    test_rebuild_server_with_volume_attached[id-b68bd8d6-855d-4212-b59b-2e704044dace,volume]OK 84.06
    test_rebuild_server_with_volume_attached[id-b68bd8d6-855d-4212-b59b-2e704044dace,volume]OK 90.95
    test_rebuild_server_with_volume_attached[id-b68bd8d6-855d-4212-b59b-2e704044dace,volume]OK 87.03
    test_rebuild_server_with_volume_attached[id-b68bd8d6-855d-4212-b59b-2e704044dace,volume]OK 129.37
    test_rebuild_server_with_volume_attached[id-b68bd8d6-855d-4212-b59b-2e704044dace,volume]OK 95.44
    test_rebuild_server_with_volume_attached[id-b68bd8d6-855d-4212-b59b-2e704044dace,volume]OK 115.47

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

Setting the bug's status to Incomplete based on the arguments above.

Folks, we need detailed and complete steps to reproduce this issue.

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Alexey, I'm afraid you won't get any, as this is a race condition.

The test mentioned above, when run together with the other tests from tempest.api.compute.servers, allowed us to reproduce this (or a related) issue. If it no longer does, I suggest merging the fix as is.

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

Closing this bug as Won't Fix for MOS 6.0 and MOS 6.1, as we have finished active support for those releases and no longer merge non-critical and non-security fixes there.

Revision history for this message
Vitaly Sedelnik (vsedelnik) wrote :

Won't Fix for 7.0-updates, as there are no confirmed steps to reproduce, so we are unable to fix and validate this.
