lvremove failure on volume delete when creating and deleting volumes with the same name

Bug #1041334 reported by Matthew Treinish
This bug report is a duplicate of: Bug #1038062: TgtAdm is broken.
This bug affects 1 person
Affects                    Status        Importance   Assigned to     Milestone
Cinder                     In Progress   Critical     John Griffith
OpenStack Compute (nova)   Confirmed     Critical     Unassigned

Bug Description

The issue appears when API commands are sent to nova-volume and cinder in quick succession: create volume, get volume, and delete volume are run back to back at least twice, using the same displayName for each volume.

In other words: create a volume, delete it, and, immediately after the delete is confirmed, create a second volume with the same name. Trying to delete the second volume then fails because lvremove cannot remove the logical volume.

lvremove fails because the LV open count is 1. The open count in this case is caused by the iSCSI export still running on the volume. The tgtadm check returns no exports because the wrong tid is used to check whether there is an iSCSI export. In my testing with tempest, nova-volume and cinder used --tid=2 when the check erroneously returned 22, but manually running the tgtadm show operation with --tid=1 showed the iSCSI export.
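To illustrate the tid mismatch, here is a minimal Python sketch that looks up a volume's real tid by target name instead of trusting a stored value. It assumes the text being parsed is the output of `tgtadm --lld iscsi --op show --mode target`, where each target begins with a line of the form `Target <tid>: <name>`. The function names, the IQN prefix, and the sample output are hypothetical and only for illustration; this is not the code used by nova-volume or cinder.

```python
import re

# Each target in the show output starts with a line such as:
#   Target 1: iqn.2010-10.org.openstack:volume-00000002
TARGET_LINE = re.compile(r"^Target (\d+): (\S+)", re.MULTILINE)


def parse_targets(show_output):
    """Map each target name found in the show output to its integer tid."""
    return {name: int(tid) for tid, name in TARGET_LINE.findall(show_output)}


def find_tid(show_output, volume_name):
    """Return the tid whose target name ends with ':<volume_name>', or None."""
    for name, tid in parse_targets(show_output).items():
        if name.endswith(":" + volume_name):
            return tid
    return None


# Hypothetical sample output: the export for the volume lives at tid 1,
# even though the caller expected to find it at tid 2.
SAMPLE_OUTPUT = """\
Target 1: iqn.2010-10.org.openstack:volume-00000002
    System information:
        Driver: iscsi
        State: ready
    LUN information:
        LUN: 0
            Type: controller
"""

actual_tid = find_tid(SAMPLE_OUTPUT, "volume-00000002")
```

With a lookup like this, a check made against --tid=2 could be compared with the tid actually reported for the volume's target before concluding that no export exists.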

I believe this may be caused by a race between the first volume's delete and the creation of the second volume with the same displayName. When I changed the displayName of the second volume, the failure no longer occurred.

Tags: volume
Changed in nova:
assignee: nobody → Matthew Treinish (treinish)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tempest (master)

Fix proposed to branch: master
Review: https://review.openstack.org/11947

Changed in tempest:
assignee: nobody → Matthew Treinish (treinish)
status: New → In Progress
Changed in cinder:
status: New → Confirmed
milestone: none → folsom-rc1
John Griffith (john-griffith) wrote :

Turns out this can be reproduced fairly easily even using unique or no display_names.

The issue here is that in certain cases the tid selected/returned by tgt-admin is NOT the next sequential target id. As a result, the tgt id info stored in the db's iscsi_targets table is incorrect, and when we attempt to delete, it fails.

I can hack around this and check when the delete fails for this particular situation and FIND the correct id, BUT... the bigger problem here is that the iscsi_targets table data is wrong. Trying to spend a little more time on figuring out a fix for the root cause here, but may have to revert the persistent targets change.
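As a rough illustration of what checking the stored data against reality could look like, the following Python sketch compares stored tids with the tids of the live targets and reports the rows that disagree. The schema of iscsi_targets is not shown in this report, so the `(volume_name, stored_tid)` row shape, the `live_targets` mapping, and the function name are all assumptions made for this example; it is not the fix proposed for this bug.

```python
def find_mismatched_targets(db_rows, live_targets):
    """Return (volume_name, stored_tid, live_tid) for rows whose tid is wrong.

    db_rows:      iterable of (volume_name, stored_tid) pairs (hypothetical shape)
    live_targets: dict mapping volume_name -> tid currently exported (hypothetical)
    """
    mismatches = []
    for volume_name, stored_tid in db_rows:
        live_tid = live_targets.get(volume_name)
        # Volumes with no live export are skipped; only disagreeing tids count.
        if live_tid is not None and live_tid != stored_tid:
            mismatches.append((volume_name, stored_tid, live_tid))
    return mismatches


# Hypothetical data: the db recorded tid 2, but the export is really at tid 1.
stored = [("volume-00000002", 2), ("volume-00000003", 3)]
live = {"volume-00000002": 1, "volume-00000003": 3}
wrong = find_mismatched_targets(stored, live)
```

A check like this only detects the bad rows after the fact; it does not address why the wrong tid was recorded in the first place.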

Also, I believe this bug has the same root cause as: https://bugs.launchpad.net/cinder/+bug/1038062

Mark McLoughlin (markmc)
Changed in nova:
milestone: none → folsom-rc1
Changed in cinder:
importance: Undecided → Critical
Changed in nova:
importance: Undecided → Critical
status: New → Confirmed
assignee: Matthew Treinish (treinish) → nobody
Mark McLoughlin (markmc)
tags: added: volume
no longer affects: tempest
Changed in cinder:
status: Confirmed → In Progress
assignee: nobody → John Griffith (john-griffith)