Cinder

long flashcopy operations in the storwize_svc driver will block in _delete_vdisk()

Bug #1203152 reported by Jay Bryant on 2013-07-19

This bug affects 2 people

Affects	Status	Importance	Assigned to	Milestone
Cinder	Fix Released	High	Avishay Traeger	Cinder 2013.2 "havana"

Bug Description

There is a loop inside the _delete_vdisk() function in cinder/volume/drivers/storwize_svc.py that
waits for the flashcopy to finish before the vdisk can be deleted. If you try to delete a cinder volume
that was created from a snapshot or from another volume before the flashcopy finishes, the volume
service process loops and waits for the flashcopy to complete. Because execution is blocked
in _delete_vdisk(), the volume service is blocked and won't respond to REST API requests
or update its status. The service will then be marked offline.

I am waiting for the person who found this bug to test a change that puts the while loop into an
inline function that I then run with FixedIntervalLoopingCall.
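A minimal sketch of that approach, for illustration only. The two classes below are simplified stand-ins written with stdlib threads; they imitate the start()/wait()/LoopingCallDone shape of the looping-call helper but are not the OpenStack implementation. The names ensure_vdisk_no_fc_mappings, get_mappings, and fake_get_mappings are hypothetical, and the backend is simulated.

```python
import threading
import time


class LoopingCallDone(Exception):
    """Raised by the polled function to end the loop (stand-in class)."""

    def __init__(self, retvalue=True):
        super().__init__()
        self.retvalue = retvalue


class FixedIntervalLoopingCall:
    """Stand-in: call f every `interval` seconds until it raises LoopingCallDone."""

    def __init__(self, f, *args):
        self.f = f
        self.args = args
        self._thread = None
        self._result = None

    def _run(self, interval):
        try:
            while True:
                self.f(*self.args)
                time.sleep(interval)
        except LoopingCallDone as done:
            self._result = done.retvalue

    def start(self, interval):
        # The poll runs in its own thread instead of the caller's loop.
        self._thread = threading.Thread(
            target=self._run, args=(interval,), daemon=True)
        self._thread.start()
        return self

    def wait(self):
        self._thread.join()
        return self._result


def ensure_vdisk_no_fc_mappings(get_mappings, interval=0.01):
    """Poll until get_mappings() reports no flashcopy mappings (hypothetical)."""

    def _check():
        # This inline function replaces the body of the old while loop.
        if not get_mappings():
            raise LoopingCallDone(retvalue=True)

    timer = FixedIntervalLoopingCall(_check)
    return timer.start(interval=interval).wait()


# Simulated backend: the mapping disappears on the third poll.
remaining = [3]


def fake_get_mappings():
    remaining[0] -= 1
    return ["fcmap0"] if remaining[0] > 0 else []


result = ensure_vdisk_no_fc_mappings(fake_get_mappings)
```

In the real service the benefit comes from the poll yielding between checks so that other work, such as the status update, can run. This sketch does not model that scheduling; it only shows the shape of the refactoring.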

I hope to have a patch to post here later today once we have been able to test the code I wrote.

Tags:

Jay Bryant (jsbryant) on 2013-07-19

Changed in cinder:
assignee:	nobody → Jay Bryant (jsbryant)

Kun Huang (academicgareth) wrote on 2013-07-20:

# Timeout after 5 seconds
@timeout(5)
def long_running_function2():
    ...

Could this be helpful for self._get_flashcopy_mapping_attributes()?
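The snippet above does not say where timeout comes from. One way such a decorator could be written is sketched below using only the standard library; the name timeout and its behaviour here are assumptions for illustration, not an existing Cinder helper. Note that this version only stops waiting: the wrapped call keeps running in its worker thread after the timeout fires.

```python
import concurrent.futures
import functools
import time


def timeout(seconds):
    """Hypothetical decorator: give up waiting for func after `seconds`."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
            future = executor.submit(func, *args, **kwargs)
            try:
                # Raises concurrent.futures.TimeoutError if func is too slow.
                return future.result(timeout=seconds)
            finally:
                # Do not block on the worker; it may still be running.
                executor.shutdown(wait=False)
        return wrapper
    return decorator


@timeout(1)
def fast():
    return "ok"


@timeout(0.05)
def slow():
    time.sleep(0.3)  # stands in for a long-running backend query
    return "done"


fast_result = fast()
try:
    slow()
    timed_out = False
except concurrent.futures.TimeoutError:
    timed_out = True
```

A timeout alone would make the delete fail early rather than wait, so the caller would still need a retry or a background poll to finish the delete once the flashcopy completes.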

Avishay Traeger (avishay-il) on 2013-07-21

tags:	added: drivers storwize-svc
Changed in cinder:
status:	New → Confirmed
importance:	Undecided → High

Jay Bryant (jsbryant) wrote on 2013-09-11:

Moving this to 'Invalid'. I am not able to recreate the problem reported by the user and they also are unable to recreate the problem.

I am able to get one thread of execution in the _ensure_vdisk_no_fc_mappings function. It will sit there waiting for the vdisk to be in a state where it can be deleted. I can start a second delete request and it will get to the _ensure_vdisk_no_fc_mappings function and also wait.

So, given that, if the problem does still exist I don't think that the problem could be at this point in the code. Can always reopen if the problem reappears.

Changed in cinder:
status:	Confirmed → Invalid

Alan Jiang (ajiang) wrote on 2013-10-03:

Jay

I think the user who reported the problem already has my internal fix. The problem needs to be recreated when there
is a long-running flashcopy clone, especially when there are multiple flashcopies from the same source vdisk.

Alan

Jay Bryant (jsbryant) wrote on 2013-10-03:

I just had a chat with Alan. I was not aware that they were still able to recreate this problem fairly easily. The person I had been working with wasn't able to recreate. Perhaps, as he noted, they were already running with the fix.

Alan is going to push up a fix based on the debug code I asked him to try.

I am reopening this bug so that it can be used to check the code in against.

Changed in cinder:
status:	Invalid → Confirmed
tags:	added: grizzly-backport-potential

OpenStack Infra (hudson-openstack) wrote on 2013-10-03: Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/49647

Changed in cinder:
assignee:	Jay Bryant (jsbryant) → Alan Jiang (ajiang)
status:	Confirmed → In Progress

Avishay Traeger (avishay-il) on 2013-10-06

tags:	added: havana-rc-potential

OpenStack Infra (hudson-openstack) on 2013-10-09

Changed in cinder:
assignee:	Alan Jiang (ajiang) → Avishay Traeger (avishay-il)

OpenStack Infra (hudson-openstack) wrote on 2013-10-10: Fix merged to cinder (master)

Reviewed: https://review.openstack.org/49647
Committed: http://github.com/openstack/cinder/commit/7aa4f65a8c17aa037deff0f5b534ed694c17e62a
Submitter: Jenkins
Branch: master

commit 7aa4f65a8c17aa037deff0f5b534ed694c17e62a
Author: Alan Jiang <email address hidden>
Date: Thu Oct 3 17:03:09 2013 -0500

long flashcopy operation may block volume service

    Storwize family uses flashcopy for snapshot or volume clone. The
    volume delete has to wait until flashcopy finishes or errors out.
    The _delete_vdisk() will poll volume FlashCopy status in a loop.
    This may block volume service heartbeat since it is in the same
    thread. The solution is to use openstack FixedIntervalLoopingCall
    to run the FlashCopy status poll in a timer thread.

    The cinder volume manager will resume delete operation for those
    volumes that are in the deleting state during volume service startup.
    Since Storwize volume delete may wait for a long time, this can cause
    volume service to have long delay before it becomes available.
    A greenpool is used to offload those volume delete operations.

Change-Id: Ie01a441a327e1e318fa8da0040ae130731b7a686
Closes-Bug: #1203152
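
The second part of the commit message, offloading resumed deletes to a pool at startup, can be sketched roughly as follows. The commit uses a greenpool; here ThreadPoolExecutor from the standard library stands in for it so the sketch is self-contained. The names delete_volume and init_host, and the volume dictionaries, are hypothetical and the delay is simulated.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def delete_volume(name, log):
    """Pretend delete that has to wait for a flashcopy to finish."""
    time.sleep(0.05)  # simulated wait on the backend
    log.append(name)


def init_host(volumes, pool, log):
    """Resume pending deletes without waiting for them to complete."""
    futures = []
    for vol in volumes:
        if vol["status"] == "deleting":
            # Hand the slow delete to the pool and move on immediately.
            futures.append(pool.submit(delete_volume, vol["name"], log))
    return futures


volumes = [
    {"name": "vol-a", "status": "deleting"},
    {"name": "vol-b", "status": "available"},
    {"name": "vol-c", "status": "deleting"},
]
deleted = []

with ThreadPoolExecutor(max_workers=4) as pool:
    pending = init_host(volumes, pool, deleted)
    # Startup can continue here while the deletes run in the pool.

# Leaving the with block waits for the queued deletes to finish.
```

The point of the design is that startup time no longer depends on how long each resumed delete has to wait.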

Changed in cinder:
status:	In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2013-10-10: Fix proposed to cinder (milestone-proposed)

Fix proposed to branch: milestone-proposed
Review: https://review.openstack.org/50984

John Griffith (john-griffith) on 2013-10-10

Changed in cinder:
milestone:	none → havana-rc2

OpenStack Infra (hudson-openstack) wrote on 2013-10-10: Fix merged to cinder (milestone-proposed)

Reviewed: https://review.openstack.org/50984
Committed: http://github.com/openstack/cinder/commit/8a2a3d691fa54c07d14b3e32558641f43b69c040
Submitter: Jenkins
Branch: milestone-proposed

commit 8a2a3d691fa54c07d14b3e32558641f43b69c040
Author: Alan Jiang <email address hidden>
Date: Thu Oct 3 17:03:09 2013 -0500

long flashcopy operation may block volume service

    Change-Id: Ie01a441a327e1e318fa8da0040ae130731b7a686
    Closes-Bug: #1203152
    (cherry picked from commit 7aa4f65a8c17aa037deff0f5b534ed694c17e62a)

Changed in cinder:
status:	Fix Committed → Fix Released

Thierry Carrez (ttx) on 2013-10-17

Changed in cinder:
milestone:	havana-rc2 → 2013.2

Alan Pevec (apevec) on 2014-03-31

tags:	removed: grizzly-backport-potential
tags:	removed: havana-rc-potential
