3PAR: Multiple issues with online copy operations on HPE 3PAR Driver

Bug #1657227 reported by William Durairaj
This bug affects 2 people
Affects  Status       Importance  Assigned to  Milestone
Cinder   In Progress  Undecided   Walt Boring

Bug Description

To clone a source volume to a target volume of identical size, the HPE 3PAR driver uses the online copy feature of the storage array.

However, the online copy feature, which allows the target volume to be exported to a host for attachment and subsequent I/O while the copy from source to target is still in progress, has the following limitations.

While data is still being copied from the source to the target volume:

1) Metadata can't be set
2) Snapshot of the target is not allowed
3) 'userCPG' and 'snapCPG' are not set by the 3PAR system until the copy operation is complete.

We have seen in numerous customer-reported issues that the above limitations apply and manifest themselves as intermittent failures of volume operations.

Fix proposal:
============
To overcome these limitations, when the source and target sizes are the same, the create_cloned_volume() call waits until the copy task (which is an asynchronous activity) reaches the 'DONE' state.
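
The wait described in the fix proposal could look roughly like the sketch below. It is not the code from the proposed patch: the function name wait_for_copy_task, the get_status callable, and the 'ACTIVE' status string are assumptions made for illustration (only 'DONE' comes from this report), and a real driver would query the array for the task state through its client library.

```python
import time

# 'DONE' is the state named in this report; 'ACTIVE' is an assumed name
# for a task that is still copying data.
DONE = "DONE"
ACTIVE = "ACTIVE"


def wait_for_copy_task(get_status, task_id, poll_interval=1.0, timeout=3600.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Block until the copy task reaches DONE, or raise on failure/timeout.

    get_status is a hypothetical callable that takes a task id and returns
    the current task state as a string.
    """
    deadline = clock() + timeout
    while True:
        status = get_status(task_id)
        if status == DONE:
            return
        if status != ACTIVE:
            # Any other state is treated as a failed or cancelled copy.
            raise RuntimeError(f"copy task {task_id} ended in state {status!r}")
        if clock() >= deadline:
            raise TimeoutError(
                f"copy task {task_id} still {status!r} after {timeout}s")
        sleep(poll_interval)
```

With a wait like this in place, metadata updates, snapshots of the target, and reads of 'userCPG'/'snapCPG' would only happen after the array has finished the copy, at the cost of create_cloned_volume() blocking for the duration of the copy.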

Changed in cinder:
assignee: nobody → William Durairaj (william-dur-sandanaraj)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/499806

Changed in cinder:
assignee: William Durairaj (william-dur-sandanaraj) → Walt Boring (walter-boring)
status: New → In Progress
Walt Boring (walter-boring) wrote :

Using the online copy also leads to problems with deleting the hidden snapshots that are made from the temporary snapshots. You will eventually have tss-*** volumes that have hidden snapshots that can't ever be deleted.

Need to delete ldv Lds before deleting VV vvcp.30563.RW

William Durairaj (william-dur-sandanaraj) wrote :

Apologies for the confusion caused here.

After some analysis of the HPE 3PAR driver code, we found that the online copy feature is needed in the following scenarios:

1) image_volume_cache_enabled = True (for creating cache volumes that contain Glance images)
2) Parallel volume copy operations (cinder create commands run as parallel Unix jobs)
3) Some customer use cases that involve cloning a very large volume.

We later found that a patch to the Cinder task flow (in the Newton release), which retries volume creation after a volume creation failure, helps remove a couple of the issues seen with the HPE 3PAR driver.

I request that Walter abandon this change for this defect.

Walt Boring (walter-boring) wrote :

First off, the only time an online copy is used is when the original volume and the new volume are exactly the same size. That is almost never the case for volumes created with the volume cache enabled, because the cache volume is the minimum size required for the image. The new volumes will almost always be larger.
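
A minimal sketch of the size rule described above, for illustration only. The function name choose_copy_mode and the labels "online" and "offline" are assumptions and are not taken from the driver; the only behaviour taken from this thread is that online copy is used solely when both sizes match exactly.

```python
def choose_copy_mode(source_size_gb, new_size_gb):
    """Return which kind of copy a clone request would take.

    Assumed labels: "online" for the array's online copy, "offline" for
    any other copy path. Sizes are whole gigabytes.
    """
    if new_size_gb < source_size_gb:
        raise ValueError("new volume cannot be smaller than its source")
    # Online copy applies only when the sizes are exactly equal.
    if new_size_gb == source_size_gb:
        return "online"
    return "offline"


# A cache volume sized to fit a 3 GB image, cloned into a 10 GB volume,
# does not take the online copy path under this rule.
print(choose_copy_mode(3, 10))
# A same-size clone does.
print(choose_copy_mode(10, 10))
```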

Parallel volume copy is why I created the patch to remove the online copy to begin with. The 3PAR can actively work on only a very small number of online copies at a time, like 2 or 4. Any further online copies are queued up and will take forever to finish. Also, if you run parallel clones/creates from rally tests, like I did, you will run into lots of problems. I created some rally tests, and they exposed failures only when online copy was being used. When I removed the online copy, the parallel creates/clones worked. They take a while to complete, but they complete.

Third, when using online copy, many temporary hidden snapshots are left in place, which will make the CPG unusable. Large volume clones are going to take time. They have to, since they do a byte-for-byte copy on the 3PAR. The online copy operation is not stable for a cloud-based deployment. It just doesn't work.

Here are my rally tests that exposed the problems with parallel creates when enabling the 3PAR driver in active/active HA mode. It just falls on its face with online copy.
https://github.com/hemna/rally-plugins

William Durairaj (william-dur-sandanaraj) wrote :

In comment #3, I was referring to this patch, https://review.openstack.org/#/c/420733/, which was done in the Newton timeframe.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/756709

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/756710
