use copy-on-write for all rbd volume cloning

Bug #1209199 reported by Edward Hope-Morley on 2013-08-07
This bug affects 1 person
Assigned to: Edward Hope-Morley

Bug Description

RBD volume cloning (from vol not snap) currently does a full copy. We could speed this up by creating a copy-on-write clone instead. This would require taking a snapshot of the volume first and then creating a copy-on-write clone. There are different ways we could implement this, especially in view of upcoming auto-flattening support, but for now I suggest we do it as follows:

* create discrete snapshot of volume if one does not already exist
* if the snapshot already has more than some configurable number of children (detect with list_children()), we disallow the clone for performance reasons and enforce a full copy as before - this can be disabled when auto-flattening comes in.
* create copy-on-write clone of snapshot
* et voila!
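
The steps above can be sketched roughly as follows. This is a minimal in-memory model, not the rbd driver: `FakeVolume`, `clone_volume`, `max_children` and the `clone_snap` snapshot name are hypothetical names used for illustration, and the sketch assumes the limit is the maximum number of copy-on-write clones allowed per snapshot. Only `list_children()` comes from the proposal itself.

```python
class FakeVolume:
    """Hypothetical in-memory stand-in for an RBD image."""

    def __init__(self, name):
        self.name = name
        self.snapshots = {}   # snapshot name -> list of child volume names
        self.parent = None    # (parent volume name, snapshot name) for COW clones

    def list_children(self, snap):
        return list(self.snapshots.get(snap, []))


def clone_volume(src, dest_name, max_children, snap="clone_snap"):
    """Clone src, preferring copy-on-write while the snapshot has few children."""
    # Step 1: create the discrete snapshot only if it does not exist yet.
    src.snapshots.setdefault(snap, [])

    dest = FakeVolume(dest_name)
    # Step 2: the snapshot already has too many children, so do a full copy.
    if len(src.list_children(snap)) >= max_children:
        return dest, "full-copy"

    # Step 3: register the new volume as a copy-on-write clone of the snapshot.
    src.snapshots[snap].append(dest_name)
    dest.parent = (src.name, snap)
    return dest, "cow"
```

With `max_children=2`, the first two clones of a volume would be copy-on-write and the third would fall back to a full copy.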

description: updated
Changed in cinder:
assignee: nobody → Edward Hope-Morley (hopem)
summary: - use copy-on-write for rbd volume cloning
+ use copy-on-write for all rbd volume cloning
description: updated
Edward Hope-Morley (hopem) wrote :

Copying in message from Chen, Xiaoxi:

"I have discussed the auto-flatten feature with Josh. My idea is almost the same as yours at the code level, but it goes in a different direction.
My proposal for auto-flatten is that when there are too many children and the user wants to delete the parent volume, we don't actually flatten all the children; instead, we just lazy-delete the parent volume, to prevent a flatten storm.
Your proposal is that if a snapshot has too many children, the CoW mechanism is disabled. This is another approach to preventing a flatten storm, but it is doubtful for performance reasons. For example, if you have a snapshot of an OS disk and a lot of volumes cloned from that snapshot, the base image (to be exact, the snapshot) will likely be very hot and thus be cached in DRAM. So it may be even faster than doing a full copy."

Edward Hope-Morley (hopem) wrote :

Hi Xiaoxi,

So I was not aiming to solve the auto_flatten issue here, but rather to support it in the future. As you say, many copies of a snapshot will likely cause a performance degradation, and this was my understanding of the need to auto-flatten under certain circumstances. We do, however, need to consider the tradeoff between volume creation time/impact and performance. This modification will allow the user to optionally have copies performed using copy-on-write up to a certain number of copies. Therefore, the user will be able to

either (a) disable copy-on-write and have full copy for each volume
or (b) allow a fixed number of copy-on-write clones before switching to full copy (not flatten)
or (c) allow infinite copy-on-write

Then, if/when we decide best approach for auto-flatten we can add it in.

Xiaoxi Chen (xiaoxi-chen) wrote :

Hi Edward,
       First, I really think this should go to a BP rather than a bug report.
       For the three user scenarios you listed, (a) and (c) are already supported: there is a bool flag in the configuration (rbd_flatten_volume_from_snapshot). If you want a full copy (scenario a), you could set it to True; otherwise set it to False (scenario c). So basically your add-on is to support scenario b.
      Scenario b, as you said, is a tradeoff between creation time and performance. But although it's very clear how this value impacts the creation time, we still have no insight into how it affects performance. Users will also be confused about how to set this value. For example, if I want the best read performance and I don't care about write performance, should I set it to 0 or infinity? Increasing the value should slow down the creation time, but can I get better performance? Is there a reasonable/meaningful default value? The answers are all unknown to me, and also to most users.
      So my opinions are:
1. let's move this to a BP
2. I am -1 for this unless we (Ceph users) have a very clear idea of what the tradeoff is. It's not a good idea to leave a magic tunable there for the user.

Changed in cinder:
status: New → Opinion
Edward Hope-Morley (hopem) wrote :


Option (c) is not currently available if you are cloning a volume from another volume. This modification is only for the case of volume clone from volume, NOT volume clone from snapshot, i.e.

I want to allow copy-on-write if the user does the following:

cinder create --source-volid <id> ...

Currently, only full copy is supported for this command.

Your concern about performance is valid and this would obviously need some testing. I'm also hoping that jdurgin might give his thoughts on this.

Changed in cinder:
status: Opinion → In Progress
Josh Durgin (jdurgin) wrote :

I'm not so worried about performance degrading from many clones based on the same snapshot - it's unlikely that all the clones will be accessing the same objects all at once, and if they do, it'll be in the osd's page cache and thus take longer to become a bottleneck.

I'm fine with (c) - the issue with using copy-on-write for cinder's clone_volume() is handling the snapshot the rbd driver needs to create (which the user shouldn't see). Since snapshots in rbd prevent the volume from being deleted, this hidden snapshot used for cloning would require the original volume to stick around if there are still clones of it, even after the user requested that the user volume is deleted. If we don't have auto-flattening as well, this would enable a user to use extra un-accounted for space when they create a clone, then delete the original volume. With one clone per volume, they could use up to twice their quota.
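
To make the quota concern concrete, here is a small worked example. The sizes are made up, and it assumes the worst case in which the hidden snapshot pins the full original volume and every clone is eventually fully rewritten; `space_usage` is a hypothetical helper, not part of cinder.

```python
def space_usage(volume_gb, pairs):
    """Worst-case usage when each original is cloned once and then deleted.

    Returns (accounted_gb, actual_gb). Assumes the hidden snapshot pins the
    full original and each clone is eventually fully rewritten.
    """
    accounted = volume_gb * pairs   # only the surviving clones count against quota
    pinned = volume_gb * pairs      # deleted originals kept alive by hidden snapshots
    return accounted, accounted + pinned


accounted, actual = space_usage(volume_gb=10, pairs=5)
print(accounted, actual)  # 50 100
```

Under these assumptions the backend holds twice what the quota accounts for, which matches the "up to twice their quota" bound with one clone per volume.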

If we automatically flatten all children of a volume when the parent snapshot (or volume with a hidden snapshot from cloning) is deleted, we could overload the backend with too many flattens at once. We could introduce a complex system of long-running operations and scheduling for flattens, or we could use a simple heuristic like 'only flatten all children when the number of clones is fewer than N'. This kind of rule would bound the extra space usage to 1/N, and help avoid too many flattens at once.
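
The simple heuristic could look something like the sketch below. `on_parent_delete` and `max_flatten` (the N above) are hypothetical names, and what happens to the parent when nothing is flattened (here it is simply kept) is an assumption for illustration.

```python
def on_parent_delete(children, max_flatten):
    """Decide what to do with clones when their parent is deleted.

    Returns (to_flatten, keep_parent): flatten every child only when the
    number of clones is below max_flatten; otherwise flatten nothing and
    keep the parent's data around.
    """
    if len(children) < max_flatten:
        return list(children), False
    return [], True
```

For example, with `max_flatten=3` a parent with two clones would have both flattened on delete, while a parent with three or more clones would trigger no flattens at all.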

Edward Hope-Morley (hopem) wrote :

I am moving this discussion to a blueprint, so please add any further comments to the whiteboard.

Submitter: Jenkins
Branch: master

commit 52291d6554f2275b228d7039d222bccfab164106
Author: Edward Hope-Morley <email address hidden>
Date: Mon Aug 12 17:46:38 2013 +0100

    Added copy-on-write support for all RBD cloning

    Up till now we only had copy-on-write for cloning from snapshot. This
    change optionally allows clone from volume to use copy-on-write
    instead of doing a full copy each time. This should increase speed
    and reduce near-term storage consumption but could introduce some new
    risks e.g. excessively long clone chains and flatten storms. To avoid
    this, a new config option has been provided -
    rbd_max_clone_depth - which allows the user to limit the depth of a
    chain of clones i.e.

        a->b->c->d as opposed to a->b

    This will avoid flatten storms by breaking chains as they are formed
    and at an early, predefined stage.

    A second option - rbd_clone_from_volume_force_copy - allows the user
    to use a full copy as before i.e. disable COW for volume clones.

    Implements: blueprint use-copy-on-write-for-all-volume-cloning
    Fixes: bug #1209199

    Change-Id: Ia4a8a10c797cda2cf1ef3a2e9bd49f8c084ec977
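
One way to picture how a depth limit breaks chains as they form is the sketch below. It is a toy model under stated assumptions, not the code from this commit: the `parents` dict, `clone_depth` and `cow_clone` are hypothetical, and "flattening" is modelled simply as dropping the source's parent link before the new clone is attached.

```python
def clone_depth(parents, name):
    """Count how many parents sit above `name` in a clone chain."""
    depth = 0
    while parents.get(name) is not None:
        name = parents[name]
        depth += 1
    return depth


def cow_clone(parents, src, dest, max_clone_depth):
    """Record dest as a COW clone of src, breaking the chain at the limit."""
    if clone_depth(parents, src) >= max_clone_depth:
        # Model of a flatten: src becomes standalone, so the chain restarts here.
        parents[src] = None
    parents[dest] = src
```

Starting from a standalone volume `a` with a limit of 2, cloning a->b->c->d in this model would flatten `c` before `d` is created, leaving two short chains (a->b->... is cut at `c`, and `d` hangs off `c`) rather than one chain of depth 3.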

Changed in cinder:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2013-10-04
Changed in cinder:
milestone: none → havana-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2013-10-17
Changed in cinder:
milestone: havana-rc1 → 2013.2
(jingsong-ge) wrote :

1. I create volume-01, and then create a snapshot of it.
2. I use volume-01-snapshot to create a volume named volume-02, and then create a snapshot of it: volume-02-snapshot.
3. I use volume-02-snapshot to create a volume named volume-03, and then create a snapshot of it: volume-03-snapshot.

I find that there is a chain between volume-01 and volume-08, and the chain doesn't get flattened. Is there a special reason?

Edward Hope-Morley (hopem) wrote :


What has happened here is that you are manually creating a volume, then snapshotting it, then creating a volume from the snapshot. This sequence is performed automatically when doing a 'cinder create --source-volid <id> <size>', but with different logic. The snapshots you created are therefore not considered 'clone' snapshots and are not tracked under max_clone_depth. I suggest that if you want to clone volumes, you create them directly from the source volume, as opposed to explicitly creating a snapshot and then creating a volume from the snapshot.

(jingsong-ge) wrote :

Edward Hope-Morley (hopem) :

I am very grateful for your reply; I have got it.
