Comment 38 for bug 1578036

weiguo sun (wsun2) wrote:

Hi Xiaojun,

Thanks for clarifying the design context of scaling the backup service. It seems that there are two potential fixes:

(1) The Ceph backup driver skips the incremental/differential backup once it determines that the volume is in the 'in-use' status or that the returned volume is a snap-clone volume. However, the Ceph backup driver would then need to use the snap-clone volume as the data source, which is not what the driver does right now (see the following debug output): the driver is transferring data from the original in-use volume (f8f6c0a3-b19c-43f3-965f-59945f4dc4b3), which won't be crash-recovery consistent. A rough sketch of this option follows the debug output below.

2017-10-12 20:31:24.207 104248 DEBUG cinder.backup.drivers.ceph [req-82609901-3596-4c28-905b-63cbc056c3a9 f4c2cd21cd1841f4bd87e4910291f930 729810f6d86f467082ec3fe9a70c84df - default default] Copying data from volume f8f6c0a3-b19c-43f3-965f-59945f4dc4b3. _full_backup /usr/lib/python2.7/site-packages/cinder/backup/drivers/ceph.py:712
2017-10-12 20:31:24.251 104248 DEBUG cinder.backup.drivers.ceph [req-82609901-3596-4c28-905b-63cbc056c3a9 f4c2cd21cd1841f4bd87e4910291f930 729810f6d86f467082ec3fe9a70c84df - default default] Transferring data between 'volume-f8f6c0a3-b19c-43f3-965f-59945f4dc4b3' and 'volume-f8f6c0a3-b19c-43f3-965f-59945f4dc4b3.backup.af9a04a4-7778-454c-9ca7-30c6f1dec4bb' _transfer_data /usr/lib/python2.7/site-packages/cinder/backup/drivers/ceph.py:304
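
For illustration only, here is a minimal Python sketch of what option (1) implies; this is not the actual Cinder code. The helper names (pick_backup_source, full_backup_from_source), the dict fields, the 'volumes' pool and the ceph.conf path are all assumptions on my part. The point is simply that the full copy should read from the snap-clone image rather than from the live in-use image.

import rados
import rbd


def pick_backup_source(volume, backup_volume):
    """Return the RBD image name the full backup should read from.

    `volume` and `backup_volume` stand in for the objects the backup
    manager would pass down; the field names here are illustrative.
    """
    if volume['status'] == 'in-use' and backup_volume['id'] != volume['id']:
        # The volume driver handed back a temporary snap-clone; reading
        # from it gives a crash-consistent point-in-time copy.
        return 'volume-%s' % backup_volume['id']
    # An 'available' volume can be read directly.
    return 'volume-%s' % volume['id']


def full_backup_from_source(pool, source_name, chunk_size=4 * 1024 * 1024):
    """Stream the chosen source image chunk by chunk (backup target omitted)."""
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            with rbd.Image(ioctx, source_name, read_only=True) as image:
                size = image.size()
                offset = 0
                while offset < size:
                    length = min(chunk_size, size - offset)
                    data = image.read(offset, length)
                    # ... write (offset, data) to the backup target here ...
                    offset += length
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()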

(2) The second option is that the Ceph backup driver ignores the returned backup volume object (the snap-clone) and instead takes a 'from-snap' against the original Cinder/Ceph volume. I would think this 'from-snap' is good enough for the incremental/differential backup. I don't think this goes against the principle of "scaling the backup service", but I am willing to be convinced otherwise. This option does not require the 'in-use' volume to be mounted on the backup service node; it only needs the snapshot to be visible, which is how an "available" Cinder/Ceph volume is backed up today based on my debugging observation. To my understanding, a Ceph RBD volume snapshot is atomic and hence crash-recovery consistent, so a regular snapshot of an 'in-use' volume should be sufficient for crash recovery. A sketch of this approach follows below.
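
For illustration only, a minimal Python sketch of the 'from-snap' idea in option (2), written against the python-rbd bindings directly rather than the driver's own helpers; the function name, snapshot names, pool and ceph.conf path are assumptions. It takes a new snapshot of the original volume and transfers only the extents that changed since the previous backup snapshot, without attaching the volume anywhere.

import rados
import rbd


def incremental_backup(pool, image_name, from_snap, new_snap):
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            with rbd.Image(ioctx, image_name) as image:
                # RBD snapshots are point-in-time and atomic, so this is
                # crash-consistent even while the volume is attached.
                image.create_snap(new_snap)

            changed = []

            def record_extent(offset, length, exists):
                # exists is False for extents that were discarded/zeroed.
                changed.append((offset, length, exists))

            with rbd.Image(ioctx, image_name, snapshot=new_snap,
                           read_only=True) as snap_image:
                # Walk only the data that changed between the two snapshots.
                snap_image.diff_iterate(0, snap_image.size(), from_snap,
                                        record_extent)
                for offset, length, exists in changed:
                    if exists:
                        data = snap_image.read(offset, length)
                        # ... append (offset, data) to the backup target ...
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()

This is conceptually the same data flow as "rbd export-diff --from-snap" piped into the backup target, which is how I understand the differential transfer is meant to work for an 'available' volume today.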