FeatReq: alternate filenames to support S3/Glacier

Bug #1170161 reported by Dave Cottingham on 2013-04-17
This bug affects 8 people
Affects: Duplicity
Importance: Medium
Assigned to: Unassigned

Bug Description

Feature request: it would be nice, when using S3/Glacier, to transition only the difftars to Glacier while leaving the manifests and sigtars alone. Given the current duplicity naming scheme and the limitations of the Glacier transition rules, this is currently impossible. S3 transition rules select which files to transition to Glacier by matching against a file prefix: a direct string match, not a regexp match. So you need to be able to tell whether a file is a manifest, sigtar, or difftar by looking at the part of its name before the timestamp. At present sigtars can be identified this way, but manifests and difftars are not distinguishable by a prefix match.
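To illustrate the limitation, here is a minimal sketch in Python. The file names follow the current naming pattern with a made-up timestamp and no compression or encryption suffix, and `matches_rule` is a hypothetical stand-in for S3's plain string-prefix test, not an AWS API.

```python
# Hypothetical example names in the current scheme (timestamp is made up).
FILES = [
    "duplicity-full-signatures.20130417T000000Z.sigtar",
    "duplicity-full.20130417T000000Z.manifest",
    "duplicity-full.20130417T000000Z.vol1.difftar",
]


def matches_rule(key, prefix):
    """Stand-in for an S3 lifecycle rule: a plain string-prefix test."""
    return key.startswith(prefix)


def selected(prefix):
    """Return the example files that a rule with this prefix would select."""
    return [f for f in FILES if matches_rule(f, prefix)]


# Sigtars can be isolated by prefix alone ...
print(selected("duplicity-full-signatures."))
# ... but any prefix that catches the difftar also catches the manifest,
# because the two names differ only after the timestamp.
print(selected("duplicity-full."))
```

No prefix exists that selects the difftar in this list without also selecting the manifest, which is the whole problem.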

This feature would also permit a simple workaround for bug 1170113.

Some discussion of this feature request is in this mailing list thread:
http://lists.nongnu.org/archive/html/duplicity-talk/2013-04/msg00052.html

I also note that recent discussion of bug 1039511 has touched on this same feature request.

I am attaching a completely untested patch against 0.6.21 that implements a version of this scheme. The patch introduces a switch "--alternate-filenames" which directs duplicity to generate the alternate filenames; with or without the switch, duplicity will read both old and new filenames. Thus I believe this can work with a mix of old and new filenames in a backup chain. Again, the patch is untested.

Dave Cottingham (dcottingham00) wrote :
Richard Freeman (r-launchpad-q) wrote :

One comment - the difftar files do not really have a prefix, and those are the ones most in need of a prefix. When setting up retention rules on s3 you set them on the files you want to move to glacier, not the ones you want to leave in s3.

I'd suggest adding a prefix of some kind to the difftar files.

Right now the difftar files can be identified because they use a period instead of a hyphen after the common prefix (e.g. duplicity-full).

The current conventions are:
duplicity-inc-manifest.date
duplicity-inc-signatures.date
duplicity-inc.date

The resulting prefix would be "duplicity-inc." - only the period separates this from the manifests which might be kept online.

It would make more sense to name them:
duplicity-inc-manifest.date
duplicity-inc-signatures.date
duplicity-inc-difftar.date
(or some other prefix)

Otherwise your patch works fine for me.
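As a rough sketch of what the suggested naming would enable, the following Python builds a lifecycle configuration with one Glacier transition rule per difftar prefix. The dictionary layout is, to my knowledge, the shape the S3 lifecycle configuration API accepts (for example via boto3's `put_bucket_lifecycle_configuration`), but the helper `glacier_rule`, the rule IDs, the `days` default, and the `duplicity-full-difftar.` prefix are assumptions made for illustration.

```python
import json


def glacier_rule(prefix, days=0):
    """Build one lifecycle rule that transitions keys under `prefix` to Glacier."""
    return {
        "ID": "to-glacier-" + prefix.strip("-."),
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
    }


# Assumed prefixes under the suggested naming; only the difftars get a rule,
# so manifests and signatures stay in regular S3 storage.
config = {
    "Rules": [
        glacier_rule("duplicity-full-difftar."),
        glacier_rule("duplicity-inc-difftar."),
    ]
}

print(json.dumps(config, indent=2))
```

The rules name the files that should move, which is why the difftars are the ones that need a distinctive prefix.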

jurov (rini17) wrote :

I have decided to do it differently, to give users full control over their S3/Glacier policies. The attached patch adds a new duplicity switch, --s3-optional-prefix=<string>. This option has no effect on newly created files when backing up. However, whenever duplicity requests files from S3, both bucket/prefix/duplicity* and bucket/prefix/<string>duplicity* are read, and <string> is then ignored. This applies to all kinds of duplicity files. The advantages:

* 100% compatible with older backups, but not with unpatched duplicity (still, renaming the files back is easy).
* The user has complete control, using any criteria, over what should and should not be moved to Glacier. After every backup, an external rename tool should be run to add the prefix to some S3 files as desired. I will provide my sample script for this renaming, which uses file size (it adds the prefix to all manifests and small files). On restore, no further operations are needed.

This patch is preliminary but tested. I plan to make a proper branch and maybe support multiple prefixes or some kind of pattern (if supporting that won't be too invasive).
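The read-side behaviour described above can be sketched roughly as follows. This is not the code from the patch: `normalize` and `visible_backup_files` are hypothetical helpers, and the keys are assumed to be bare file names within one folder.

```python
def normalize(key, optional_prefix):
    """Drop the optional prefix, if present, so the name parses as usual."""
    if optional_prefix and key.startswith(optional_prefix):
        return key[len(optional_prefix):]
    return key


def visible_backup_files(keys, optional_prefix):
    """Return normalized names for prefixed and unprefixed duplicity files."""
    names = (normalize(k, optional_prefix) for k in keys)
    return sorted(n for n in names if n.startswith("duplicity"))


# Made-up listing: one renamed manifest, one untouched difftar, one stray file.
listing = [
    "_small_duplicity-full.20130417T000000Z.manifest",
    "duplicity-full.20130417T000000Z.vol1.difftar",
    "unrelated.txt",
]
print(visible_backup_files(listing, "_small_"))
```

Because the prefix is stripped before any name parsing, the rest of the restore logic would not need to know which files were renamed.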

James Keagie (jameskeagie) wrote :

Jurov - could you possibly post your script that renames files in S3 so they don't get moved to glacier? I'm interested in using duplicity to push backups to S3, but would need to have Glacier for the cost savings.

jurov (rini17) wrote :

This Python script, to be run after every backup, is rather makeshift but hopefully easy to customize. It renames all manifest files and all files smaller than 5MB in an S3 folder to add a _small_ prefix. It requires a patched duplicity invoked with the --s3-optional-prefix=_small_ switch.

Note that since renaming S3 files is not implemented, a copy+delete is done instead. This changes the modification time (so it should be done right after the backup), and it should be done on small files, not on large ones (otherwise it is slow).
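For readers without access to the attachment, the selection rule described above might look roughly like this. It is an illustrative sketch, not the attached script: `plan_renames` is a hypothetical helper, keys are assumed to be bare file names, 5MB is taken as 5 MiB, and the actual copy+delete against S3 is left out.

```python
SMALL_LIMIT = 5 * 1024 * 1024  # "5MB" from the comment, taken as 5 MiB (assumption)
PREFIX = "_small_"


def plan_renames(objects, prefix=PREFIX, limit=SMALL_LIMIT):
    """Map old key -> new key for manifests and small files.

    `objects` is an iterable of (key, size_in_bytes) pairs for one folder.
    """
    renames = {}
    for key, size in objects:
        if key.startswith(prefix):
            continue  # already renamed on an earlier run
        if ".manifest" in key or size < limit:
            renames[key] = prefix + key
    return renames


# Made-up listing with sizes in bytes.
objects = [
    ("duplicity-full.20130417T000000Z.manifest", 1200),
    ("duplicity-full.20130417T000000Z.vol1.difftar", 26214400),
    ("duplicity-full-signatures.20130417T000000Z.sigtar", 900000),
    ("_small_duplicity-inc.20130418T000000Z.manifest", 800),
]
for old, new in plan_renames(objects).items():
    print(old, "->", new)  # each pair would become one copy followed by one delete
```

Skipping keys that already carry the prefix keeps the script safe to run after every backup.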

jurov (rini17) wrote :

Update to move.py: You can then define glacier policy only for files named duplicity_* .

Dear all,
I really appreciate this discussion. So far I have been using a dirty workaround to use duplicity along with Amazon Glacier (see also http://blog.epsilontik.de/?page_id=68).
Inspired by this thread and a comment in the blog, I finally took the time to develop the attached patch to change the naming for Glacier.

With this patch the naming conventions are changed as follows (if using the --alternate-filenames option):

duplicity-full-signatures.date (stays unchanged)
duplicity-new-signatures.date (stays unchanged)
duplicity-full.date.manifest => duplicity-full-manifest.date
duplicity-inc.date.to.date.manifest => duplicity-inc-manifest.date.to.date.manifest
duplicity-full.date.VOLX.difftar => duplicity-full-difftar.date.VOLX
duplicity-inc.date.to.date.VOLX.difftar => duplicity-inc-difftar.date.to.date.VOLX

This makes it possible to filter for all file types using the standard lifecycle rules for Glacier.

I tested the patch with a standard and an incremental backup and it seems to work fine. I'd appreciate it if you could also have a look.

Do you have any standard test cases that you are using in the development of duplicity?

Rgds
-E

derp herp (junkmail-trash) wrote :

Thank you for the patch, is this backwards-compatible with the prior naming scheme? It appears that new and old names can co-exist, and the lifecycle rules may be applied to just the files with the new naming scheme - is this correct?

Dave Cottingham (dcottingham00) wrote :

For the patch I posted, it is backward-compatible in the sense that a backup chain can contain files made with and without --alternate-filenames. I believe that is also true of the patch posted by epsilon, but perhaps he can comment on that.

It is not exactly true that lifecycle rules can be written that only apply to the new naming scheme -- because not all the file names are changed. The major purpose of the patch is to make it possible to write lifecycle rules that distinguish manifests from difftars.

The patch does not change the naming for all files (if using --alternate-filenames):
duplicity-full-signatures.date and duplicity-new-signatures.date stay unchanged.

I'm not deep enough in the internals of duplicity to see whether this may be an issue if backups with the classical naming scheme and the alternate naming scheme end up in the same folder. I can't see any direct issue, but my gut feeling is that I would not recommend doing so.
The tricky thing in any case is that the regular expressions that identify files now match both naming schemes. If a folder contains files of both naming schemes, I'm not exactly sure how the code will behave.

If you think that it is essential to be backward compatible in this way, I can update the patch so that the signature files are also renamed, to be on the safe side.
The alternative would be to advise the user not to use both options for backups ending up in the same folder.

Please let me know your thoughts.
- E

It's me again. After doing some more testing with my patch I found a small bug in it that only appears if duplicity handles more than 10 difftar files. I fixed this in the patch attached and will do some more exhaustive testing with it over the next few days. I'll keep you posted.
-E

Changed in duplicity:
milestone: none → 0.6.24
importance: Undecided → Medium
status: New → Fix Committed

jurov (rini17) wrote :

Sorry, did anyone look at my patch? It is much smaller and has the advantage that it lets you decide yourself which files you want to have in Glacier and which not. With --alternate-filenames you must accept duplicity's naming.

I believe the applied patches are mine (http://lists.nongnu.org/archive/html/duplicity-talk/2014-01/msg00031.html); they add options to add a prefix to any of the 3 types of files.

... and I wish I had seen this bug before I started working on mine.

Good to see that this topic is still alive.
@jurov: I looked at your patch before starting with mine, but I thought it would be a better idea to get this functionality integrated into the core of duplicity. I currently have a workaround that works similarly to your patch, but in the end it's only a workaround.
@matthew: I believe that this discussion is older than the patch you suggested. But it's always good to have more than one view on the topic :-) I can see that your patch allows for a higher degree of flexibility. On the other hand, this also makes things a little more complex: e.g. the user will need to remember all their settings for a successful restore. My solution has less flexibility but works for the purpose.

In the end I'd love to see either solution implemented in the next release. Both solutions are better than my current workaround.

So long
E

That is true. The user needs to remember the 3 settings. But if they forget, they can easily remind themselves by looking at the file names of the backup files.

jurov (rini17) wrote :

Thanks for the encouragement. I'll look into doing this as a global option. As far as I remember, though, querying the storage for both prefixed and unprefixed file names and then getting the right information out of the names would need updates in all modules.

I believe everything can be done in file_naming.py. There would just need to be 2 regexes for each of the half-dozen file types.

Yes, I guess that this would be the most straightforward approach.
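A minimal sketch of the "two regexes per file type" idea, shown for a single file type (the full manifest). These are not the patterns from duplicity's file_naming.py: the timestamp pattern is simplified, compression and encryption suffixes are ignored, and the alternate name is the one quoted earlier in this thread for --alternate-filenames.

```python
import re

# Simplified timestamp pattern; the real parser is stricter (assumption).
TS = r"(?P<time>\d{8}T\d{6}Z)"

# Two patterns for one file type (full manifest): classic and alternate name.
FULL_MANIFEST_PATTERNS = [
    re.compile(r"^duplicity-full\." + TS + r"\.manifest$"),
    re.compile(r"^duplicity-full-manifest\." + TS + r"$"),
]


def parse_full_manifest(filename):
    """Return the timestamp if the name matches either scheme, else None."""
    for pattern in FULL_MANIFEST_PATTERNS:
        match = pattern.match(filename)
        if match:
            return match.group("time")
    return None


print(parse_full_manifest("duplicity-full.20130417T000000Z.manifest"))
print(parse_full_manifest("duplicity-full-manifest.20130417T000000Z"))
print(parse_full_manifest("duplicity-full.20130417T000000Z.vol1.difftar"))
```

Keeping both patterns in one place means callers only see the parsed result, whichever naming variant produced the file.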

Changed in duplicity:
status: Fix Committed → Fix Released

Naftuli Kay (naftulikay) wrote :

The fix has been "released," but I don't see any 0.6.24 packages available for 14.04 or 12.04. When will this fix hit the LTS releases?

Naftuli Kay (naftulikay) wrote :

Edit: it seems to have been released as an RPM (as per here: https://launchpad.net/duplicity/+milestone/0.6.24), but not as a DEB, any reason why?

For LTS versions Ubuntu does not allow anything but fixes. For
enhancements you need to go to the duplicity PPA.

On Thu, Jan 1, 2015 at 6:34 PM, rfkrocktk <email address hidden>
wrote:

> Edit: it seems to have been released as an RPM (as per here:
> https://launchpad.net/duplicity/+milestone/0.6.24), but not as a DEB,
> any reason why?

Naftuli Kay (naftulikay) wrote :

Does the PPA contain these changes? Where is the PPA? Can you link?
