Globbing patterns with trailing slashes should only match directories

Bug #1479545 reported by Dmitry
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Duplicity
Fix Released
Medium
Unassigned

Bug Description

Ubuntu 14.04
Duplicity is 0.7.x, checked out 10 minutes ago from Bazaar via "bzr branch lp:duplicity".

Initially I thought that this "bug" is related to https://bugs.launchpad.net/duplicity/+bug/932482?comments=all , but developers said they committed the fix, and it does not help. So I create a new bug report here if you allow.

In some version of duplicity trailing slash processing has been broken, but is very important. Previously (many years ago probably) one could write in an excluding file something like:

+ /var/log/**/
- /var/log/**

That means "include the whole directory skeleton in the backup, but DO NOT include actual files with data". It was extremely handy: in many cases the directory structure is important (e.g. for /var/log: if you restore a machine from a full backup and do not have /var/log/apache2 directory, Apache will fail to load after the restoration).

So, trailing / means "hey, I mean directory, not a file!", and that was the only method to distinct directories and files in duplicity. So I would be very happy if plain old days return, and we can include directory skeletons to backups again.

Now this doesn't work, it's very easy to reproduce - just add the above lines to your exclude-globbing-filelist and see that all files in /var/log/ are backed up.

Related branches

Dmitry (dmitry-koterov)
description: updated
summary: - "Include the directory skeleton, but exclude all files with data" does
- not work anymore
+ Trailing slashes: "Include the directory skeleton, but exclude all files
+ with data" does not work anymore
Revision history for this message
Aaron Whitehouse (aaron-whitehouse) wrote : Re: Trailing slashes: "Include the directory skeleton, but exclude all files with data" does not work anymore

While the particular use case that you have proposed (including a directory skeleton without files) seems fairly niche to me, I note that the behaviour that you have proposed appears to be how glob.glob's expansion works:
https://docs.python.org/3.5/library/glob.html
https://hg.python.org/cpython/file/3.5/Lib/glob.py (see line 39)
and I would be quite keen to keep the behaviour of duplicity consistent with that.

It isn't directly on point, but it is worth reading this mailing list thread on the behaviour of single and double asterisks:
http://lists.nongnu.org/archive/html/duplicity-talk/2015-03/msg00017.html

Presumably fixing this would also allow for the use case of including the files in a directory, but not any subfolders:
- /some/path/*/
+ /some/path/*
which I'm not sure is currently possible.

Given that trailing slashes have been broken until my recent fix, now seems like the time to implement this, as there shouldn't be any risk of breaking existing behaviour.

Ken, are you happy for me to go ahead and try to implement this, or would you like a wider discussion on the mailing list about it first?

summary: - Trailing slashes: "Include the directory skeleton, but exclude all files
- with data" does not work anymore
+ Trailing slashes in globbing patterns should only match directories
summary: - Trailing slashes in globbing patterns should only match directories
+ Globbing patterns with trailing slashes should only match directories
Revision history for this message
Kenneth Loafman (kenneth-loafman) wrote : Re: [Bug 1479545] Re: Trailing slashes: "Include the directory skeleton, but exclude all files with data" does not work anymore
Download full text (3.4 KiB)

I think we should go ahead and fix this to be consistent with previous
behaviour.

On Thu, Jul 30, 2015 at 5:57 AM, Aaron Whitehouse <
<email address hidden>> wrote:

> While the particular use case that you have proposed (including a
> directory skeleton without files) seems fairly niche to me, I note that the
> behaviour that you have proposed appears to be how glob.glob's expansion
> works:
> https://docs.python.org/3.5/library/glob.html
> https://hg.python.org/cpython/file/3.5/Lib/glob.py (see line 39)
> and I would be quite keen to keep the behaviour of duplicity consistent
> with that.
>
> It isn't directly on point, but it is worth reading this mailing list
> thread on the behaviour of single and double asterisks:
> http://lists.nongnu.org/archive/html/duplicity-talk/2015-03/msg00017.html
>
> Presumably fixing this would also allow for the use case of including the
> files in a directory, but not any subfolders:
> - /some/path/*/
> + /some/path/*
> which I'm not sure is currently possible.
>
> Given that trailing slashes have been broken until my recent fix, now
> seems like the time to implement this, as there shouldn't be any risk of
> breaking existing behaviour.
>
> Ken, are you happy for me to go ahead and try to implement this, or
> would you like a wider discussion on the mailing list about it first?
>
> ** Summary changed:
>
> - Trailing slashes: "Include the directory skeleton, but exclude all files
> with data" does not work anymore
> + Trailing slashes in globbing patterns should only match directories
>
> ** Summary changed:
>
> - Trailing slashes in globbing patterns should only match directories
> + Globbing patterns with trailing slashes should only match directories
>
> --
> You received this bug notification because you are subscribed to
> Duplicity.
> https://bugs.launchpad.net/bugs/1479545
>
> Title:
> Globbing patterns with trailing slashes should only match directories
>
> Status in Duplicity:
> New
>
> Bug description:
> Ubuntu 14.04
> Duplicity is 0.7.x, checked out 10 minutes ago from Bazaar via "bzr
> branch lp:duplicity".
>
> Initially I thought that this "bug" is related to
> https://bugs.launchpad.net/duplicity/+bug/932482?comments=all , but
> developers said they committed the fix, and it does not help. So I
> create a new bug report here if you allow.
>
> In some version of duplicity trailing slash processing has been
> broken, but is very important. Previously (many years ago probably)
> one could write in an excluding file something like:
>
> + /var/log/**/
> - /var/log/**
>
> That means "include the whole directory skeleton in the backup, but DO
> NOT include actual files with data". It was extremely handy: in many
> cases the directory structure is important (e.g. for /var/log: if you
> restore a machine from a full backup and do not have /var/log/apache2
> directory, Apache will fail to load after the restoration).
>
> So, trailing / means "hey, I mean directory, not a file!", and that
> was the only method to distinct directories and files in duplicity. So
> I would be very happy if plain old days return, and we can include
> directory skeletons to back...

Read more...

Changed in duplicity:
importance: Undecided → Medium
milestone: none → 0.7.04
status: New → Fix Committed
Changed in duplicity:
status: Fix Committed → Fix Released
Revision history for this message
Dmitry (dmitry-koterov) wrote :

Wow, that was fast.
Looks like it works now, and the issue is closed.
Thank you!

Revision history for this message
Aaron Whitehouse (aaron-whitehouse) wrote :

No problem, thank you for such a clear bug report and for confirming that it is now fixed.

Revision history for this message
claus (claus2) wrote :

Oops, that was actually not really a good idea! It's actually great, that people fix bugs quickly. But please also think about what you break with your fix first.

I had a command line similar to
  --include ~/very/important/stuff/here/needs/backup/ --exclude '**'
Now of course I checked the backups when I set this up and everything was fine.

Recently I noticed that backups are very fast (too fast for the amount of data) and found that all files are missing in the backup!

By changing behavior back and forth you probably messed up a lot of backups everywhere and many users are probably not aware that their files are currently not in the backup!

My suggestion on how to cleanup this mess:
a) Change back once again (It is better there are to many files in backup than to little).
b) Introduce an explicit switch e.g. "--include-dirs-only <your-dir>" for the old behavior

At least point out this old/new very unintuitive behavior in the man page please.

Sorry if this sounds a little harsh, I am not writing this to yell at people who are only trying to help others. But I am very concerned about users/admins having a data loss and then noticing their backups are useless.

Revision history for this message
Aaron Whitehouse (aaron-whitehouse) wrote :

Thanks Claus,

I have filed this separately as Bug #1624725 as this makes it easier to clarify the workload of this introduced bug. Thank you for the bug report.

We always think about what we break with changes to the code, but we do all sometimes make mistakes.

There are extensive unit/functional tests for the Select module in particular:
http://bazaar.launchpad.net/~duplicity-team/duplicity/0.8-series/view/head:/testing/functional/test_selection.py
http://bazaar.launchpad.net/~duplicity-team/duplicity/0.8-series/view/head:/testing/unit/test_selection.py
http://bazaar.launchpad.net/~duplicity-team/duplicity/0.8-series/view/head:/testing/unit/test_globmatch.py

Your use case was not documented in a test case. I have now done this for you and will add this into the code so that this will not be unknowingly broken in the future. If there are other duplicity features that you use that are not covered by existing test cases, please create these for us and ensure that we cannot unknowingly break anything else for you. Test cases are very easy to create and you can follow the existing ones as a guide.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.