Advanced include/exclude support

Bug #374274 reported by Nick Welch on 2009-05-09
This bug affects 104 people
Affects: Déjà Dup
Importance: Wishlist
Assigned to: Unassigned

Bug Description

There doesn't seem to be any way to exclude, for example, all files ending with ".iso". I entered "*.iso", and it accepted it, but actually it just interpreted that as a directory literally named "*.iso", instead of interpreting it as a glob pattern. And on top of that, it actually created the directory in my home dir. So then I had to go and delete this silly "*.iso" directory.
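
To illustrate the difference between the two interpretations, here is a minimal Python sketch using the standard-library `fnmatch` module. The file names are made up, and this is only an illustration of glob versus literal matching, not how Déjà Dup or duplicity actually match paths.

```python
from fnmatch import fnmatch

# Hypothetical file names, for illustration only.
names = ["ubuntu-9.04.iso", "notes.txt", "*.iso"]

# Glob interpretation: "*" matches any run of characters.
glob_hits = [n for n in names if fnmatch(n, "*.iso")]

# Literal interpretation: only an entry named exactly "*.iso" matches.
literal_hits = [n for n in names if n == "*.iso"]

print(glob_hits)
print(literal_hits)
```

The glob interpretation catches the real ISO image; the literal one only catches an entry that happens to be named `*.iso`.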

~ % dpkg-query -W deja-dup duplicity
deja-dup 9.1-0jaunty1
duplicity 0.5.09-0ubuntu2

~ % lsb_release -d
Description: Ubuntu 9.04

Nick Welch (mackstann) on 2009-05-09
description: updated
description: updated
Michael Terry (mterry) on 2009-05-09
Changed in deja-dup:
importance: Undecided → Wishlist
status: New → Confirmed
Nick Welch (mackstann) wrote :

Do you have any guidance as to how this feature might be implemented in the UI? Looking at the code, I definitely think I could implement it, but I'm not very great at making user-friendly UIs, and I would like to increase the possibility of my patch being accepted.

Using a file chooser dialog box to enter a glob pattern seems wrong. So I think maybe just a textbox is appropriate.

Maybe the exclude list could have a new button called "Add Pattern," which would pop up a text entry box? The existing "Add" button could possibly be renamed to "Add folder."

Sound reasonable?

Michael Terry (mterry) wrote :

I think your instinct to use two buttons is good. Some thoughts:

We'd probably need to store the values in a separate gconf list (because folders can technically have characters like a glob pattern, I believe). So we'd need to synthesize the two gconf lists into one user-visible list. Not hard, but just a detail. We could have a separate icon for the globs in the list (like we have a folder icon for folders) or maybe no icon -- something to distinguish visibly.

The "Add Pattern" button should probably pop up a dialog that explains what the user can do here (i.e. "This lets you exclude files in any directory that match the glob" but nicer. :)). Just a bit of user-guidance.

The primary use case here is almost certainly to exclude files based on file suffix. It would rock if the dialog had a drop down list of mime types in user-friendly wording (this data is either available through GIO or freedesktop.org xml data files, which can probably be accessed via some library -- I've never messed with this particular area). So the user would select "Text Files" and this would behind the scenes select some simple glob like '*.txt' or '*.txt|*.text' or whatever. The dialog should also let the user specify a manual glob instead if desired, but I suspect 90% of users just want a file type. The button might be renamed to "Add File Type" then... But that doesn't get across that you can also manually do a glob...

Moreover, we may order the list to offer commonly-large things at the top (like 'iso files' or 'avi files' or something). We may even want to offer a meta option like 'video files' that would select avi, ogv, etc.

I know I'm making it into a bigger job than you probably wanted. :) But I'd be glad to help! If you'd like, contact the ~deja-dup-hackers list, grab me on Freenode IRC (username mterry), or email me.

Michael Terry (mterry) wrote :

Hmm, thinking about this some more... I don't think we'd want to display all mime types, that'd be silly. And asking the user to select each of the various 'video' mime types wouldn't be useful either. It's not likely that they'll want to backup .avi files but not .ogv.

I think if we were to offer a mime option, it would just be some hardcoded groups, like "Video Files" or "CD Image Files" (which would hit iso and the other goofier formats). We'd probably maintain those lists ourselves, not use a library (since I don't know of any library that could help us there, unless we could query the associated extensions for all audio/* mimes. But even that isn't 100% sure. I think the mime namespace is a bit more polluted than that. Or maybe most of the goofy application/ mimes that were really audio or video have migrated by now).
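
A rough sketch of what such hand-maintained groups might look like, in Python for brevity. The group names come from the comment above; the extension lists and the `globs_for_group` helper are hypothetical examples, not an agreed design.

```python
# Hypothetical, hand-maintained groups; the extension lists are examples only.
FILE_TYPE_GROUPS = {
    "Video Files": ["avi", "ogv", "mkv", "mp4"],
    "CD Image Files": ["iso", "nrg", "img"],
}


def globs_for_group(group):
    """Return one '*.ext' glob per extension in the named group."""
    return ["*." + ext for ext in FILE_TYPE_GROUPS[group]]


cd_globs = globs_for_group("CD Image Files")
print(cd_globs)
```

The user would only ever see the group name; the globs stay behind the scenes.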

So bottom line is a much more sane/scaled-back version of the mime idea. I still like it though, and then offer a glob field for anything more special.

Wil Clouser (clouserw) wrote :

I'm attaching a mockup of a potential UI. It may have changes that are out of scope of this bug. Summary:

- Split preferences panel into tabs. This is what might be out of scope, but otherwise the pane is getting pretty large.
- Adds an optional "advanced matching" section. Patterns can be enabled/disabled via checkboxes. This allows you to add/maintain defaults like your list of video file types.
- I used regex in the mockup but it looks like people in this bug are talking about using bash globbing. I'm guessing with the way duplicity works globbing is easier (total guess) but I think regex is more powerful and well known. Consider this a vote for regex.
- There is no way to remove a pattern. I'm assuming right-clicking->remove is good enough since we're in an "advanced" section.

Michael Terry (mterry) wrote :

That's a pretty mockup, Wil. :)

I'm wary of complicating the preferences more than they already are. But you're right that it currently is pretty large. If any more controls are added, we're probably going to have to add a separate pane.

Wil, you didn't like the idea of hiding this control behind an 'add pattern' button in the folder list?

Wil Clouser (clouserw) wrote :

I'm concerned that adding patterns to the folder exception box will be confusing. I guess you could change the icon for patterns or something but I think it would be hard to be clear about what they are and, more importantly, I can't think of any other applications that do it that way. At that point you're digressing from standard UI practices like consistency and following expectations.

One thing to note when looking at my mockup is that it isn't clear that the patterns apply to file exceptions and not inclusions. We'd have to tweak the mockup to fix that - perhaps retitle it more appropriately or surround the entire "exceptions" section with a line. Minor stuff if it's the direction you want to go.

Michael Terry (mterry) wrote :

Just a quick note about a workaround. jpg from question 107518 says:
"if you gconf-edit the exclude-list and add patterns like "**/parts", the pattern is passed to duplicity and everything works as expected..."

Michael Terry (mterry) on 2010-05-06
description: updated
Yan Arnd (yan.arnd) wrote :

Filename patterns and user-defined include-/exclude-priorities are IMHO indispensable in a well-done backup scheme. As already written, the problem is to keep the GUI clear and understandable for the "normal user".

In order to have this feature quickly available without the need to plan a big GUI redesign, I suggest simply providing a gconf key where a path to a globbing filelist – which can contain includes and excludes as well – can be lodged. This could be done via gconf-edit or a Deja-Dup command line option, e.g. --use-globbing-filelist <filename> – and maybe by a file selector for this in Deja-Dup's GUI, too. Then Duplicity could just be served that globbing filelist.
This way the "experts" among us can use Deja-Dup with include/exclude rules perfectly tailored to their needs, while the "normal user" has no problems with GUI options he does not understand.

Kip Warner (kip) wrote :

Hey Wil. Totally irrelevant to the discussion at hand, but what program did you use to come up with the mockup diagram? Love it. I could use something like that.

Dag Odenhall (dag.odenhall) wrote :

+1 on categorical {in,ex}clusion rather than pattern matching filenames. I think the most relevant reason for excluding files from (encrypted) backups though is to reduce the overall space requirements for the backups, so maybe size restrictions are more useful than type restrictions? I've seen that idea proposed for deja dup somewhere, is there a relevant bug for it?

As for mimetype groups, you *could* base it on associated applications. For example, "exclude all files that open in Totem or Rhythmbox". For better or worse, that would also exclude e.g. playlists, though.

Shahar Or (mightyiam) wrote :

Would the feature request of being able to use duplicity options like '--exclude-other-filesystems' be a different bug?

Michael Ekstrand (elehack) wrote :

Shahar: --exclude-other-filesystems was included in my bug (#786741), which has been marked as a duplicate of this one, so hopefully.

All: I would like to be able to specify a file (or enter in a multiline text field) excludes in the format used by --exclude-globbing-filelist, as this allows for exclude/include interleaving. Being able to exclude globs is helpful, and goes a long ways, but I think that opening up the entire exclude/include syntax to advanced users would be useful.
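
As a sketch of what interleaved rules could look like on disk, the Python snippet below writes an ordered rule list to a file. It assumes the filelist convention described in the duplicity man page, where a line starting with `+ ` is an include and a line starting with `- ` is an exclude, and where the first matching rule decides. The `format_rules` helper and the example paths are hypothetical.

```python
import os
import tempfile


def format_rules(rules):
    """Turn ordered (action, pattern) pairs into filelist lines.

    Order is preserved, since the first matching rule is assumed to win.
    """
    prefix = {"include": "+ ", "exclude": "- "}
    return [prefix[action] + pattern for action, pattern in rules]


# Hypothetical rules: skip ISO images, keep the rest of Videos,
# and skip node_modules directories anywhere under the home directory.
rules = [
    ("exclude", "/home/me/Videos/**.iso"),
    ("include", "/home/me/Videos"),
    ("exclude", "/home/me/**/node_modules"),
]
lines = format_rules(rules)

# Write the list to a throwaway file and show the result.
path = os.path.join(tempfile.mkdtemp(), "filelist.txt")
with open(path, "w") as fh:
    fh.write("\n".join(lines) + "\n")
with open(path) as fh:
    print(fh.read())
```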

Shahar Or (mightyiam) on 2011-06-19
summary: - Can't exclude filename patterns
+ Include/exclude support

I added a different bug for the feature request of --exclude-other-filesystems:
https://bugs.launchpad.net/deja-dup/+bug/799376

Michael Terry (mterry) on 2011-06-19
summary: - Include/exclude support
+ Advanced include/exclude support
Michael Terry (mterry) on 2011-06-22
tags: added: needs-design
On-The-Fly (onthefly) wrote :

I really like and support the idea. Areca Backup (http://www.areca-backup.org/) has some similar options.

For now I'd already be happy if the [+] and [-] buttons would simply allow selecting specific files as for now there is no real possibility to only backup /etc/fstab for example, which seems to be a pretty common scenario though in my eyes.

Paul Dugas (paul-dugas-cc) wrote :

As I watched my laptop crawl along last night, I was reminded of this request. Any news? Personally, I would like to see something along the lines of rsync's --filter='dir-merge /.rsync-filter'. It would allow more technically savvy types to place these files in various places for the backup system to use for filtering while keeping the UI clean and simple for novice types. $0.02...

Michael Ekstrand (elehack) wrote :

@Paul: I think that would be an excellent way to do it. My current backup system is exactly that with an rsync script I've cooked up. RSync's filter syntax and logic is quite flexible for this kind of thing, and directly exporting it seems to me to be ideal.

Brendan_P (brendan-p) wrote :

+1 on the rsync method.

Just switched to this for its encrypted backups and really missing this feature already :)

Luke Taylor (r7g-luke-h6i) wrote :

+1 on rsync method and on the importance of this feature request.

Fine-grained control over what files should or shouldn't be backed up is a crucial feature if you want advanced users to take this software seriously. I love Deja Dup and the simplicity of its UI, reliability etc. but I live in fear of an ISO (or other large/unnecessary file) getting sucked into my backup-set - which would promptly clog my intertubes and online storage space and frankly be a PITA.

Budmaester (budmaester) on 2012-10-31
Changed in deja-dup:
status: Confirmed → Fix Released
Michael Terry (mterry) on 2012-10-31
Changed in deja-dup:
status: Fix Released → Confirmed
Dylan Justice (dsjstc) wrote :

+1 important; every backup fails with an error dialog because I cannot exclude files such as .bash_history which are always open at backup-time.

Rik Shaw (rik-shaw) wrote :

I agree this is really essential. I think it can be relatively "clean" as suggested by the mockup in comment #4.

I would think, however, it would be important to set the "file types" PER specified include folder, and have it allow "include or exclude" file patterns (bear with me I don't think this has to be too complicated for even basic users!).

For example, if a user includes "Home" (as is default), they could click on that folder for "details" and it would THEN give a pop-up with the options to "include" or "exclude" filetypes. I would guess only one of the two would be used: either "include" filetypes (ignore the rest) or "exclude" filetypes (include the rest). I think the "include" filetypes option would be good so that a user could just select "office documents" (all LO and MS Office extensions), for example, and not have to exclude all Video, Music, etc. types. For beginning users this may be valuable: when they aren't organized (some files on the desktop, some in Documents, etc.) they can still get all "office documents" from the entire home directory without all the cruft of other home things.

This way a user could do something like this:

include folders:
Home
    -include filetypes pop-up set to only include "office documents": everything else ignored.
.thunderbird
    -no filetype pop-up detail specified: so ALL of .thunderbird folder backed up.

For the UI, maybe there is an "edit" type option next to each include folder, that when clicked brings up the pop-up to specify filetype settings (include or exclude). If this is modified, then back on the main "include folder" tab the folder name could be italicized to indicate it has sub settings customized (or hover popup or whatever is standard UI for showing this).

The scenario I specifically have in mind is for 3rd world context where I am helping people backup to a USB thumb drive. It can't hold all the videos, music, pictures, etc., so I need to trim down what is backed up easily. But they also can have their "real documents" almost anywhere.

Michael Ekstrand (elehack) wrote :

@Andy If Deja-Dup just grows full support for rsync filters, the problem will be solved with that level of granularity. Rsync filters are incredibly powerful, and you can even merge in additional filter specs on a directory-by-directory basis.

IMO, Deja-Dup should *not* develop a filter mechanism with rsync's power but its own oddball syntax.

Rik Shaw (rik-shaw) wrote :

@elehack I appreciate your perspective. In fact, I use rsync myself for my own backups.

But I am targeting developing-world users that simply need a GUI only to handle version backups. These users also don't have good understanding of "size", and have shared videos and music all over the place, as are their files. Organization is a challenge.

I am hopeful there is a SIMPLE Gui that they can use (for backup AND restore such as the integrated Nautilus functionality to restore missing files).

Duplicity already contains these features, so I am not asking for an oddball syntax. I am just asking that this standard functionality of duplicity be considered for the deja-dup GUI.

For rsync backups, the Grsync gui is quite good. But even this is way too complicated for my targeted users (and in this way won't provide VERSIONED backups).

Rasmus Eneman (pie-or-paj) wrote :

I'm adding a vote for this issue. For me, regex matching on paths would do. A good interface would list files or directories (if a directory is excluded, you don't need to list every file inside it) that the filters hit.

My use case is that I want to backup my Dev-folder (sure, most stuff is on an external git-server but sometimes I have big branches that are not yet ready to be published). However, npm and bower create lots of files that are unnecessary to backup. Being able to exclude these files would decrease backup time (first priority) and backup size (not that important).
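
A minimal Python sketch of the "list what the filter hits" idea: it walks a tree, applies a regex to each directory path, and reports only the top-most excluded directories rather than every file inside them. The `find_excluded_dirs` helper, the regex, and the throwaway project layout are all hypothetical.

```python
import os
import re
import tempfile


def find_excluded_dirs(root, pattern):
    """Walk root and return directories whose path matches the regex.

    Matching directories are pruned, so their contents are never visited.
    """
    regex = re.compile(pattern)
    hits = []
    for dirpath, dirnames, _filenames in os.walk(root):
        kept = []
        for name in sorted(dirnames):
            full = os.path.join(dirpath, name)
            if regex.search(full):
                hits.append(os.path.relpath(full, root))
            else:
                kept.append(name)
        dirnames[:] = kept  # prune in place so os.walk skips excluded dirs
    return sorted(hits)


# Build a throwaway tree to demonstrate (hypothetical project layout).
root = tempfile.mkdtemp()
for sub in ("Dev/app/node_modules/lodash", "Dev/app/src", "Dev/site/bower_components"):
    os.makedirs(os.path.join(root, sub))

excluded = find_excluded_dirs(root, r"/(node_modules|bower_components)$")
print(excluded)
```

Pruning during the walk is also what would save the backup time: the excluded trees are never scanned at all.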

Brutus (brutus-dmc) wrote :

Since _duplicity_ already supports `--exclude` and [similar options](http://manpages.ubuntu.com/manpages/trusty/en/man1/duplicity.1.html), I think it might be sufficient to duplicate the folder ignore UI so that we have an "add pattern" button, which lets the user type in a pattern and adds it to another list, similar to the folder ignore list. That list could then be stored in a gconf setting and passed along to _duplicity_.

BTW: @mterry mentioned:

> if you gconf-edit the exclude-list and add patterns like "**/parts", the pattern is passed to duplicity and everything works as expected..."

__Does this (still) work?__ And what's the current gconf storage place for that list (the location for the settings changed a few times if I remember correctly)?

UBUCATZ (ubucatz) wrote :

what is the status of this? I would like to configure deja-dup to exclude all cache directories, also the python cache files.
@mterry: is the method of manually adding a regex to the dconf settings stable?
Please post updates about this important feature of a backup app.

Hotbelgo (hotbelgo) wrote :

I'd like to exclude all node_modules directories and descendents

nerv (nervgh) wrote :

> I'd like to exclude all node_modules directories and descendents
same here :+1:

Josias (iquabius) wrote :

> I'd like to exclude all node_modules directories and descendents

+1

Naël (nathanael-naeri) wrote :

1) Juan mentioned in linked question 107518 that extended shell globbing patterns (? [ * and **), which are supported by duplicity (see FILE SELECTION in man page), could be passed to duplicity through Déjà Dup's include-list and exclude-list GConf keys, thereby unofficially allowing advanced include/exclude support. This is mentioned in this present bug report by Michael in comment 7 and inquired about by Brutus in comment 24, Ubucatz in comment 25.

This does not appear to work any more, as far as I can tell... :( See my comment on linked question 107518 and please try for yourself: use gsettings to add some globbing pattern to the exclude-list key in the schema org.gnome.DejaDup (man gsettings for howto), run a verbose manual backup with 'DEJA_DUP_DEBUG=1 deja-dup --backup', and grep what you're trying to exclude in the output, chances are it's backed up. Also grep /usr/bin/duplicity to confirm that your globbing pattern was not passed to duplicity.
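
For anyone who wants to repeat the test, here is a small Python sketch that builds the `gsettings set` invocation for the exclude-list key. The schema and key names are the ones mentioned above; the `exclude_list_argv` helper and the example patterns are hypothetical, and, as said, the pattern will most likely not reach duplicity.

```python
def exclude_list_argv(patterns):
    """Build the argument list for `gsettings set` on the exclude-list key.

    The value is written as a GVariant string array, e.g. ['a', 'b'].
    Patterns containing single quotes or backslashes are not handled here.
    """
    value = "[" + ", ".join("'%s'" % p for p in patterns) + "]"
    return ["gsettings", "set", "org.gnome.DejaDup", "exclude-list", value]


argv = exclude_list_argv(["$TRASH", "**/node_modules"])
print(argv)

# On a machine with gsettings installed, the command could then be run with:
#   import subprocess
#   subprocess.run(argv, check=True)
```

Passing the arguments as a list avoids having to escape the quotes and brackets for the shell.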

It's probably best that this workaround doesn't work any more though. Since the feature is not officially supported, at least it's consistent.

2) People here mention (extended) shell globbing patterns vs. regular expressions vs. rsync filter rules in order to provide advanced includes and excludes, should this feature be supported by Déjà Dup one day.

Duplicity supports extended shell globbing patterns and Python regular expressions, so both should be feasible. I'm not familiar with rsync filter syntax but it looks like it's extended-globs-based.

3) For what it's worth, I agree with Frank in comment 8 and question 107518, and Michael Ekstrand in comment 12 and duplicate 786741, that this feature could very well be only accessible to advanced users and provided as a way to pass arbitrary --include* and --exclude* options to duplicity, in whatever order is desired, since order is important for file selection.

For instance, consider a 'duplicity-options' GConf key that would not be set from the GUI but only from the CLI, and would contain a number of user-specified --include* and/or --exclude* options for duplicity. If unset, the includes and excludes specified in the GUI could be used, as is the case today. If set, the content of that key could simply be passed to duplicity, ignoring the includes and excludes specified in the GUI.
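
The fallback logic could be sketched roughly like this in Python. Everything here is hypothetical: the `selection_args` helper, the idea that the key holds one shell-style string, and the order in which the GUI lists are emitted in the fallback branch. Only the `--include` and `--exclude` options themselves are real duplicity options.

```python
import shlex


def selection_args(gui_includes, gui_excludes, duplicity_options=None):
    """Return the file-selection arguments to hand to duplicity.

    duplicity_options stands in for the proposed (hypothetical) key.
    """
    if duplicity_options:
        # Key is set: pass it through as given, keeping the user's order.
        return shlex.split(duplicity_options)
    # Key is unset: fall back to what the GUI lists contain.
    # Emitting excludes before includes is an assumption of this sketch.
    args = []
    for path in gui_excludes:
        args += ["--exclude", path]
    for path in gui_includes:
        args += ["--include", path]
    return args


default_args = selection_args(["/home/me"], ["/home/me/Downloads"])
custom_args = selection_args(
    ["/home/me"],
    [],
    "--exclude '**/node_modules' --include /home/me --exclude '**'",
)
print(default_args)
print(custom_args)
```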

That way the GUI stays simple and beginner-friendly, there is no need to design/code/maintain a graphical way of supporting advanced file selection, and advanced users can still perform advanced file selection (through the CLI, which is OK since they're advanced).

What could be graphically available to less advanced users, though, through a simple textbox for instance, is a small subset of this advanced feature, for instance extension-based exclusion of files, as that seems to be a much requested use case (.git, .o, .txt~ etc).

4) Michael Terry is not subscribed to this bug report anyway, so it's unlikely he's seen the activity since his last comment, and even if he has he probably has much to do besides implementing this feature.

So it's no advanced file selection for now.

Akronix (akronix) wrote :

I'd like to exclude ".git" directories. Is there any intention to support this some day?

Paul (wegsehen) wrote :

Is there any news on this? My backup disk fills up pretty quickly because of tons of temporary build files.

Basil Titovchenko (unibasil) wrote :

> I'd like to exclude all node_modules directories and descendents

+1

I like the idea of a *duplicity-options* hidden setting that could be set in gconf-editor

Axel (aaxee) wrote :

> I'd like to exclude all node_modules directories and descendents

This is my problem, too.

> I like the idea of a *duplicity-options* hidden setting that could be set in gconf-editor

+1

Luis Alberto Pabón (copong) wrote :

I'm having this problem as well. Not just "node_modules" on node projects but any other vendor package folders on other languages and frameworks.

Ideally, we'd be able to specify glob patterns for excludes, in addition to selecting folders. The UI could be simply one textbox per pattern, and automagically add an empty one below once we fill one in.

Seun LanLege (seunlanlege) wrote :

Any Update on this?

Michael Terry (mterry) wrote :

Nope, no update yet. I'm not actively working on this, and it would need design work.

J. Snow (jon.snow) wrote :

With the introduction of Snaps, many applications have a .cache folder within their ~/snap/ directory. Unfortunately without advanced exclude support it is impossible to ignore these folders. I understand that developing an advanced exclude support on the UI is not straightforward but how about adding a simple configuration to exclude .cache folders from snap apps?
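
As a rough illustration, a simple built-in rule could locate those folders with a fixed glob. The Python sketch below assumes the cache folders sit at `~/snap/<app>/<revision>/.cache`, which may not hold for every snap; the `snap_cache_dirs` helper and the demo directory names are hypothetical.

```python
import tempfile
from pathlib import Path


def snap_cache_dirs(home):
    """Return .cache directories found at <home>/snap/<app>/<revision>/.cache."""
    snap_root = Path(home) / "snap"
    return sorted(p for p in snap_root.glob("*/*/.cache") if p.is_dir())


# Throwaway home directory with a made-up snap layout.
home = Path(tempfile.mkdtemp())
for sub in ("snap/firefox/123/.cache", "snap/spotify/45/.cache", "snap/firefox/common"):
    (home / sub).mkdir(parents=True)

found = [p.relative_to(home).as_posix() for p in snap_cache_dirs(home)]
print(found)
```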

Tom (tom-lorinthe) wrote :

It's sad we have to back up .cache, node_modules and so many other useless files daily. The best workaround I have is to copy and paste a prepared config line into the ignore pattern, with dconf-editor >> org.gnome.DejaDup :

```
[
    '$TRASH',
    'Websites/proj-1/node_modules',
    'Websites/proj-2/node_modules',
    'Websites/proj-3/node_modules'
]
```
Luckily, dconf-editor is forgiving enough to remove the empty spaces and lines.. ))

trendzetter (trendzetter) wrote :

+ 1 for exclude patterns
