Comment 10 for bug 1576389

Nate Eldredge (nate-thatsmathematics) wrote : Re: [Bug 1576389] Re: Slow exclude pattern matching

On Mon, 4 Nov 2019, Jan Kratochvil wrote:

> So I wrote a reduction; it cut the --exclude-filelist entries on my system from 607481 to 42169.
> It still takes 0.12s to evaluate each file, which means the backup will take 11+ days.
> Still much better than the half a year (with no system restart possible) it would have taken before!
> https://git.jankratochvil.net/?p=nethome.git;a=blob_plain;f=bin/rpmsafe
> https://git.jankratochvil.net/?p=nethome.git;a=blob_plain;f=bin/rpmsafereduce

Your use case seems to go way beyond what the exclude feature was
designed to handle. In particular, I think the matching is inherently
O(n*m), where n is the number of exclude entries and m is the number of
files to be checked: every file is tested against every pattern.
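To illustrate the O(n*m) behavior, here is a minimal sketch of how pattern-based exclusion scales (the pattern and file lists are made up for illustration; this is not duplicity's actual code):

```python
import fnmatch

# Hypothetical data, for illustration only.
patterns = ["/usr/bin/*", "/usr/lib/*", "/var/cache/*"]   # n patterns
files = ["/usr/bin/ls", "/home/jan/notes.txt", "/var/cache/foo"]  # m files

def excluded(path, patterns):
    # Each file is tested against every pattern in the worst case,
    # so checking m files costs roughly n*m fnmatch calls.
    return any(fnmatch.fnmatch(path, pat) for pat in patterns)

kept = [f for f in files if not excluded(f, patterns)]
print(kept)  # only the file matching no pattern survives
```

With ~40000 entries and millions of files, that product is exactly where the 0.12s-per-file cost comes from.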

You can probably achieve something pretty close by just excluding a few
key directories. For example, typically /usr contains only files from the
package manager (with the exception of /usr/local), so you could just have
an exclude filelist containing

     + /usr/local
     /usr

and that should cover 95% of what you want to exclude.

If you really must exclude every single file, then since you don't
actually need glob matching, you could hack a separate exclude mechanism
into duplicity: for instance, slurp the whole list into a Python set at
startup and check each file against the set with `in`. The performance
cost of that should be negligible, more like O(n+m) asymptotically.

--
Nate Eldredge
<email address hidden>