Deleting large snapshot on almost full btrfs drive takes up all RAM, brings machine to near halt

Bug #1332969 reported by Catalin Hritcu
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Back In Time
Expired
Undecided
Unassigned

Bug Description

Trying to delete a snapshot ate up all memory causing my machine to become super unresponsive without actually deleting anything.

I tried deleting a (failed but almost complete) snapshot backintime froze for a really long time eating up all of my RAM (16GB!) and bringing my whole machine to a near halt. I needed to start a terminal (where I was getting 1s delay typing characters) to manually kill the 4 backintime processes. Afterwards it seems that the snapshot wasn't really removed at all. Removing the snapshot manually worked just fine:
[root@detained 1]# time rm -dfr 20140614-151820-891
rm -dfr 20140614-151820-891 0.36s user 12.76s system 2% cpu 7:32.88 total
[root@detained 1]#

I'm using backintime 1.0.34 on fully updated Arch Linux. My laptop has an SSD and I do backups on an external HDD.

Revision history for this message
Catalin Hritcu (catalin-hritcu) wrote :

Germar (germar) wrote (as comment to #992026):
If BIT really consumes all your RAM this is a serious bug that should have it's own report.

BIT uses this commands to remove snapshots:
find "/path/to/backintime/host/user/profile_id/snapshot_id" -type d -exec chmod u+wx "{}" +
rm -rf "/path/to/backintime/host/user/profile_id/snapshot_id"

Can you please test again (especially the find command) if that eat up your RAM? If it does please open a new bug for this.

I've tried running the find command on my most recent snapshot (the old one is gone now), and it worked well, without raising the total RAM usage beyond 12%. Here is the output of the time command:
[hritcu@detained 1]$ time find "20140622-105950-290" -type d -exec chmod u+wx "{}" +
find "20140622-105950-290" -type d -exec chmod u+wx "{}" + 0.48s user 6.24s system 3% cpu 3:35.11 total
[hritcu@detained 1]$

Revision history for this message
Germar (germar) wrote :

May be the os.system() call does this?!?
Please try again with:

cd /usr/share/backintime/common/
time python -c "import snapshots; snapshots.Snapshots().remove_snapshot('<SNAPSHOT_ID>')"

Where <SNAPSHOT_ID> is one of 'backintime --snapshots-list'

Revision history for this message
Germar (germar) wrote :

Oh, you're using Arch. So you must type 'time python2 -c ....'

Revision history for this message
Catalin Hritcu (catalin-hritcu) wrote :

I've tried this with a newly created snapshot that was very small, and it worked almost instantaneously:
[hritcu@detained common]$ time python2 -c "import snapshots; snapshots.Snapshots().remove_snapshot('20140622-181100-130')"
INFO: Command "find "/run/media/hritcu/MyBook/BackInTime/backintime/detained/hritcu/1/20140622-181100-130" -type d -exec chmod u+wx "{}" +" returns 0
INFO: Command "rm -rf "/run/media/hritcu/MyBook/BackInTime/backintime/detained/hritcu/1/20140622-181100-130"" returns 0
python2 -c 0.46s user 5.32s system 75% cpu 7.707 total
[hritcu@detained common]$

I'll try to reproduce the original circumstances when the problem was occurring, in which I forgot to exclude 100GB of new data, which generated a very large snapshot.

summary: - Deleting snapshot takes up all RAM, brings machine to near halt
+ Deleting large snapshot on almost full btrfs drive takes up all RAM,
+ brings machine to near halt
Revision history for this message
Catalin Hritcu (catalin-hritcu) wrote :

One detail I only realized now is that the btrfs-formatted external HDD to which I'm doing the backup is almost full. It was 95% free according to df, but that's not a very reliably model so I didn't pay much attention to it. Trying to recreate the large snapshot completely filled the disk now, and caused backintime to start erasing old backups, so now I'm at 86% according to free. Attaching a (slightly redacted) log. During the backup process and the consequent automatic backup delete memory usage stayed around 20%. Will try to see what happens when I try to delete the latest (partial/failed) snapshot and report back to you.

Revision history for this message
Catalin Hritcu (catalin-hritcu) wrote :

Deleting the latest partial snapshot was instantaneous:
[hritcu@detained common]$ time python2 -c "import snapshots; snapshots.Snapshots().remove_snapshot('20140622-205341-777')"
INFO: Command "find "/run/media/hritcu/MyBook/BackInTime/backintime/detained/hritcu/1/20140622-205341-777" -type d -exec chmod u+wx "{}" +" returns 0
INFO: Command "rm -rf "/run/media/hritcu/MyBook/BackInTime/backintime/detained/hritcu/1/20140622-205341-777"" returns 0
python2 -c 0.45s user 5.10s system 83% cpu 6.613 total

... and freed up around 6% of the 1TB drive ... so about 60GB. I'm starting to believe that this had nothing to do with backintime, but with btrfs performing poorly when almost full. WDYT?

Revision history for this message
Germar (germar) wrote :

Sounds reasonable. I've no experience with btrfs. But AFAIK is btrfs still in beta stadium and for backups I'd prefer something bullet proved like ext3. Sure, btrfs has some nice features, but do you really need them for your backup drive?

Revision history for this message
Germar (germar) wrote :

I'll set this to incomplete so it will expire in 60 days. It doesn't seem to be a bug of BIT. Please let me know if you think it is.

Changed in backintime:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for Back In Time because there has been no activity for 60 days.]

Changed in backintime:
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.