Merge of incremental snapshot metadata is much too slow (Python process runs 100% CPU)

Bug #585358 reported by be4truth on 2010-05-25
18
This bug affects 3 people
Affects Status Importance Assigned to Milestone
nssbackup
Won't Fix
High
Unassigned

Bug Description

Since 2 days backups fails and nssbackupd causes at startup a 100% CPU load with a process called "python" and the process doesn't lead anywhere. It needs to be killed to bring computer back to normal state.

Lucid 32 bit fully updated
Not So Simple Backup Suite 0.2-0
python 2.6.5-0ubuntu1
http://ppa.launchpad.net/nssbackup-team/ppa/ubuntu
log:
NSSBackup 'Default Profile' Logger
==============

Download full text (7.8 KiB)

NSSBackup 'Default Profile' Logger
==============
2010-05-25 17:48:04,773 - INFO in ConfigManager.__create_logger(713): Log output for [Default Profile] is directed to file '/var/log/nssbackup.log'.
2010-05-25 17:48:04,774 - DEBUG in ConfigManager.validateConfigFileOpts(760): Validating config file.
2010-05-25 17:48:04,774 - INFO in ConfigManager.__init__(324): Profile settings are being read from file '/etc/nssbackup.conf'.
2010-05-25 17:48:04,774 - DEBUG in nssbackupd.__init__(84): Not So Simple Backup Suite 0.2-0
2010-05-25 17:48:04,824 - DEBUG in BackupManager.__init__(249): Instance of BackupManager created.
2010-05-25 17:48:04,835 - INFO in BackupManager.makeBackup(268): Backup process is being started.
2010-05-25 17:48:04,836 - DEBUG in BackupManager.__setlockfile(441): Created lockfile at '/var/lock/nssbackup.lock' with info '2925'
2010-05-25 17:48:04,836 - INFO in FuseFAM.initialize(176): Initializing FUSE File Access Manager.
2010-05-25 17:48:04,837 - DEBUG in FuseFAM.initialize(242): [log]
file = '/var/log/nssbackup.log'
level = '10'
[places]
prefix = '/usr'
[schedule]
anacron = 'daily'
[dirconfig]
/media/ = '0'
/home/.....
/home/.....
/var/cache/ = '0'
/home/.......
/var/spool/ = '0'
/var/tmp/ = '0'
/home/.......
/etc/ = '1'
[general]
splitsize = '0'
target = '/media/backup/dailybackup'
format = 'bzip2'
mountdir = '/mnt/nssbackup'
maxincrement = '7'
purge = '10'
followlinks = '0'
lockfile = '/var/lock/nssbackup.lock'
[report]
from = 'NSsbackup Daemon <email address hidden>'
to = '..........'
smtppassword = '.......'
smtptls = '1'
smtpuser = '.............................'
smtpport = '587'
smtpserver = '.........................'
[exclude]
regex = '\\.mkv,/home/[^/]+?/\\.thumbnails/,/home/[^/]+?/\\.Trash,/home/[^/]+?/\\..+/[cC]ache,\\..vdi'
maxsize = '157286400'
2010-05-25 17:48:04,837 - INFO in UpgradeManager.need_upgrade(140): Checking need for upgrading in '/media/backup/dailybackup'
2010-05-25 17:48:04,840 - DEBUG in SnapshotManager.get_snapshots(204): [Snapshots Listing - current format]
2010-05-25 17:48:04,840 - DEBUG in SnapshotManager.get_snapshots(206): 2010-05-23_10.03.24.877220.lucid.inc
2010-05-25 17:48:04,840 - DEBUG in SnapshotManager.get_snapshots(206): 2010-05-21_06.28.06.547394.lucid.inc
2010-05-25 17:48:04,840 - DEBUG in SnapshotManager.get_snapshots(206): 2010-05-20_06.27.39.167673.lucid.ful
2010-05-25 17:48:04,840 - DEBUG in SnapshotManager.get_snapshots(206): 2010-05-19_20.54.48.455397.lucid.inc
2010-05-25 17:48:04,840 - DEBUG in SnapshotManager.get_snapshots(206): 2010-05-18_06.31.59.387443.lucid.inc
2010-05-25 17:48:04,840 - DEBUG in SnapshotManager.get_snapshots(206): 2010-05-17_06.18.56.506006.lucid.inc
2010-05-25 17:48:04,841 - DEBUG in SnapshotManager.get_snapshots(206): 2010-05-16_13.21.05.248497.lucid.inc
2010-05-25 17:48:04,841 - DEBUG in SnapshotManager.get_snapshots(206): 2010-05-15_07.54.49.296694.lucid.inc
2010-05-25 17:48:04,841 - DEBUG in SnapshotManager.get_snapshots(206): 2010-05-14_06.44.49.111514.lucid.inc
2010-05-25 17:48:04,841 - DEBUG in SnapshotManager.get_snapshots(206): 2010-05-13_18.31.50.714097.lucid.ful
2010-05-25 17:48:04,841 - INFO in SnapshotManager._do_cutoff_purge(993): Simple purge ...

Read more...

Jean-Peer Lorenz (peer.loz) wrote :

Thank you for taking the time reporting this bug.

Could you please try the following: disable the removal of snapshots by unchecking the according checkbox in the GUI.

This should solve the problem temporarily. However, backups are then no longer removed by nssbackup. If you are required to free disk space, you can manually remove backups older than a certain full backup. Incremental backups are always based on the latest full backup.

Some more questions that'd help fixing the issue:
1. can you post the lines following 'Maximum free size required is ...' from the last incremental and last full backup ran.
2. how long did the backup took regularly.
3. can you post the approx. filesize of the 'files.snar' located in the backup dirs.
4. what are your machine specicfications (cpu, ram).

Thank you so much for your help.

Jean-Peer Lorenz (peer.loz) wrote :

FYI:

the process does not hang in an infinite loop. It just takes *very* long :-( This bug is caused by bad implementation of the merging process of snar files (files that store the metadata for incremental backups). When a backup is about being removed other backups might get merged since they depend on another. Every time this happens (and actually it happens every time a backup is removed) the metadatas are merged. And now consider two files containing metadata of 100.000 files that are merged using a not so smart algorithm.

In fact, the problem of snapshot removal is much more complicated (in a performance way) than it appears at first glance.

Dear Jean-Peer,

the problem is solved. I noticed after reading the debig log (I am not a programmer ;-) ) that it was kind of connected with the removal of old backups. Then I remembered tha the old backups were done by sbackup and not nssbackup before I upgraded to Lucid. After removing those backups everything is fine and works smoothly.

Thanks your taking the time

:-)

How do I mark this as solved?

Graham Ranger (graham-ranger) wrote :

Hello,
I'm not sure it is entirely solved. I have the same problem, but am using backups made only with nssbackup. Disactivating backup removal might do the job, but does not seem entirely satisfactory, since one of the good things about nssbackup is that it makes backing-up automatic!
Thanks for any help in advance!

summary: - nssbackup causes python process to run on 100% CPU in a loop
+ Merge of incremental snapshot metadata is much too slow (Python process
+ runs 100% CPU)
Changed in nssbackup:
importance: Undecided → High
status: New → Confirmed
Jean-Peer Lorenz (peer.loz) wrote :

@be4truth:
No, the bug isn't solved and I've marked it as confirmed. Could you provide the requested information to help solving the issue, please?

@Graham:
You're right, the bug is still present. The cause of the problem is described in comment #2 and #3. Currently, I'm working on a more sophisticated work-around: snapshots are only removed if they are older than a specified number of days *and* are stand-alone (i.e. do not build the base for other snapshots). This will prevent daily/regularly snapshot merges and preserves the remove functionality. Snapshots are then removed in 'blocks' (e.g. if you make a full backup every 7th day, the backups of a whole week are removed when the latest incremental backup of that week gets older than your specified max. age) I'll commit these changes soon.
Could you also provide the information I've requested in comment #2, please?

I'll provide the infos asap. I am not in office right now.

Graham Ranger (graham-ranger) wrote :

Hi, thanks for your answer. With respect to your questions:
1. can you post the lines following 'Maximum free size required is ...' from the last incremental and last full backup ran.
I'm not sure where I should look for this... sorry. I've looked in /var/log and in the log in the backup folder, but haven't found the lines. Any help finding it would be appreciated.
2. how long did the backup took regularly.
Generally the full backup is done in about 20', I'd say, and the incremental in less that 5'.
3. can you post the approx. filesize of the 'files.snar' located in the backup dirs.
files.snar in a full backup weighs 1.9 Mo.
4. what are your machine specicfications (cpu, ram).
CPU details are posted here (from sudo lshw):
description: CPU
          product: AMD Athlon(tm) XP 2600+
          vendor: Advanced Micro Devices [AMD]
          physical id: 5
          bus info: cpu@0
          version: 6.10.0
          slot: Socket A
          size: 1916MHz
          capacity: 3GHz
          width: 32 bits
          clock: 166MHz
and RAM details here:
*-memory
          description: System Memory
          physical id: 1d
          slot: System board or motherboard
          size: 1280MiB
          capacity: 2GiB
        *-bank:0
             description: DIMM
             product: None
             vendor: None
             physical id: 0
             serial: None
             slot: A0
             size: 1GiB
             width: 64 bits
        *-bank:1
             description: DIMM
             product: None
             vendor: None
             physical id: 1
             serial: None
             slot: A1
             size: 256MiB
             width: 64 bits
I'm backing-up to a personalized directory on a second internal hard disk (something sbackup didn't easily allow), in gzip format, including /home, excluding files >100Mo. I've scheduled a daily incremental backup with simple cut-off at >16 days and purging activated. I can confirm that python doesn't run indefinitely but does, eventually, stop. I'd like, ideally, to run a full backup, then a series of incrementals, then another full backup, discarding everything older than the most recent full backup.
A full backup directory, in my case, only weighs about 8.3Go, by the way.
Thanks again for your time and help. Best, G.R.

Download full text (3.2 KiB)

-Processors-
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz : 1596.00MHz
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz : 1596.00MHz
-Memory-
Total Memory : 2035272 kB
last incremental backup

2010-05-28 21:50:49,315 - INFO in BackupManager.__fillSnapshot(394): Maximum free size required is '393 MiB 139 KiB 847'.
2010-05-28 21:50:49,316 - INFO in BackupManager.__fillSnapshot(395): Available disk size is '47189 MiB 780 KiB 0'.
2010-05-28 21:50:49,316 - INFO in BackupManager.__fillSnapshot(396): Number of directories: 5421.
2010-05-28 21:50:49,317 - INFO in BackupManager.__fillSnapshot(397): Number of symlinks: 4688.
2010-05-28 21:50:49,317 - INFO in BackupManager.__fillSnapshot(398): Total number of files: 50092.
2010-05-28 21:50:49,318 - INFO in BackupManager.__fillSnapshot(399): Number of files included in snapshot: 376.
2010-05-28 21:50:49,318 - INFO in BackupManager.__fillSnapshot(400): Number of new files (also included): 51.
2010-05-28 21:50:49,318 - INFO in BackupManager.__fillSnapshot(401): Number of files skipped in incremental snapshot: 49716.
2010-05-28 21:50:49,320 - INFO in BackupManager.__fillSnapshot(402): Number of items forced to be excluded: 125.
2010-05-28 21:50:49,320 - INFO in BackupManager.__fillSnapshot(403): Number of items to be excluded by config: 51.
2010-05-28 21:50:49,341 - INFO in tar.makeTarIncBackup(272): Launching TAR to make incremental backup.
2010-05-28 21:50:49,341 - DEBUG in tar.__prepareTarCommonOpts(241): ['-cS', '--bzip2', '--directory=/', '--ignore-failed-read', '--files-from=/media/backup/dailybackup/2010-05-28_21.49.36.643084.lucid.inc/includes.list.tmp', '--exclude-from=/media/backup/dailybackup/2010-05-28_21.49.36.643084.lucid.inc/excludes.list.tmp', '--file=/media/backup/dailybackup/2010-05-28_21.49.36.643084.lucid.inc/files.tar.bz2']
2010-05-28 21:50:49,341 - DEBUG in tar.__prepareTarCommonOpts(243): Common TAR options : ['-cS', '--bzip2', '--directory=/', '--ignore-failed-read', '--files-from=/media/backup/dailybackup/2010-05-28_21.49.36.643084.lucid.inc/includes.list.tmp', '--exclude-from=/media/backup/dailybackup/2010-05-28_21.49.36.643084.lucid.inc/excludes.list.tmp', '--file=/media/backup/dailybackup/2010-05-28_21.49.36.643084.lucid.inc/files.tar.bz2']
2010-05-28 21:50:49,342 - DEBUG in tar.makeTarIncBackup(285): Snapshot's base snarfile: /media/backup/dailybackup/2010-05-28_07.23.53.568803.lucid.inc/files.snar
2010-05-28 21:50:49,342 - DEBUG in tar.makeTarIncBackup(286): Snapshot's snarfile: /media/backup/dailybackup/2010-05-28_21.49.36.643084.lucid.inc/files.snar
2010-05-28 21:50:49,342 - DEBUG in tar.makeTarIncBackup(287): Temporary snarfile: /tmp/nssbackup/files.snar
2010-05-28 21:50:49,347 - DEBUG in __init__.launch(295): Lauching : ['tar', '-cS', '--bzip2', '--directory=/', '--ignore-failed-read', '--files-from=/media/backup/dailybackup/2010-05-28_21.49.36.643084.lucid.inc/includes.list.tmp', '--exclude-from=/media/backup/dailybackup/2010-05-28_21.49.36.643084.lucid.inc/excludes.list.tmp', '--file=/media/backup/dailybackup/2010-05-28_21.49.36.643084.lucid.inc/files.tar.bz2', '--listed-incremental=/tmp/nssbackup/files.snar']

files.snar : 1.7 MB (1781181 bytes)

I don't know how lo...

Read more...

Download full text (3.5 KiB)

last full backup took 45 mins with a bzip2 process using one CPU between 60 and 98% and produced a files.tar.bz2 of 5.5 GB

2010-05-29 01:54:08,838 - INFO in BackupManager.__fillSnapshot(394): Maximum free size required is '7991 MiB 167 KiB 150'.
2010-05-29 01:54:08,839 - INFO in BackupManager.__fillSnapshot(395): Available disk size is '47090 MiB 832 KiB 0'.
2010-05-29 01:54:08,839 - INFO in BackupManager.__fillSnapshot(396): Number of directories: 5422.
2010-05-29 01:54:08,839 - INFO in BackupManager.__fillSnapshot(397): Number of symlinks: 4689.
2010-05-29 01:54:08,839 - INFO in BackupManager.__fillSnapshot(398): Total number of files: 50093.
2010-05-29 01:54:08,840 - INFO in BackupManager.__fillSnapshot(399): Number of files included in snapshot: 50093.
2010-05-29 01:54:08,840 - INFO in BackupManager.__fillSnapshot(400): Number of new files (also included): 0.
2010-05-29 01:54:08,840 - INFO in BackupManager.__fillSnapshot(401): Number of files skipped in incremental snapshot: 0.
2010-05-29 01:54:08,840 - INFO in BackupManager.__fillSnapshot(402): Number of items forced to be excluded: 124.
2010-05-29 01:54:08,841 - INFO in BackupManager.__fillSnapshot(403): Number of items to be excluded by config: 51.
2010-05-29 01:54:08,847 - DEBUG in Snapshot.commitbasefile(545): WARNING: Attempt of committing base file for full snapshot '2010-05-29_01.53.15.785702.lucid.ful'.
2010-05-29 01:54:08,853 - INFO in tar.makeTarFullBackup(326): Launching TAR to make a full backup.
2010-05-29 01:54:08,853 - DEBUG in tar.__prepareTarCommonOpts(241): ['-cS', '--bzip2', '--directory=/', '--ignore-failed-read', '--files-from=/media/backup/dailybackup/2010-05-29_01.53.15.785702.lucid.ful/includes.list.tmp', '--exclude-from=/media/backup/dailybackup/2010-05-29_01.53.15.785702.lucid.ful/excludes.list.tmp', '--file=/media/backup/dailybackup/2010-05-29_01.53.15.785702.lucid.ful/files.tar.bz2']
2010-05-29 01:54:08,853 - DEBUG in tar.__prepareTarCommonOpts(243): Common TAR options : ['-cS', '--bzip2', '--directory=/', '--ignore-failed-read', '--files-from=/media/backup/dailybackup/2010-05-29_01.53.15.785702.lucid.ful/includes.list.tmp', '--exclude-from=/media/backup/dailybackup/2010-05-29_01.53.15.785702.lucid.ful/excludes.list.tmp', '--file=/media/backup/dailybackup/2010-05-29_01.53.15.785702.lucid.ful/files.tar.bz2']
2010-05-29 01:54:08,854 - DEBUG in tar.makeTarFullBackup(338): Snapshot's snarfile: /media/backup/dailybackup/2010-05-29_01.53.15.785702.lucid.ful/files.snar
2010-05-29 01:54:08,854 - DEBUG in tar.makeTarFullBackup(339): Temporary snarfile: /tmp/nssbackup/files.snar
2010-05-29 01:54:08,855 - DEBUG in __init__.launch(295): Lauching : ['tar', '-cS', '--bzip2', '--directory=/', '--ignore-failed-read', '--files-from=/media/backup/dailybackup/2010-05-29_01.53.15.785702.lucid.ful/includes.list.tmp', '--exclude-from=/media/backup/dailybackup/2010-05-29_01.53.15.785702.lucid.ful/excludes.list.tmp', '--file=/media/backup/dailybackup/2010-05-29_01.53.15.785702.lucid.ful/files.tar.bz2', '--listed-incremental=/tmp/nssbackup/files.snar']
2010-05-29 02:42:14,935 - DEBUG in tar.makeTarFullBackup(350): TAR Output :
2010-05-29 02:42:14,942 - INFO in tar.__finish_backup(36...

Read more...

Jean-Peer Lorenz (peer.loz) wrote :

This bug won't fix. nssbackup is no longer maintained. Please use sbackackup 0.11 instead. Thanks.

Changed in nssbackup:
status: Confirmed → Won't Fix
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers