sb_filter.py does not lock the database file resulting in corruption

Bug #30720 reported by Marius Gedminas
Affects             Status        Importance  Assigned to  Milestone
spambayes (Debian)  Fix Released  Unknown
spambayes (Ubuntu)  Confirmed     Low         Unassigned

Bug Description

When you use sb_filter.py -S (or -N) to train SpamBayes, it writes to .hammiedb without first locking the file to ensure that concurrent access does not corrupt it. As a result, if you (accidentally or intentionally) run several instances of sb_filter.py in parallel, the database may be corrupted and become unusable.

(I read mail on my laptop, but spambayes is run from procmail on a server. To train spambayes on misclassified emails I have to pipe messages to sb_filter.py through SSH. Since the SSH connection takes a while and does not require interactivity, I tend to put it in the background. As a result, several SSH connections and several sb_filter.py processes may run in parallel when there are several uncaught spams in my inbox. I've experienced .hammiedb corruption twice already.)
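
To illustrate the kind of interleaving described above, here is a minimal sketch of an unlocked read-modify-write sequence. It is not SpamBayes code: the JSON file and the `load`, `save` and `demo` helpers are hypothetical stand-ins for the training database, and the sketch only shows a lost update, not the on-disk corruption itself.

```python
import json
import os
import tempfile


def load(path):
    """Read the whole (hypothetical) database into memory."""
    with open(path) as f:
        return json.load(f)


def save(path, counts):
    """Write the whole database back, overwriting what was there."""
    with open(path, "w") as f:
        json.dump(counts, f)


def demo():
    """Simulate two trainers that overlap without any locking."""
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        save(path, {"spam": 0})
        # Both trainers read the same starting state...
        first = load(path)
        second = load(path)
        # ...each records one trained message...
        first["spam"] += 1
        second["spam"] += 1
        # ...and the second write silently discards the first.
        save(path, first)
        save(path, second)
        return load(path)["spam"]
    finally:
        os.remove(path)
```

Two messages were trained, but `demo()` returns 1. A real database file with internal structure can end up in a worse state than a lost count when writes from several processes interleave.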

Marius Gedminas (mgedmin) wrote :

(I meant to say sb_filter.py -s/-g, not -S/-N. I've shell scripts that remember the correct option names for me.)

Alexandre Otto Strube (surak) wrote :

Hello Marius, is this still happening with the latest Dapper updates? Thanks for the bug!

Toby Dickenson (toby-tarind) wrote :

(I am an occasional spambayes developer)

This behaviour is by design. Future versions are likely to offer other storage options which do not have this problem (possibly based on ZODB).

Short-term solutions include:

a. Configure procmail to use a lock file, and use a wrapper script around sb_filter.py for training which also uses the 'lockfile' utility to serialise access to your database.

b. Replace all uses of sb_filter.py with sb_bnfilter.py. This behaves (almost) identically, but all operations are automatically dispatched through a shared, automatically managed daemon process. The primary purpose of this is to reduce startup overhead, but it will also eliminate the concurrency which causes this corruption.
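
As a rough sketch of the wrapper idea in option (a), the following Python script serialises a command behind an exclusive lock. Note the assumptions: it uses `fcntl.flock` (Unix only) rather than the 'lockfile' utility mentioned above, and the `run_locked` function, the script name and the lock file path are all hypothetical.

```python
import fcntl
import subprocess
import sys


def run_locked(lock_path, command):
    """Run command while holding an exclusive advisory lock on lock_path.

    The child inherits stdin, so a message piped to this wrapper
    reaches the wrapped command unchanged.
    """
    # Append mode creates the lock file if needed without truncating it.
    with open(lock_path, "a") as lock_file:
        # Blocks until no other process holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return subprocess.call(command)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical): locked_run.py LOCKFILE COMMAND [ARGS...]
    sys.exit(run_locked(sys.argv[1], sys.argv[2:]))
```

Saved as, say, `locked_run.py`, it could be invoked as `locked_run.py ~/.hammiedb.lock sb_filter.py -s < message`. The lock is advisory, so it only helps if every process that touches the database, including the procmail delivery, goes through the same wrapper and the same lock file.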

I hope this helps,

Carlos Diener / emonkey (emonkey) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. You reported this bug a while ago and there hasn't been any activity on it recently. We were wondering whether this is still an issue for you. Can you try with the latest Ubuntu release? Thanks in advance.

Daniel T Chen (crimsun) wrote :
Changed in spambayes:
importance: Medium → Low
status: New → Confirmed
Changed in spambayes (Debian):
status: Unknown → Confirmed
Changed in spambayes (Debian):
status: Confirmed → Fix Released