Frequent OperationalError: database is locked with SQLite

Bug #1091586 reported by Aurélien Bompard
This bug affects 3 people
Affects: GNU Mailman
Importance: Undecided
Assigned to: Unassigned

Bug Description

I'm seeing frequent locking errors with SQLite on the current version of Mailman 3 from bzr. Here's an example traceback:

Traceback (most recent call last):
  File "/srv/dev/mailman/bin/runner", line 9, in <module>
    load_entry_point('mailman==3.0.0b2-', 'console_scripts', 'runner')()
  File "/srv/dev/mailman/mailman3/mailman/src/mailman/bin/runner.py", line 253, in main
    runner.run()
  File "/srv/dev/mailman/mailman3/mailman/src/mailman/core/runner.py", line 96, in run
    filecnt = self._one_iteration()
  File "/srv/dev/mailman/mailman3/mailman/src/mailman/core/runner.py", line 173, in _one_iteration
    config.db.commit()
  File "/srv/dev/mailman/mailman3/mailman/src/mailman/database/base.py", line 77, in commit
    self.store.commit()
  File "/srv/dev/mailman/lib/python2.6/site-packages/storm-0.19-py2.6-linux-x86_64.egg/storm/store.py", line 122, in commit
    self._connection.commit()
  File "/srv/dev/mailman/lib/python2.6/site-packages/storm-0.19-py2.6-linux-x86_64.egg/storm/databases/sqlite.py", line 126, in commit
    self.raw_execute("COMMIT", _end=True)
  File "/srv/dev/mailman/lib/python2.6/site-packages/storm-0.19-py2.6-linux-x86_64.egg/storm/databases/sqlite.py", line 154, in raw_execute
    return Connection.raw_execute(self, statement, params)
  File "/srv/dev/mailman/lib/python2.6/site-packages/storm-0.19-py2.6-linux-x86_64.egg/storm/database.py", line 322, in raw_execute
    self._check_disconnect(raw_cursor.execute, *args)
  File "/srv/dev/mailman/lib/python2.6/site-packages/storm-0.19-py2.6-linux-x86_64.egg/storm/database.py", line 371, in _check_disconnect
    return function(*args, **kwargs)
sqlite3.OperationalError: database is locked

It happens during the normal processing of Mailman, not while using the API (to my knowledge), so it seems different from bug #856004.
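For context, the error itself is easy to reproduce outside Mailman with two plain sqlite3 connections contending for the write lock; a minimal sketch (the database path is a hypothetical scratch file, and `timeout=0` makes SQLite fail at once instead of retrying):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")  # hypothetical scratch database

writer = sqlite3.connect(path, timeout=0)  # timeout=0: raise immediately on contention
other = sqlite3.connect(path, timeout=0)

writer.execute("CREATE TABLE t (x INTEGER)")
writer.commit()

# The first connection starts a write transaction and holds the write lock...
writer.execute("INSERT INTO t VALUES (1)")

# ...so a write on the second connection fails straight away.
msg = ""
try:
    other.execute("INSERT INTO t VALUES (2)")
except sqlite3.OperationalError as e:
    msg = str(e)
print(msg)  # database is locked
```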

I can reproduce this easily by letting Mailman work for some time. After the locking error, new messages are stuck in var/queue/pipeline:

$ ls var/queue/*
var/queue/archive:

var/queue/bad:

var/queue/bounces:

var/queue/command:

var/queue/digest:

var/queue/in:

var/queue/lmtp:

var/queue/nntp:

var/queue/out:

var/queue/pipeline:
1355781960.9870529+18f0acdf07aab616f793ecf76acfa1994e62b3a9.pck 1355799347.787621+213f3a6a971717ad1ba27f3d56b65767747a45da.pck 1355814973.8623891+4e66eeadef9b8c9ae99e4ed022c6c93f5af75c05.pck
1355782349.546113+ee14cb323574b647d9014f742e3f505e388d553e.pck 1355800543.460628+c455625c1ad86a5a77c9c515fcb6d555ba439b19.pck 1355815808.8860879+ca3fb5f45999e47c92756489daa951672ba4304d.pck
1355792739.8787439+19973fbcf763f036655aa0a7d477231246afdfbd.pck 1355802301.7830889+7989a3199c7eb491b65176937c0c1658cca9d46d.pck 1355820018.1813829+5005ded04da3574a7f3cd7472c9d635c577b6c32.pck
1355794764.597765+6cf379ad16c948117bb65f0aed0b3c230268ffd2.pck 1355803681.7455051+52d185cdcaecaf883bd43f375f5253567f5cd568.pck 1355822188.0491021+6a3cab9831d03b4f94475f745fa508c5126116b4.pck
1355794889.8323901+e14e25e342ec59709266040d830d7e12be2a1261.pck 1355805362.052228+422e573a6117edb7aae1acf9150ac58c210cedba.pck 1355822293.2363911+627f13dd2121bcb0f1916bc0448cc3b8adc9ad1f.pck
1355795937.2442579+62193d05ee80c6a22213cf2ff211c135fb7d38d9.pck 1355807374.9844501+8de57ab53819fac59d64705bcc56f86899d22634.pck
1355798662.8335559+554da4e7690e116e6fabee5764ac69829dc25685.pck 1355808273.143533+ed1acbf580c596ed921e062ece575877f0c9304b.pck

var/queue/rest:

var/queue/retry:

var/queue/shunt:
1355776302.1302171+c657867b33bf6e37554ff83ee26f4096a77c1114.pck

var/queue/virgin:

I need to restart Mailman to make it process them again. The lock caused the message to be shunted.

I'll attach the mailman.log file, where you'll see many "database is locked" errors, happening on different requests.

Revision history for this message
Aurélien Bompard (abompard) wrote:

I just restarted Mailman and got two tracebacks at exactly the same time, from two different processes (the PID number is logged too). Could Mailman be unqueuing from the pipeline queue in parallel processes? Could that cause the SQLite lock?

For reference, the two tracebacks in the log:

Dec 18 05:05:38 2012 (31146) Uncaught runner archive exception: database is locked
Dec 18 05:05:38 2012 (31146) Traceback (most recent call last):
  File "/srv/dev/mailman/mailman3/mailman/src/mailman/core/runner.py", line 138, in _one_iteration
    self._process_one_file(msg, msgdata)
  File "/srv/dev/mailman/mailman3/mailman/src/mailman/core/runner.py", line 194, in _process_one_file
    else getUtility(IListManager).get(unicode(listname)))
  File "/srv/dev/mailman/mailman3/mailman/src/mailman/database/transaction.py", line 78, in wrapper
    return function(args[0], config.db.store, *args[1:], **kws)
  File "/srv/dev/mailman/mailman3/mailman/src/mailman/model/listmanager.py", line 69, in get
    return store.find(MailingList, MailingList._list_id == list_id).one()
  File "/srv/dev/mailman/lib/python2.6/site-packages/storm-0.19-py2.6-linux-x86_64.egg/storm/store.py", line 1142, in one
    result = self._store._connection.execute(select)
  File "/srv/dev/mailman/lib/python2.6/site-packages/storm-0.19-py2.6-linux-x86_64.egg/storm/database.py", line 238, in execute
    raw_cursor = self.raw_execute(statement, params)
  File "/srv/dev/mailman/lib/python2.6/site-packages/storm-0.19-py2.6-linux-x86_64.egg/storm/databases/sqlite.py", line 154, in raw_execute
    return Connection.raw_execute(self, statement, params)
  File "/srv/dev/mailman/lib/python2.6/site-packages/storm-0.19-py2.6-linux-x86_64.egg/storm/database.py", line 322, in raw_execute
    self._check_disconnect(raw_cursor.execute, *args)
  File "/srv/dev/mailman/lib/python2.6/site-packages/storm-0.19-py2.6-linux-x86_64.egg/storm/database.py", line 371, in _check_disconnect
    return function(*args, **kwargs)
OperationalError: database is locked
Dec 18 05:05:38 2012 (31146) SHUNTING: 1355828738.0138969+0a282d3d40cc1528e1d41b992b070b492449a641
Dec 18 11:05:38 2012 (31144) Uncaught runner digest exception: database is locked
Dec 18 11:05:38 2012 (31144) Traceback (most recent call last):
  File "/srv/dev/mailman/mailman3/mailman/src/mailman/core/runner.py", line 138, in _one_iteration
    self._process_one_file(msg, msgdata)
  File "/srv/dev/mailman/mailman3/mailman/src/mailman/core/runner.py", line 222, in _process_one_file
    keepqueued = self._dispose(mlist, msg, msgdata)
  File "/srv/dev/mailman/mailman3/mailman/src/mailman/runners/digest.py", line 373, in _dispose
    for address, delivery_mode in mlist.last_digest_recipients:
  File "/srv/dev/mailman/mailman3/mailman/src/mailman/model/mailinglist.py", line 347, in last_digest_recipients
    results.remove()
  File "/srv/dev/mailman/lib/python2.6/site-packages/storm-0.19-py2.6-linux-x86_64.egg/storm/store.py", line 1183, in remove
    Delete(self._where, self._find_spec.default_cls_info.table))
  File "/srv/dev/mailman/lib/python2.6/site-packages/storm-0.19-py2.6-linux-x86_64.egg/storm/database.py", line 238, in execute
    raw_cursor = self.ra...


Barry Warsaw (barry)
tags: added: mailman3
Revision history for this message
Barry Warsaw (barry) wrote:

A couple of thoughts:

- Could you try again with the latest bzr head? Or perhaps with 3.0b3, which I'll be releasing soon? You'll have to switch to Python 2.7, though.

- I'm seeing lots of errors in HyperKitty before the "database is locked" error. I wonder if one of those errors is causing a transaction to stay open too long? If one process gets an error and doesn't abort its current transaction, it will cause this database lock error in another process. Because of the way Storm is used, just about anything, including reading data, will open a transaction (you see this often in the REST test cases).

- Perhaps you can disable HyperKitty and run your tests again?

Revision history for this message
Aurélien Bompard (abompard) wrote:

Hi Barry, sorry for not replying earlier, I was offline for the holidays.

The HyperKitty-related errors are expected, since I'm working on it, but I did try without enabling it and I got the lockups nonetheless.

I upgraded to the latest head (I was on rev 7189, I'm now on rev 7199), and I'm still seeing the lockups. I'll attach a new log file right away.

Revision history for this message
Aurélien Bompard (abompard) wrote:

I'm still seeing a lot of these. Looking at the code, I can't find where such concurrent accesses are prevented. Are they prevented at all? If not, that would explain the lock errors.
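One common mitigation on the SQLite side (it narrows the window but doesn't fix a transaction that is never closed) is to configure the busy handler so a connection retries instead of failing immediately; a sketch, with a hypothetical database path:

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "mailman.db")  # hypothetical path

# `timeout` installs SQLite's busy handler: on a lock conflict the
# connection retries for up to 15 s instead of raising immediately.
conn = sqlite3.connect(db_path, timeout=15.0)

# The same knob is exposed as a PRAGMA (value in milliseconds).
conn.execute("PRAGMA busy_timeout = 15000")
print(conn.execute("PRAGMA busy_timeout").fetchone()[0])  # 15000
```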

Revision history for this message
Aurélien Bompard (abompard) wrote:

This bug needs to be reproduced now that Storm has been replaced by SQLAlchemy.

Changed in mailman:
status: New → Invalid