"OperationalError: database is locked" encountered under very light load while using REST API

Bug #856004 reported by Stephen A. Goss
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
GNU Mailman
Incomplete
Undecided
Unassigned

Bug Description

I've seen this error twice while running integration tests (of our own application) that access Mailman 3 using the API. The load on Mailman is very light, as all requests made to the API are serial, never in parallel. Of course the various processing queues in MM3 are separate processes and could be doing work concurrently.

The platform is Centos 6.

Traceback (most recent call last):
  File "/usr/lib64/python2.6/wsgiref/handlers.py", line 93, in run
    self.result = application(self.environ, self.start_response)
  File "/home/sgoss/mailman_clone/src/mailman/rest/wsgiapp.py", line 63, in __call__
    config.db.commit()
  File "/home/sgoss/mailman_clone/src/mailman/database/stock.py", line 70, in commit
    self.store.commit()
  File "/home/sgoss/phoenix-deploy/lib/python2.6/site-packages/storm-0.18-py2.6-linux-x86_64.egg/storm/store.py", line 122, in commit
    self._connection.commit()
  File "/home/sgoss/phoenix-deploy/lib/python2.6/site-packages/storm-0.18-py2.6-linux-x86_64.egg/storm/databases/sqlite.py", line 126, in commit
    self.raw_execute("COMMIT", _end=True)
  File "/home/sgoss/phoenix-deploy/lib/python2.6/site-packages/storm-0.18-py2.6-linux-x86_64.egg/storm/databases/sqlite.py", line 154, in raw_execute
    return Connection.raw_execute(self, statement, params)
  File "/home/sgoss/phoenix-deploy/lib/python2.6/site-packages/storm-0.18-py2.6-linux-x86_64.egg/storm/database.py", line 321, in raw_execute
    self._check_disconnect(raw_cursor.execute, *args)
  File "/home/sgoss/phoenix-deploy/lib/python2.6/site-packages/storm-0.18-py2.6-linux-x86_64.egg/storm/database.py", line 366, in _check_disconnect
    return function(*args, **kwargs)
OperationalError: database is locked

The backend here is SQLite, which falls over very quickly under concurrent load. If the goal here is to create a stable mailing list server, maybe SQLite is a poor default.

Tags: mailman3
Revision history for this message
Barry Warsaw (barry) wrote : Re: [Bug 856004] [NEW] "OperationalError: database is locked" encountered under very light load while using REST API

On Sep 22, 2011, at 12:17 AM, Stephen A. Goss wrote:

>I've seen this error twice while running integration tests (of our own
>application) that access Mailman 3 using the API. The load on Mailman is
>very light, as all requests made to the API are serial, never in
>parallel. Of course the various processing queues in MM3 are separate
>processes and could be doing work concurrently.

This probably means that one of the other processes is holding onto the
database lock longer than it should, e.g. by not committing or aborting the
transaction. I believe Python's sqlite layer should retry the transaction for
a few seconds when it finds the database is locked. We'll have to hunt down
which other process is holding the database lock (and add some debugging
instrumentation if necessary). Fortunately, we don't have threads to worry
about!

>The backend here is SQLite, which falls over very quickly under
>concurrent load. If the goal here is to create a stable mailing list
>server, maybe SQLite is a poor default.

It could be that sqlite is only really useful for the test suite. However,
Storm (our ORM layer) supports MySQL and PostgreSQL. Have you tried either of
those databases?

Revision history for this message
Stephen A. Goss (postfuturist) wrote :

I'm working on Postgres support. It certainly doesn't work yet, mostly because the database creation is very SQLite specific, both the schema definitions which are in raw SQLite-flavored SQL, and some supporting code. I'm currently creating a mailman_pg.sql file to be used if a postgres database URL is given. There's probably a better way to do it, and I'd hazard to guess that would have been using a different ORM.

Revision history for this message
Barry Warsaw (barry) wrote : Re: [Bug 856004] Re: "OperationalError: database is locked" encountered under very light load while using REST API

On Sep 23, 2011, at 12:20 AM, Stephen A. Goss wrote:

>I'm working on Postgres support. It certainly doesn't work yet, mostly
>because the database creation is very SQLite specific, both the schema
>definitions which are in raw SQLite-flavored SQL, and some supporting
>code. I'm currently creating a mailman_pg.sql file to be used if a
>postgres database URL is given. There's probably a better way to do it,
>and I'd hazard to guess that would have been using a different ORM.

I know these things are notoriously difficult to reproduce, but is there any
chance you have a way to reproduce the problem?

I'll definitely be very interested to see your branch. I think supporting pg
out of the box would be a great benefit.

Revision history for this message
Stephen A. Goss (postfuturist) wrote :

I haven't found a way to reliably reproduce the error. It's only happened twice, and repeatedly running the tests that triggered it haven't reproduced it, yet. Unfortunately, those tests are integration tests with a proprietary system and I can't share the code (nor would it be very useful). It happens after many operations including creating mailing lists, sending messages through, moderating messages, etc.

Revision history for this message
Stephen A. Goss (postfuturist) wrote :

The way I run the tests that caused this error is that a temporary folder is created to hold the config and all the files for mailman, including log files, etc. There's a new folder each time I run the tests. If we added some logging to help catch the problem, those logs would be available the next time the problem occurred.

Revision history for this message
Stephen A. Goss (postfuturist) wrote :

Ah, this is likely my fault and not an issue on the main mailman branch. On our own branch we have code that will ping a callback HTML endpoint to query our system upon incoming mail to a mailing list so we can rely on our own code's rules for moderation, acceptance, rejection. This internal delay for the HTTP request is almost certainly the culprit here. This bug should probably be closed, though it is important to keep in mind that it is not impossible for database locks to be held for longer than normal under heavy load and that SQLite does not do well in these conditions.

Revision history for this message
Barry Warsaw (barry) wrote :

Cool, thanks for the follow up. I'll mark it incomplete, but your absolutely right that we have to be careful about this.

Changed in mailman:
status: New → Incomplete
Revision history for this message
Abhilash Raj (raj-abhilash1) wrote :

This bug has been moved to the new gitlab repo here: https://gitlab.com/mailman/mailman/issues/54

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers