Comment 0 for bug 1392762

Kiall Mac Innes (kiall) wrote :

Concurrent requests to designate-central can, under certain circumstances, cause it to lock up.

If two requests (for example, to add records to a zone) are received at approximately the same time, we can end up with a code-level deadlock (i.e. not a true DB deadlock). Consider the following example:

1) Two API calls to add a record to a single zone come in.
2) Request 1 ("R1") is received by Central, a DB TX is opened, and work begins, causing a DB lock to be obtained.
3) Eventlet performs a context switch, allowing R2 to begin.
4) Request 2 ("R2") is received by Central, a DB TX is opened, and work begins. The DB query blocks, as R1 holds the requisite locks.
5) Neither R1 nor R2 can complete: MySQL-Python is C based, so eventlet is unable to make the "blocking" query asynchronous, and R1 cannot be scheduled to release its locks while R2's query blocks.
6) After 30 seconds or so, at least one of the two open TXs will be aborted by MySQL due to a timeout obtaining the requisite locks.
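
The sequence above can be reproduced in miniature with the standard library alone. This is only an illustrative sketch, not designate code: generators stand in for eventlet greenthreads, a `threading.Lock` stands in for the MySQL lock, and a blocking `acquire(timeout=...)` stands in for the C driver call that eventlet cannot interrupt. The names `request`, `run`, and `row_lock` are hypothetical, and the 0.2 second timeout is an arbitrary stand-in for MySQL's lock wait timeout.

```python
import threading

# Assumption: a plain lock stands in for the DB lock held by an open TX.
row_lock = threading.Lock()
log = []

def request(name):
    # Blocking call: like a C-based driver, it stalls the whole OS thread,
    # so no other "greenthread" can run while it waits.
    if not row_lock.acquire(timeout=0.2):
        log.append(f"{name}: lock wait timeout")
        return
    log.append(f"{name}: lock acquired")
    yield  # stand-in for an eventlet context switch inside the TX window
    row_lock.release()
    log.append(f"{name}: committed")

def run(tasks):
    # Minimal round-robin scheduler, standing in for the eventlet hub.
    tasks = list(tasks)
    while tasks:
        task = tasks.pop(0)
        try:
            next(task)
            tasks.append(task)
        except StopIteration:
            pass

run([request("R1"), request("R2")])
print(log)
```

R1 takes the lock and yields. R2 then blocks the only thread until its timeout fires, and only after that can R1 resume and finish.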

Using a pure-Python MySQL driver (e.g. PyMySQL) will prevent this issue, as eventlet is capable of monkey patching the driver. The downside is that a pure-Python implementation is slower than a C implementation like MySQL-Python.

I believe the correct solution is to "tighten up" our DB TX window, avoiding any code that may cause a context switch during the TX window. This has the added advantage of giving us much shorter transactions than we currently have.
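
A rough sketch of what a tightened TX window could look like follows. It is not the designate implementation: `sqlite3` stands in for MySQL so the example is self-contained, and the schema, `prepare_record`, and `add_record` are hypothetical. The idea is that any work that might yield to another greenthread (validation, RPC calls, logging) runs before the transaction is opened, so the TX contains only the DB statements.

```python
import sqlite3

def prepare_record(name, data):
    # Hypothetical pre-TX work: anything that might context switch goes here.
    if not name.endswith("."):
        raise ValueError("record name must be fully qualified")
    return name.lower(), data.strip()

def add_record(conn, zone_id, name, data):
    prepared_name, prepared_data = prepare_record(name, data)
    # The TX window: only DB statements, nothing that can yield.
    # "with conn" commits on success and rolls back on an exception.
    with conn:
        conn.execute(
            "INSERT INTO records (zone_id, name, data) VALUES (?, ?, ?)",
            (zone_id, prepared_name, prepared_data),
        )
        conn.execute(
            "UPDATE zones SET serial = serial + 1 WHERE id = ?",
            (zone_id,),
        )

# Assumed minimal schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zones (id INTEGER PRIMARY KEY, serial INTEGER)")
conn.execute("CREATE TABLE records (zone_id INTEGER, name TEXT, data TEXT)")
conn.execute("INSERT INTO zones (id, serial) VALUES (1, 1)")
conn.commit()

add_record(conn, 1, "WWW.example.org.", " 192.0.2.1 ")
```

With this shape, a request that fails validation never opens a transaction at all, and a request that does open one holds its locks only for the duration of the two statements.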