Activity log for bug #1392762

Date Who What changed Old value New value Message
2014-11-14 15:22:50 Kiall Mac Innes bug added bug
2014-11-14 15:25:14 Kiall Mac Innes description
Old value: Concurrent requests to designate-central can, under certain circumstances, cause it to lock up. If two requests to, for example, add records to a zone are received approximately simultaneously, we can end up with a code deadlock (i.e. not a true DB deadlock).
Consider the following example:
1) Two API calls to add a record to a single zone come in.
2) Request 1 ("R1") is received by Central, a DB TX is opened, and work begins, causing a DB lock to be obtained.
3) Eventlet performs a context switch, allowing R2 to begin.
4) Request 2 ("R2") is received by Central, a DB TX is opened, and work begins; the DB query blocks as R1 holds the requisite locks.
5) Neither R1 nor R2 can complete: MySQL-Python is C-based, so eventlet is unable to make the "blocking" query asynchronous.
6) After 30 seconds or so, at least 1 of the 2 open TXs will be aborted by MySQL due to a timeout obtaining the requisite locks.
Using a pure-Python MySQL driver (e.g. PyMySQL) will prevent this issue, as eventlet is capable of monkey patching the driver. The downside is that it's a slow pure-Python implementation rather than a C implementation like MySQL-Python. I believe the correct solution is to "tighten up" our DB TX window, avoiding any code that may cause a context switch during the TX window. This has the added advantage of having a much smaller transaction window than we currently do.
New value: Concurrent requests to designate-central can, under certain circumstances, cause it to lock up. In most current production deployments, having X designate-central instances, each with Y workers, results in this issue being unlikely to occur when fewer than X*Y concurrent API calls are processed for a single zone simultaneously. If two requests to, for example, add records to a zone are received approximately simultaneously, we can end up with a code deadlock (i.e. not a true DB deadlock).
Consider the following example:
1) Two API calls to add a record to a single zone come in.
2) Request 1 ("R1") is received by Central, a DB TX is opened, and work begins, causing a DB lock to be obtained.
3) Eventlet performs a context switch, allowing R2 to begin.
4) Request 2 ("R2") is received by Central, a DB TX is opened, and work begins; the DB query blocks as R1 holds the requisite locks.
5) Neither R1 nor R2 can complete: MySQL-Python is C-based, so eventlet is unable to make the "blocking" query asynchronous.
6) After 30 seconds or so, at least 1 of the 2 open TXs will be aborted by MySQL due to a timeout obtaining the requisite locks.
Using a pure-Python MySQL driver (e.g. PyMySQL) will prevent this issue, as eventlet is capable of monkey patching the driver. The downside is that it's a slow pure-Python implementation rather than a C implementation like MySQL-Python. I believe the correct solution is to "tighten up" our DB TX window, avoiding any code that may cause a context switch during the TX window. This has the added advantage of having a much smaller transaction window than we currently do.
2014-11-14 15:26:38 Kiall Mac Innes bug added subscriber Erik Andersson
2014-11-14 15:26:41 Kiall Mac Innes removed subscriber Erik Andersson
2014-11-14 15:26:58 Kiall Mac Innes bug added subscriber Erik Andersson
2014-11-15 14:07:48 OpenStack Infra designate: status Triaged In Progress
2014-11-17 19:52:29 OpenStack Infra designate: status In Progress Fix Committed
2014-11-19 14:21:52 Kiall Mac Innes designate: status Fix Committed Won't Fix
2014-11-19 14:21:56 Kiall Mac Innes designate: status Won't Fix In Progress
2014-12-03 14:09:16 OpenStack Infra designate: assignee Kiall Mac Innes (kiall) Ron Rickard (rjrjr)
2014-12-15 15:43:10 OpenStack Infra designate: assignee Ron Rickard (rjrjr) Kiall Mac Innes (kiall)
2014-12-16 00:20:57 OpenStack Infra designate: status In Progress Fix Committed
2014-12-18 12:36:14 Thierry Carrez designate: status Fix Committed Fix Released
2014-12-18 12:48:43 Kiall Mac Innes nominated for series designate/juno
2014-12-18 12:48:43 Kiall Mac Innes bug task added designate/juno
2014-12-18 12:48:53 Kiall Mac Innes designate/juno: status New Triaged
2014-12-18 12:48:56 Kiall Mac Innes designate/juno: importance Undecided High
2014-12-18 12:49:04 Kiall Mac Innes designate/juno: milestone 2013.2.3
2015-04-30 12:34:34 Thierry Carrez designate: milestone kilo-1 2015.1.0
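
Illustration of the description above: the following is a toy model of the R1/R2 scenario, not Designate code. It uses plain Python generators in place of eventlet greenthreads, and every name in it (BlockingLock, wide_tx, tight_tx, run) is hypothetical. The lock stands in for a DB lock reached through a C driver, where waiting cannot yield to other greenthreads; instead of hanging, the model raises an error at the point where the real process would stall until MySQL's lock timeout.

```python
class BlockingLock:
    """Models a DB lock taken via a C driver: a waiter cannot yield (assumption)."""

    def __init__(self):
        self.owner = None

    def acquire(self, who):
        if self.owner is not None and self.owner != who:
            # A real C driver would block the whole process here.
            raise RuntimeError(f"{who} blocked: lock held by {self.owner}")
        self.owner = who

    def release(self, who):
        if self.owner == who:
            self.owner = None


def wide_tx(name, lock, log):
    """Wide TX window: a context switch happens while the lock is held."""
    lock.acquire(name)   # TX opened, DB lock obtained
    yield                # context switch inside the TX window
    log.append(name)
    lock.release(name)   # TX committed


def tight_tx(name, lock, log):
    """Tight TX window: anything that may switch runs before the TX opens."""
    yield                # context switch outside the TX window
    lock.acquire(name)
    log.append(name)
    lock.release(name)


def run(tasks):
    """Round-robin scheduler; returns "ok" or the message of the stall."""
    pending = list(tasks)
    try:
        while pending:
            task = pending.pop(0)
            try:
                next(task)
                pending.append(task)
            except StopIteration:
                pass  # task finished
    except RuntimeError as exc:
        return str(exc)
    return "ok"
```

With wide_tx, R2 reaches the lock while R1 still holds it and the scheduler can make no further progress; with tight_tx, both requests complete in order because no switch occurs between acquire and release.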