DB deadlock when deleting a subnet

Bug #1793516 reported by Keyang Li
This bug affects 1 person
Affects: neutron
Status: Triaged
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

My environment is stable/queens.

When deleting a network/subnet, neutron deletes the subnet first and then deletes the ipam subnet.
The code is here: https://github.com/openstack/neutron/blob/stable/queens/neutron/db/db_base_plugin_v2.py#L1029

The DB operations look like this:
DELETE FROM subnets WHERE subnets.id = '5d9bb112-622a-4b56-b5b9-f6c0fa6dbe7b';    -- step 1
DELETE FROM ipamsubnets WHERE ipamsubnets.id = '9674ebea-992a-4d78-b234-a3f6f640ad8a';    -- step 3: waits for step 2

If an IP allocation happens at the same time, it runs the code below:
https://github.com/openstack/neutron/blob/stable/queens/neutron/db/ipam_pluggable_backend.py#L162

The DB operations look like this:
INSERT INTO ipamallocations (ip_address, status, ipam_subnet_id) VALUES ('192.168.1.3', 'ALLOCATED', '9674ebea-992a-4d78-b234-a3f6f640ad8a');    -- step 2
INSERT INTO ipallocations (port_id, ip_address, ip_address_integer, subnet_id, network_id) VALUES ('6a704ca2-c258-4b15-8bc2-8b434e2689d6', '192.168.1.3', 0, '5d9bb112-622a-4b56-b5b9-f6c0fa6dbe7b', 'a3c0ad20-661d-4b9b-beef-10adabcacf06');    -- step 4: waits for step 1

The ipamallocations table has a foreign key (ipam_subnet_id) referencing ipamsubnets, and the ipallocations table has a foreign key (subnet_id) referencing subnets, so each INSERT's foreign-key check takes a shared lock on the referenced parent row, while each DELETE needs an exclusive lock on it.
As a result, step 4 waits for step 1's transaction to release its lock on the subnets row, and step 3 waits for step 2's transaction to release its shared lock on the ipamsubnets row: a deadlock.
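To make the lock cycle explicit, here is an illustrative chronological interleaving of the two transactions (A = the network/subnet delete, B = the port create / IP allocation), built from the statements above; the exact lock types depend on the database engine:

-- A, step 1: takes an exclusive lock on the subnets row being deleted
DELETE FROM subnets WHERE subnets.id = '5d9bb112-622a-4b56-b5b9-f6c0fa6dbe7b';
-- B, step 2: the foreign-key check takes a shared lock on the referenced ipamsubnets row
INSERT INTO ipamallocations (ip_address, status, ipam_subnet_id)
    VALUES ('192.168.1.3', 'ALLOCATED', '9674ebea-992a-4d78-b234-a3f6f640ad8a');
-- A, step 3: blocks, because B's shared lock on the ipamsubnets row is still held
DELETE FROM ipamsubnets WHERE ipamsubnets.id = '9674ebea-992a-4d78-b234-a3f6f640ad8a';
-- B, step 4: the foreign-key check on subnets blocks behind A's exclusive lock from step 1
INSERT INTO ipallocations (port_id, ip_address, ip_address_integer, subnet_id, network_id)
    VALUES ('6a704ca2-c258-4b15-8bc2-8b434e2689d6', '192.168.1.3', 0,
            '5d9bb112-622a-4b56-b5b9-f6c0fa6dbe7b', 'a3c0ad20-661d-4b9b-beef-10adabcacf06');
-- A waits for B (step 3 on step 2) and B waits for A (step 4 on step 1), so neither can proceed.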
How can this issue be solved?

Revision history for this message
Keyang Li (ilkykli) wrote :

I switched the order: delete the ipam subnet first and the subnet later. That seems to solve this issue,
but there is another deadlock between standardattributes and ml2_port_bindings when deleting a network.
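Expressed as SQL only (an illustration of the ordering, not the actual neutron code), the delete transaction would then issue:

-- With this ordering, both the delete transaction and the IP-allocation transaction
-- touch the ipamsubnets row before the subnets row, so they acquire the two parent-row
-- locks in the same order and cannot form a cycle for this particular pair of operations.
DELETE FROM ipamsubnets WHERE ipamsubnets.id = '9674ebea-992a-4d78-b234-a3f6f640ad8a';
DELETE FROM subnets WHERE subnets.id = '5d9bb112-622a-4b56-b5b9-f6c0fa6dbe7b';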

Revision history for this message
Boden R (boden) wrote :

Is this issue easy enough to reproduce?
If so, can you please provide the steps used to reproduce the error?

Revision history for this message
Keyang Li (ilkykli) wrote :

It is easy to reproduce.

python -m testtools.run tempest.api.network.test_networks
or
for i in {1..500};do python -m testtools.run tempest.api.network.test_networks.NetworksTest.test_create_delete_subnet_with_gw_and_allocation_pools;done

Running these tempest tests can trigger the error; my tempest version is 13.0.0.
Roughly 2 out of every 100 runs hit the deadlock.

Revision history for this message
Boden R (boden) wrote :

I have not been able to reproduce this with devstack master/queens using the suggestions in comment #3. I've tried both approaches multiple times, but can't find any deadlock errors.

I've been looking in the q-svc log for such errors; should I be looking elsewhere or is there something I'm missing?

What I do find is the following error in q-svc log:

--------
Sep 20 10:58:38 udevstack neutron-server[77994]: ERROR oslo_db.sqlalchemy.exc_filters [None req-760dc216-6348-4ff7-8d47-fa59b3fba3b5 None None] DBAPIError exception wrapped from (pymysql.err.InternalError) (1305, u'SAVEPOINT sa_savepoint_11 does not exist') [SQL: u'ROLLBACK TO SAVEPOINT sa_savepoint_11'] (Background on this error at: http://sqlalche.me/e/2j85): InternalError: (1305, u'SAVEPOINT sa_savepoint_11 does not exist')
Sep 20 10:58:38 udevstack neutron-server[77994]: ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call last):
Sep 20 10:58:38 udevstack neutron-server[77994]: ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
Sep 20 10:58:38 udevstack neutron-server[77994]: ERROR oslo_db.sqlalchemy.exc_filters context)
Sep 20 10:58:38 udevstack neutron-server[77994]: ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 507, in do_execute
Sep 20 10:58:38 udevstack neutron-server[77994]: ERROR oslo_db.sqlalchemy.exc_filters cursor.execute(statement, parameters)
Sep 20 10:58:38 udevstack neutron-server[77994]: ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/pymysql/cursors.py", line 165, in execute
Sep 20 10:58:38 udevstack neutron-server[77994]: ERROR oslo_db.sqlalchemy.exc_filters result = self._query(query)
Sep 20 10:58:38 udevstack neutron-server[77994]: ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/pymysql/cursors.py", line 321, in _query
Sep 20 10:58:38 udevstack neutron-server[77994]: ERROR oslo_db.sqlalchemy.exc_filters conn.query(q)
Sep 20 10:58:38 udevstack neutron-server[77994]: ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 860, in query
Sep 20 10:58:38 udevstack neutron-server[77994]: ERROR oslo_db.sqlalchemy.exc_filters self._affected_rows = self._read_query_result(unbuffered=unbuffered)
Sep 20 10:58:38 udevstack neutron-server[77994]: ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 1061, in _read_query_result
Sep 20 10:58:38 udevstack neutron-server[77994]: ERROR oslo_db.sqlalchemy.exc_filters result.read()
Sep 20 10:58:38 udevstack neutron-server[77994]: ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 1349, in read
Sep 20 10:58:38 udevstack neutron-server[77994]: ERROR oslo_db.sqlalchemy.exc_filters first_packet = self.connection._read_packet()
Sep 20 10:58:38 udevstack neutron-server[77994]: ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", lin...


Revision history for this message
Keyang Li (ilkykli) wrote :

Check the DB logs to see whether there is any deadlock information, and check the DB configuration related to deadlocks.

My environment's DB is PostgreSQL, configured with 'deadlock_timeout = 60s', so the deadlock is only detected after 60 seconds; the DB then rolls one transaction back, and neutron-server retries and succeeds.
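For example, on PostgreSQL the relevant settings can be inspected with standard commands (shown here as a suggestion):

-- Time PostgreSQL waits on a lock before running deadlock detection
-- (1s by default; 60s in the environment described above, hence the long stalls).
SHOW deadlock_timeout;
-- When on, every lock wait longer than deadlock_timeout is logged,
-- which makes the blocking DELETE/INSERT pair visible in the server log.
SHOW log_lock_waits;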

-------------------------
After reproducing the deadlock and waiting 60s, the neutron-server log shows:
---------
The delete_network transaction gets a retriable exception and retries:
---------
2018-09-20T23:11:44.103+08:00 localhost neutron-server DEBUG [pid:57132] [MainThread][tid:142650736] [req-98e37f03-9899-4cce-b259-7f9cc2d46786] [api.py:126 wrapped] Retry wrapper got retriable exception:
2018-09-20T23:11:44.103+08:00 localhost neutron-server DEBUG [pid:57132] [MainThread][tid:142650736] [req-98e37f03-9899-4cce-b259-7f9cc2d46786] [api.py:149 wrapper] Performing DB retry for function neutron.db.db_base_plugin_v2.delete_network

---------
The IP allocation (create_port) transaction gets a retriable exception and retries:
---------
2018-09-20T23:10:29.561+08:00 localhost neutron-server DEBUG [pid:57140] [MainThread][tid:130092400] [req-ca4115a9-6353-4d25-96fa-bdd174a7b651] [api.py:126 wrapped] Retry wrapper got retriable exception: (psycopg2.extensions.TransactionRollbackError) deadlock detected
2018-09-20T23:10:29.562+08:00 localhost neutron-server DEBUG [pid:57140] [MainThread][tid:130092400] [req-ca4115a9-6353-4d25-96fa-bdd174a7b651] [api.py:149 wrapper] Performing DB retry for function neutron.plugins.ml2.plugin.create_port

The DB log:
-----------
[2018-09-20 22:41:55 CST] OPENSTACK NEUTRON 172.28.0.2LOG: the system is trying to lock transaction 856126 (postgres.c:7630)
[2018-09-20 22:41:55 CST] OPENSTACK NEUTRON 172.28.0.2STATEMENT: DELETE FROM ipamsubnets WHERE ipamsubnets.id = '271bf515-d802-4fa0-b58b-9a14b0e5bcce'
[2018-09-20 22:41:55 CST] OPENSTACK NEUTRON 172.28.0.2ERROR: deadlock detected (deadlock.c:956)
[2018-09-20 22:41:55 CST] OPENSTACK NEUTRON 172.28.0.2DETAIL: Process 13387 waits for ShareLock on transaction 856126; blocked by process 12065.
        Process 12065 waits for ShareLock on transaction 856127; blocked by process 13387.
        Process 13387: DELETE FROM ipamsubnets WHERE ipamsubnets.id = '271bf515-d802-4fa0-b58b-9a14b0e5bcce'
        Process 12065: INSERT INTO ipallocations (port_id, ip_address, ip_address_integer, subnet_id, network_id) VALUES ('e15dd93f-b596-434f-897e-c06be51cad7b', '2003::3', 0, 'd33f5aef-267a-46c8-bd7e-8924f7cb7dea', 'f4583e12-140e-49ca-bf26-cbe41ac6d8f4')
[2018-09-20 22:41:55 CST] OPENSTACK NEUTRON 172.28.0.2HINT: See server log for query details.
[2018-09-20 22:41:55 CST] OPENSTACK NEUTRON 172.28.0.2STATEMENT: DELETE FROM ipamsubnets WHERE ipamsubnets.id = '271bf515-d802-4fa0-b58b-9a14b0e5bcce'
[2018-09-20 22:41:55 CST] OPENSTACK NEUTRON 172.28.0.2LOG:

Revision history for this message
Keyang Li (ilkykli) wrote :

cat create_delete_network.sh
--------------------------------
#!/usr/bin/bash

# Repeatedly create a network with a subnet, then delete the network, printing
# how long each delete takes; a deadlock shows up as a 60+ second delete.
for ((i = 1; i <= 1000; i++))
do
    neutron net-create test_$i > /dev/null
    neutron subnet-create test_$i 192.168.1.0/24 > /dev/null
    start_time=$(date +%s)
    neutron net-delete test_$i > /dev/null
    end_time=$(date +%s)
    echo $(($end_time - $start_time))
done

Just creating and deleting networks reproduces the deadlock. A normal create/delete cycle takes about 5 seconds; when the deadlock occurs, it takes 60+ seconds to time out and retry in my environment.

Revision history for this message
Keyang Li (ilkykli) wrote :

Hey Boden,

In fact, you encountered the deadlock too.
Have a look at this bug:
https://bugs.launchpad.net/neutron/+bug/1590298

In the MySQL case, the deadlock leads to the 1305 error that appears in your log.
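On MySQL/InnoDB, the most recent deadlock can be inspected directly, for example:

-- The output contains a "LATEST DETECTED DEADLOCK" section listing both waiting
-- statements, which should show the same DELETE/INSERT pair as the PostgreSQL log above.
SHOW ENGINE INNODB STATUS;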

Revision history for this message
Boden R (boden) wrote :

As already mentioned in comment #7, this is reproducible with queens.

I haven't been able to reproduce on master though, even after multiple attempts.
Is this consistent with what you are seeing (no deadlock when using master)?

Changed in neutron:
status: New → Triaged
Revision history for this message
Boden R (boden) wrote :

I've been trying to reproduce on stable/rocky, but am running into other issues.

Revision history for this message
Brian Haley (brian-haley) wrote :

Keyang - it has been a number of releases since Queens, where Boden was able to reproduce this. If you are now running a later version, have you been able to reproduce it there?
