PostgreSQL database deadlocks under high load
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Designate | New | Undecided | Unassigned |
Bug Description
We are using the standard nova and neutron notification handlers for record creation. We have three controllers with the following packages:
[root@srv-os-ctl01 ~]# rpm -qa | grep designate
openstack-
openstack-
python-
openstack-
openstack-
openstack-
python-
openstack-
openstack-
openstack-
Memcached is used as a coordination backend.
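For reference, the coordination backend is pointed at memcached in designate.conf roughly like this (the endpoint shown is an example value, not our real one):

    [coordination]
    backend_url = memcached://192.0.2.10:11211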
Under high load we get the following errors in central.log:
WARNING designate.storage [req-75f6b550-
And on the postgresql side:
< 2018-08-02 16:23:04.286 MSK > ERROR: deadlock detected
< 2018-08-02 16:23:04.286 MSK > DETAIL: Process 40320 waits for AccessExclusiveLock on tuple (141,10) of relation 17034 of database 16396; blocked by process 40159.
Process 40159 waits for ShareLock on transaction 109671456; blocked by process 40102.
Process 40102 waits for ShareLock on transaction 109671770; blocked by process 40320.
Process 40320: UPDATE records SET version=
Process 40159: UPDATE records SET version=
Process 40102: UPDATE zones SET version=
< 2018-08-02 16:23:04.286 MSK > HINT: See server log for query details.
< 2018-08-02 16:23:04.286 MSK > STATEMENT: UPDATE records SET version=
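For illustration, the same kind of deadlock can be reproduced outside Designate with two concurrent transactions that update a records row and a zones row in opposite order. This is only a minimal sketch: the DSN, ids and the simplified records/zones tables are assumptions, not Designate code.

    # Two transactions take the same two row locks in opposite order,
    # so PostgreSQL aborts one of them with "deadlock detected".
    # Assumes simplified records/zones tables with id and version columns.
    import threading
    import time
    import psycopg2

    DSN = "dbname=designate_test user=designate"  # example connection string

    def run_txn(first_sql, second_sql):
        conn = psycopg2.connect(DSN)
        try:
            with conn:                        # one transaction; commit/rollback on exit
                with conn.cursor() as cur:
                    cur.execute(first_sql)    # takes the first row lock
                    time.sleep(1)             # let the other transaction grab its first lock
                    cur.execute(second_sql)   # now blocks on the other transaction
        finally:
            conn.close()

    # Transaction A: record first, then zone (like the record-update path).
    a = threading.Thread(target=run_txn, args=(
        "UPDATE records SET version = version + 1 WHERE id = 'r1'",
        "UPDATE zones SET version = version + 1 WHERE id = 'z1'"))
    # Transaction B: zone first, then record (like the zone-status path).
    b = threading.Thread(target=run_txn, args=(
        "UPDATE zones SET version = version + 1 WHERE id = 'z1'",
        "UPDATE records SET version = version + 1 WHERE id = 'r1'"))

    a.start(); b.start()
    a.join(); b.join()   # one thread fails with a deadlock error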
As a workaround we have moved the central and mdns services under pacemaker and are currently running only one instance of each.
Reviewed: https://review.openstack.org/647711
Committed: https://git.openstack.org/cgit/openstack/designate/commit/?id=f828654a3d40476cac7eb24a09a36e9978c2d708
Submitter: Zuul
Branch: master
commit f828654a3d40476cac7eb24a09a36e9978c2d708
Author: Takahito Hirose <email address hidden>
Date: Tue Mar 26 19:52:33 2019 +0900
Fix DBDeadLock error resulting into 500
When a user sends record registration requests continuously, designate sometimes hits a DBDeadLock, resulting in a 500 InternalServerError.
We get the error below:
2019-02-21 21:30:39.925 49752 ERROR designate.api.middleware RemoteError: Remote error: DBDeadlock (pymysql.err.InternalError)
(1213, u'Deadlock found when trying to get lock; try restarting transaction')
[SQL: u'UPDATE records SET version=(records.version + %(version_1)s), updated_at=%(updated_at)s, data=%(data)s, hash=%(hash)s, status=%(status)s, action=%(action)s, serial=%(serial)s WHERE records.id = %(id_1)s']
[parameters: {'status': 'PENDING', 'hash': '39795ee18c6e3c9ad1c0190c6a3d8d4f', 'updated_at': datetime.datetime(2019, 2, 21, 12, 30, 39, 909846), u'version_1': 1, u'id_1': '7a655eeda4d446cdaa81caf19ab55fcc', 'action': 'UPDATE', 'serial': 1550752338, 'data': u'ns2.example.jp. domain.example.com. 1550752338 3552 600 86400 3600'}]
In the process of record registration, designate first tries to update
the record and then update the zone status.
The process that updates the zone status and registers the record [1] and the process
that updates the record status and zone status after sync [2] do these updates in
reverse order. So when a user sends many record registration requests at the same time,
Designate hits the DBDeadLock because these processes run concurrently.
We observed that changing the order of the operations solves this issue.
[1] https://github.com/openstack/designate/blob/master/designate/central/service.py#L1292-L1320
[2] https://github.com/openstack/designate/blob/master/designate/central/service.py#L2310-L2322
1. transaction [1]-1: update zone status process <- runs ---> zones table
2. transaction [2]-1: update record status process <- runs ---> records table
3. transaction [1]-2: register record process <- runs and waits ---> records table
4. transaction [2]-2: update zone process <- deadlock! ---> zones table
Change-Id: Icd6e690ac84a2f e0db0f4a8a513de 47f7916f5ea
Related-Bug: #1785459
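The fix described above amounts to taking the row locks in the same order on both code paths, so the circular wait in the sequence above cannot form. A minimal sketch of that idea follows; the helper names and raw SQL are illustrative, not the actual change in service.py.

    # Both paths touch the records rows first and the zones row last, so two
    # concurrent transactions can no longer wait on each other in a cycle.
    from sqlalchemy import text
    from sqlalchemy.orm import Session

    def register_record(session: Session, zone_id: str, record_id: str) -> None:
        with session.begin():
            # Records row first ...
            session.execute(text(
                "UPDATE records SET version = version + 1, status = 'PENDING' "
                "WHERE id = :rid"), {"rid": record_id})
            # ... zones row last.
            session.execute(text(
                "UPDATE zones SET version = version + 1 WHERE id = :zid"),
                {"zid": zone_id})

    def mark_synced(session: Session, zone_id: str, record_ids: list) -> None:
        with session.begin():
            # Same order as above: records first, zone last.
            for rid in record_ids:
                session.execute(text(
                    "UPDATE records SET version = version + 1, status = 'ACTIVE' "
                    "WHERE id = :rid"), {"rid": rid})
            session.execute(text(
                "UPDATE zones SET version = version + 1, status = 'ACTIVE' "
                "WHERE id = :zid"), {"zid": zone_id})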