DB deadlock on worker create

Bug #1789106 reported by Yakir Levi
14
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Cinder
Fix Released
Undecided
Lucian Petrut

Bug Description

Hey
We use cinder version ocata, and I encountered the following error in cinder-volume on volume creation request:

2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server [req-5b4cb125-0d2d-4cbc-bf15-6ddd445205b8 - 0 - - -] Exception during message handling
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 155, in _process_incoming
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 222, in dispatch
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 192, in _do_dispatch
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server File "<decorator-gen-237>", line 2, in create_volume
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/cinder/objects/cleanable.py", line 204, in wrapper
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server cleanable.set_worker()
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/cinder/objects/cleanable.py", line 144, in set_worker
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server service_id=service_id)
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/cinder/db/api.py", line 1681, in worker_create
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server return IMPL.worker_create(context, **values)
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/cinder/db/sqlalchemy/api.py", line 6664, in worker_create
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server worker.save(session)
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 553, in __exit__
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server self.rollback()
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server compat.reraise(exc_type, exc_value, exc_tb)
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 550, in __exit__
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server self.commit()
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 459, in commit
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server t[1].commit()
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1574, in commit
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server self._do_commit()
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1605, in _do_commit
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server self.connection._commit_impl()
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 690, in _commit_impl
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server self._handle_dbapi_exception(e, None, None, None, None)
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1337, in _handle_dbapi_exception
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server util.raise_from_cause(newraise, exc_info)
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server reraise(type(exception), exception, tb=exc_tb, cause=cause)
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 688, in _commit_impl
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server self.engine.dialect.do_commit(self.connection)
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 2530, in do_commit
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server dbapi_connection.commit()
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 767, in commit
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server self._read_ok_packet()
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 746, in _read_ok_packet
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server pkt = self._read_packet()
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 981, in _read_packet
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server packet.check_error()
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 393, in check_error
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server err.raise_mysql_exception(self._data)
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 107, in raise_mysql_exception
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server raise errorclass(errno, errval)
2018-08-09 13:02:12,464.464 35 ERROR oslo_messaging.rpc.server DBDeadlock: (pymysql.err.InternalError) (1213, u'Deadlock found when trying to get lock; try restarting transaction')

this is the conf file:

[DEFAULT]
volume_clear = none
verbose = False
quota_volumes = 100000
notification_driver = noop
osapi_max_limit = 1000000
quota_gigabytes = 1000000
debug = False
state_path = /var/lib/cinder
log_config_append = /etc/cinder/logging.conf
use_syslog = False
scheduler_default_filters = CapabilitiesFilter
rpc_backend = rabbit
osapi_volume_listen = 0.0.0.0
api_paste_config = /etc/cinder/api-paste.ini
auth_strategy = noauth
osapi_volume_workers = 2
host = symp
service_down_time = 120

[database]
idle_timeout = 3600
max_pool_size = 25
max_overflow = -1
connection = mysql+pymysql://cinder:<email address hidden>/cinder

[oslo_concurrency]
lock_path = /var/lib/cinder/tmp

[oslo_messaging_rabbit]
rabbit_host = rabbitmq-server.service
rabbit_port = 5672
--------------------------------------

I saw this issue:
https://review.openstack.org/#/c/111410/

maybe this decorator should be added also here?:
https://github.com/openstack/cinder/blob/master/cinder/db/sqlalchemy/api.py#L6931

What do you think?

If any more details are required please ask.

Thanks.

Revision history for this message
Sean McGinnis (sean-mcginnis) wrote :

Hey Yakir, can you check if you are running into this situation?

http://lists.openstack.org/pipermail/openstack-operators/2017-May/013500.html

Revision history for this message
Sean McGinnis (sean-mcginnis) wrote :

Sorry, I see you included that part of your config and you are using mysql+pymsql.

Revision history for this message
Yakir Levi (yakirlevi) wrote :
Download full text (8.2 KiB)

Update:
This issue keeps happening, now it throws also on worker_update:

2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.server [req-c68e0480-1dc3-49e9-986a-bea026057e4f - 0 - - -] Exception during message handling
2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 155, in _process_incoming
2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 222, in dispatch
2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 192, in _do_dispatch
2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.server File "<decorator-gen-237>", line 2, in create_volume
2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/cinder/objects/cleanable.py", line 204, in wrapper
2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.server cleanable.set_worker()
2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/cinder/objects/cleanable.py", line 164, in set_worker
2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.server orm_worker=worker)
2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/cinder/db/api.py", line 1698, in worker_update
2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.server orm_worker=orm_worker, **values)
2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/cinder/db/sqlalchemy/api.py", line 6704, in worker_update
2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.server result = query.update(values)
2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 3237, in update
2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.server update_op.exec_()
2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 1131, in exec_
2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.server self._do_exec()
2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 1286, in _do_exec
2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.server mapper=self.mapper)
2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 1097, in execute
2018-08-27 15:43:23,324.324 35 ERROR oslo_messaging.rpc.se...

Read more...

Revision history for this message
Yakir Levi (yakirlevi) wrote :

any update?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/611856

Changed in cinder:
assignee: nobody → Lucian Petrut (petrutlucian94)
status: New → In Progress
Revision history for this message
Yakir Levi (yakirlevi) wrote :
Download full text (5.7 KiB)

Thanks for help!

I encounter this also at volume_create.
can it be added to this function as well?

018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api result = task.execute(**arguments)
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api File "/usr/lib/python2.7/site-packages/cinder/volume/flows/api/create_volume.py", line 547, in execute
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api volume.create()
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api File "/usr/lib/python2.7/site-packages/cinder/objects/volume.py", line 340, in create
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api db_volume = db.volume_create(self._context, updates)
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api File "/usr/lib/python2.7/site-packages/cinder/db/api.py", line 234, in volume_create
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api return IMPL.volume_create(context, values)
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api File "/usr/lib/python2.7/site-packages/cinder/db/sqlalchemy/api.py", line 251, in wrapper
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api return f(*args, **kwargs)
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api File "/usr/lib/python2.7/site-packages/cinder/db/sqlalchemy/api.py", line 196, in wrapper
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api return f(*args, **kwargs)
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api File "/usr/lib/python2.7/site-packages/cinder/db/sqlalchemy/api.py", line 1493, in volume_create
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api session.add(volume_ref)
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 553, in __exit__
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api self.rollback()
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api compat.reraise(exc_type, exc_value, exc_tb)
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 550, in __exit__
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api self.commit()
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 459, in commit
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api t[1].commit()
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1574, in commit
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api self._do_commit()
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1605, in _do_commit
2018-09-03 21:27:54,096.096 35 ERROR cinder.volume.api self.connection._commit_impl()
2018-09-03 21:27:54,...

Read more...

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/611856
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=e1fc723db123651ad66a339d559aed578126c74a
Submitter: Zuul
Branch: master

commit e1fc723db123651ad66a339d559aed578126c74a
Author: Lucian Petrut <email address hidden>
Date: Fri Oct 19 15:02:17 2018 +0300

    Retry on DBDeadLock affecting worker updates

    Some updates on the workers table are prone to race conditions,
    leading to DB deadlocks.

    For example, when creating a large amount of volumes simultaneously,
    some of them may hang in "Creating" state due to this issue.

    We'll perform some retries when db deadlocks occur.

    Change-Id: Iae211acd619070405f84a84ea58e88723a5e9f9b
    Closes-Bug: #1789106

Changed in cinder:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/626557

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/rocky)

Reviewed: https://review.openstack.org/626557
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=781e964d6c5be7cafa7bc0a295bdc0c37764d939
Submitter: Zuul
Branch: stable/rocky

commit 781e964d6c5be7cafa7bc0a295bdc0c37764d939
Author: Lucian Petrut <email address hidden>
Date: Fri Oct 19 15:02:17 2018 +0300

    Retry on DBDeadLock affecting worker updates

    Some updates on the workers table are prone to race conditions,
    leading to DB deadlocks.

    For example, when creating a large amount of volumes simultaneously,
    some of them may hang in "Creating" state due to this issue.

    We'll perform some retries when db deadlocks occur.

    Change-Id: Iae211acd619070405f84a84ea58e88723a5e9f9b
    Closes-Bug: #1789106
    (cherry picked from commit e1fc723db123651ad66a339d559aed578126c74a)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/629059

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/queens)

Reviewed: https://review.openstack.org/629059
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=69c89e384ad68d72b430d450e7048cd1e0b8f80b
Submitter: Zuul
Branch: stable/queens

commit 69c89e384ad68d72b430d450e7048cd1e0b8f80b
Author: Lucian Petrut <email address hidden>
Date: Fri Oct 19 15:02:17 2018 +0300

    Retry on DBDeadLock affecting worker updates

    Some updates on the workers table are prone to race conditions,
    leading to DB deadlocks.

    For example, when creating a large amount of volumes simultaneously,
    some of them may hang in "Creating" state due to this issue.

    We'll perform some retries when db deadlocks occur.

    Change-Id: Iae211acd619070405f84a84ea58e88723a5e9f9b
    Closes-Bug: #1789106
    (cherry picked from commit e1fc723db123651ad66a339d559aed578126c74a)
    (cherry picked from commit 781e964d6c5be7cafa7bc0a295bdc0c37764d939)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 12.0.5

This issue was fixed in the openstack/cinder 12.0.5 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 13.0.3

This issue was fixed in the openstack/cinder 13.0.3 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/644941

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 14.0.0.0rc1

This issue was fixed in the openstack/cinder 14.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/pike)

Reviewed: https://review.openstack.org/644941
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=713bb5f5067b0a0c843f75f82334d19f66dc1656
Submitter: Zuul
Branch: stable/pike

commit 713bb5f5067b0a0c843f75f82334d19f66dc1656
Author: Lucian Petrut <email address hidden>
Date: Fri Oct 19 15:02:17 2018 +0300

    Retry on DBDeadLock affecting worker updates

    Some updates on the workers table are prone to race conditions,
    leading to DB deadlocks.

    For example, when creating a large amount of volumes simultaneously,
    some of them may hang in "Creating" state due to this issue.

    We'll perform some retries when db deadlocks occur.

    Change-Id: Iae211acd619070405f84a84ea58e88723a5e9f9b
    Closes-Bug: #1789106
    (cherry picked from commit e1fc723db123651ad66a339d559aed578126c74a)
    (cherry picked from commit 781e964d6c5be7cafa7bc0a295bdc0c37764d939)

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 11.2.2

This issue was fixed in the openstack/cinder 11.2.2 release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.