volumes stuck deleting after sqlalchemy queuepool timeout

Bug #1224608 reported by Edward Hope-Morley
This bug affects 3 people
Affects: Cinder
Status: Invalid
Importance: Medium
Assigned to: Mike Mason

Bug Description

While deleting a large number of volumes (100) on a slow machine (devstack running in a VM on a laptop), I got the following error, which left my volumes stuck in the 'deleting' state and I was subsequently unable to delete them:

2013-09-12 18:36:01.550 ERROR cinder.openstack.common.rpc.amqp [req-3ae246ae-a440-412c-9e68-bab4843b87ed 767d193549524160ae388a4daf17ec87 9a2f629bf57d429ea7960d17221c58ae] Exception during message handling
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp Traceback (most recent call last):
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp File "/opt/stack/cinder/cinder/openstack/common/rpc/amqp.py", line 441, in _process_data
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp **args)
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp File "/opt/stack/cinder/cinder/openstack/common/rpc/dispatcher.py", line 148, in dispatch
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp return getattr(proxyobj, method)(ctxt, **kwargs)
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp File "/opt/stack/cinder/cinder/volume/manager.py", line 233, in delete_volume
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp volume_ref = self.db.volume_get(context, volume_id)
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp File "/opt/stack/cinder/cinder/db/api.py", line 213, in volume_get
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp return IMPL.volume_get(context, volume_id)
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp File "/opt/stack/cinder/cinder/db/sqlalchemy/api.py", line 138, in wrapper
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp return f(*args, **kwargs)
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp File "/opt/stack/cinder/cinder/db/sqlalchemy/api.py", line 1196, in volume_get
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp return _volume_get(context, volume_id)
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp File "/opt/stack/cinder/cinder/db/sqlalchemy/api.py", line 138, in wrapper
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp return f(*args, **kwargs)
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp File "/opt/stack/cinder/cinder/db/sqlalchemy/api.py", line 1185, in _volume_get
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp filter_by(id=volume_id).\
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2156, in first
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp ret = list(self[0:1])
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2023, in __getitem__
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp return list(res)
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2227, in __iter__
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp return self._execute_and_instances(context)
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2240, in _execute_and_instances
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp close_with_result=True)
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2231, in _connection_from_session
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp **kw)
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 777, in connection
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp close_with_result=close_with_result)
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 783, in _connection_for_bind
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp return engine.contextual_connect(**kwargs)
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2489, in contextual_connect
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp self.pool.connect(),
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 236, in connect
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp return _ConnectionFairy(self).checkout()
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 401, in __init__
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp rec = self._connection_record = pool._do_get()
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 738, in _do_get
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp (self.size(), self.overflow(), self._timeout))
2013-09-12 18:36:01.550 TRACE cinder.openstack.common.rpc.amqp TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30
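
For reference, a concurrent bulk delete of this kind can be reproduced from the CLI with a loop over python-cinderclient (the vol-N volume names are hypothetical):

for i in $(seq 1 100); do cinder delete vol-$i; done

Each delete is dispatched to cinder-volume over RPC, so the manager ends up servicing many delete_volume calls concurrently, each needing its own database connection from the pool.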

Revision history for this message
Ganpat Agarwal (gans-developer) wrote :

I tried to execute a batch process creating 100 volumes, but got the same error.

Is there any way we can change the limit to overcome this problem?

Revision history for this message
Ganpat Agarwal (gans-developer) wrote :

I got the same error while trying to delete 50+ volumes together.

Revision history for this message
Edward Hope-Morley (hopem) wrote :

One workaround may be to reduce the size of the RPC thread pool using CONF.rpc_thread_pool_size.
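
A minimal sketch of that workaround, assuming the default rpc_thread_pool_size of 64 (32 is an arbitrary example value):

# in cinder.conf
[DEFAULT]
# Fewer RPC worker threads means fewer concurrent requests
# competing for pooled db connections (example value; default is 64)
rpc_thread_pool_size=32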

Revision history for this message
Xing Yang (xing-yang) wrote :

This problem was caused by too many requests trying to access the database at the same time. We ran into this problem when trying to delete 100 volumes at once.

The TimeoutError "QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30" means:
pool size: 5
overflow: 10
timeout: 30

Once the open connections reach 15 (5 + 10), a new request waits up to 30 seconds (the timeout) to acquire a connection from the pool, then raises the TimeoutError.
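
This arithmetic is easy to reproduce outside Cinder. A minimal sketch using SQLAlchemy's QueuePool directly (sqlite3 stands in as the connection creator, and the timeout is shortened to 3 seconds purely for illustration):

import sqlite3
from sqlalchemy.pool import QueuePool

# Same limits as the error message: size 5, overflow 10
pool = QueuePool(
    lambda: sqlite3.connect(":memory:"),
    pool_size=5,      # base pool of 5 connections
    max_overflow=10,  # plus up to 10 temporary "overflow" connections
    timeout=3,        # wait 3 seconds for a free connection, then give up
)

# Check out all 15 (5 + 10) connections and hold them...
held = [pool.connect() for _ in range(15)]

# ...so the 16th checkout blocks for `timeout` seconds and then raises
# sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10
# reached, connection timed out, timeout 3
pool.connect()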

This problem was seen in projects other than Cinder:

https://bugs.launchpad.net/nova/+bug/1181372
https://bugs.launchpad.net/ceilometer/+bug/1243251

Here is a solution for this problem.

Add the following options to /etc/cinder/cinder.conf:

# Changing the following options max_pool_size and max_overflow
# to allow more db connections
[database]
# Maximum number of SQL connections to keep open in a pool
# (integer value)
max_pool_size=20

# If set, use this value for max_overflow with sqlalchemy
# (integer value)
max_overflow=30

Note: the [database] section header must appear before max_pool_size and max_overflow.
If you have multi-backend options in cinder.conf, add these options after the multi-backend options; otherwise cinder-volume gets confused and cannot read the multi-backend options.
Both of the following configurations worked for us:
max_pool_size=25, max_overflow=-1 (-1 means unlimited overflow)
max_pool_size=20, max_overflow=30
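
As a sketch of the placement note above (the backend names lvm1 and lvm2 are hypothetical):

[DEFAULT]
enabled_backends=lvm1,lvm2

[lvm1]
# ... backend options ...

[lvm2]
# ... backend options ...

# [database] goes after the backend sections
[database]
max_pool_size=20
max_overflow=30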

After making cinder.conf changes, do the following:

cinder-manage db sync
Restart cinder-api, cinder-scheduler, and cinder-volume.
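
For example, on an Ubuntu-style install (the service names are an assumption; under devstack, restart the processes in their screen sessions instead):

cinder-manage db sync
service cinder-api restart
service cinder-scheduler restart
service cinder-volume restart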

Revision history for this message
Mike Perez (thingee) wrote :

Is a similar approach to Ceilometer's needed? https://review.openstack.org/#/c/63424/

Changed in cinder:
status: New → Confirmed
importance: Undecided → Medium
milestone: none → next
Changed in cinder:
assignee: nobody → Mike Mason (mikemason010)
Revision history for this message
Sean McGinnis (sean-mcginnis) wrote : Bug Cleanup

Closing stale bug. If this is still an issue please reopen.

Changed in cinder:
status: Confirmed → Invalid