Exception during message handling: QueuePool limit of size 10 overflow 20 reached, connection timed out, timeout 10

Bug #1384108 reported by James Page
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
neutron
Expired
Undecided
Unassigned
neutron (Ubuntu)
Won't Fix
Low
Unassigned

Bug Description

OpenStack Juno release, Ubuntu 14.04 using Cloud Archive; under relatively high instance creation concurrency (150), neutron starts to throw some errors:

2014-10-21 16:40:44.124 16312 ERROR oslo.messaging._drivers.common [req-8e3ebbdb-bc01-439d-af86-655176f206a6 ] ['Traceback (most recent call last):\n', ' File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 134, in _dispatch_and_reply\n incoming.message))\n', ' File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 177, in _dispatch\n return self._do_dispatch(endpoint, method, ctxt, args)\n', ' File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 123, in _do_dispatch\n result = getattr(endpoint, method)(ctxt, **new_args)\n', ' File "/usr/lib/python2.7/dist-packages/neutron/api/rpc/handlers/securitygroups_rpc.py", line 74, in security_group_info_for_devices\n ports = self._get_devices_info(devices_info)\n', ' File "/usr/lib/python2.7/dist-packages/neutron/api/rpc/handlers/securitygroups_rpc.py", line 41, in _get_devices_info\n port = self.plugin.get_port_from_device(device)\n', ' File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py", line 1161, in get_port_from_device\n port = db.get_port_and_sgs(port_id)\n', ' File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/db.py", line 222, in get_port_and_sgs\n port_and_sgs = query.all()\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2300, in all\n return list(self)\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2412, in __iter__\n return self._execute_and_instances(context)\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2425, in _execute_and_instances\n close_with_result=True)\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2416, in _connection_from_session\n **kw)\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 854, in connection\n close_with_result=close_with_result)\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 858, in _connection_for_bind\n return self.transaction._connection_for_bind(engine)\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 322, in _connection_for_bind\n conn = bind.contextual_connect()\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1799, in contextual_connect\n self.pool.connect(),\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 338, in connect\n return _ConnectionFairy._checkout(self)\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 641, in _checkout\n fairy = _ConnectionRecord.checkout(pool)\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 440, in checkout\n rec = pool._do_get()\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 957, in _do_get\n (self.size(), self.overflow(), self._timeout))\n', 'TimeoutError: QueuePool limit of size 10 overflow 20 reached, connection timed out, timeout 10\n']
2014-10-21 16:40:44.126 16312 ERROR oslo.messaging.rpc.dispatcher [req-ea96dc85-dc0f-4ddc-a827-dbc25ab32a03 ] Exception during message handling: QueuePool limit of size 10 overflow 20 reached, connection timed out, timeout 10
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent call last):
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 134, in _dispatch_and_reply
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher incoming.message))
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 177, in _dispatch
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher return self._do_dispatch(endpoint, method, ctxt, args)
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 123, in _do_dispatch
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher result = getattr(endpoint, method)(ctxt, **new_args)
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/neutron/db/agents_db.py", line 237, in report_state
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher self.plugin.create_or_update_agent(context, agent_state)
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/neutron/db/agents_db.py", line 197, in create_or_update_agent
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher return self._create_or_update_agent(context, agent)
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/neutron/db/agents_db.py", line 176, in _create_or_update_agent
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher context, agent['agent_type'], agent['host'])
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/neutron/db/agents_db.py", line 153, in _get_agent_by_type_and_host
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher Agent.host == host).one()
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2369, in one
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher ret = list(self)
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2412, in __iter__
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher return self._execute_and_instances(context)
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2425, in _execute_and_instances
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher close_with_result=True)
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2416, in _connection_from_session
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher **kw)
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 854, in connection
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher close_with_result=close_with_result)
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 858, in _connection_for_bind
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher return self.transaction._connection_for_bind(engine)
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 322, in _connection_for_bind
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher conn = bind.contextual_connect()
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1799, in contextual_connect
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher self.pool.connect(),
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 338, in connect
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher return _ConnectionFairy._checkout(self)
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 641, in _checkout
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher fairy = _ConnectionRecord.checkout(pool)
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 440, in checkout
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher rec = pool._do_get()
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 957, in _do_get
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher (self.size(), self.overflow(), self._timeout))
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher TimeoutError: QueuePool limit of size 10 overflow 20 reached, connection timed out, timeout 10
2014-10-21 16:40:44.126 16312 TRACE oslo.messaging.rpc.dispatcher

The cloud has three neutron-server instances, each configured with 32 API workers.

Revision history for this message
James Page (james-page) wrote :

Environment was being exercises using the boot-and-delete task/rally

tags: added: juno scale-testing
Revision history for this message
Kevin Benton (kevinbenton) wrote : Re: [Bug 1384108] [NEW] Exception during message handling: QueuePool limit of size 10 overflow 20 reached, connection timed out, timeout 10
Download full text (19.1 KiB)

Can you try setting the max connections for SQL to the number of API
workers?
http://docs.openstack.org/trunk/config-reference/content/networking-options-db.html
On Oct 22, 2014 4:31 AM, "James Page" <email address hidden> wrote:

> Public bug reported:
>
> OpenStack Juno release, Ubuntu 14.04 using Cloud Archive; under
> relatively high instance creation concurrency (150), neutron starts to
> throw some errors:
>
> 2014-10-21 16:40:44.124 16312 ERROR oslo.messaging._drivers.common
> [req-8e3ebbdb-bc01-439d-af86-655176f206a6 ] ['Traceback (most recent call
> last):\n', ' File
> "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line
> 134, in _dispatch_and_reply\n incoming.message))\n', ' File
> "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line
> 177, in _dispatch\n return self._do_dispatch(endpoint, method, ctxt,
> args)\n', ' File
> "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line
> 123, in _do_dispatch\n result = getattr(endpoint, method)(ctxt,
> **new_args)\n', ' File
> "/usr/lib/python2.7/dist-packages/neutron/api/rpc/handlers/securitygroups_rpc.py",
> line 74, in security_group_info_for_devices\n ports =
> self._get_devices_info(devices_info)\n', ' File
> "/usr/lib/python2.7/dist-packages/neutron/api/rpc/handlers/securitygroups_rpc.py",
> line 41, in _get_devices_info\n port =
> self.plugin.get_port_from_device(device)\n', ' File
> "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py", line
> 1161, in get_port_from_device\n port = db.get_port_and_sgs(port_id)\n',
> ' File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/db.py", line
> 222, in get_port_and_sgs\n port_and_sgs = query.all()\n', ' File
> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2300, in
> all\n return list(self)\n', ' File
> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2412, in
> __iter__\n return self._execute_and_instances(context)\n', ' File
> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2425, in
> _execute_and_instances\n close_with_result=True)\n', ' File
> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2416, in
> _connection_from_session\n **kw)\n', ' File
> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 854, in
> connection\n close_with_result=close_with_result)\n', ' File
> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 858, in
> _connection_for_bind\n return
> self.transaction._connection_for_bind(engine)\n', ' File
> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 322, in
> _connection_for_bind\n conn = bind.contextual_connect()\n', ' File
> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1799, in
> contextual_connect\n self.pool.connect(),\n', ' File
> "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 338, in
> connect\n return _ConnectionFairy._checkout(self)\n', ' File
> "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 641, in
> _checkout\n fairy = _ConnectionRecord.checkout(pool)\n', ' File
> "/usr/lib/python2.7/dist-packages/sqlalc...

Revision history for this message
James Page (james-page) wrote :

I've bumped the max connections to 4 x the worker configuration just to be on the safe side.

Retesting now to see if this helps.

Revision history for this message
James Page (james-page) wrote :

Increasing the max pool size seems to resolve this particular issue; maybe having a default that's a multiple of the worker configuration would be more intelligent here?

Changed in neutron:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Eugene Nikanorov (enikanorov)
Revision history for this message
Zengfa Gao (zfgao) wrote :
Download full text (10.0 KiB)

We saw the same error too:

2014-12-08 11:08:33.315 47268 INFO neutron.wsgi [-] (47268) accepted ('10.23.70.28', 48899)
2014-12-08 11:08:33.315 47268 INFO neutron.wsgi [-] 10.23.70.28 - - [08/Dec/2014 11:08:33] "GET / HTTP/1.0" 200 262 0.000255
2014-12-08 11:08:33.985 47264 ERROR neutron.api.v2.resource [req-98012a5a-2a0d-4052-a30f-59d0e99c5e39 None] index failed
2014-12-08 11:08:33.985 47264 TRACE neutron.api.v2.resource Traceback (most recent call last):
2014-12-08 11:08:33.985 47264 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/api/v2/resource.py", line 87, in resource
2014-12-08 11:08:33.985 47264 TRACE neutron.api.v2.resource result = method(request=request, **args)
2014-12-08 11:08:33.985 47264 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/api/v2/base.py", line 308, in index
2014-12-08 11:08:33.985 47264 TRACE neutron.api.v2.resource return self._items(request, True, parent_id)
2014-12-08 11:08:33.985 47264 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/api/v2/base.py", line 242, in _items
2014-12-08 11:08:33.985 47264 TRACE neutron.api.v2.resource obj_list = obj_getter(request.context, **kwargs)
2014-12-08 11:08:33.985 47264 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/db/db_base_plugin_v2.py", line 1419, in get_ports
2014-12-08 11:08:33.985 47264 TRACE neutron.api.v2.resource items = [self._make_port_dict(c, fields) for c in query]
2014-12-08 11:08:33.985 47264 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2438, in __iter__
2014-12-08 11:08:33.985 47264 TRACE neutron.api.v2.resource return self._execute_and_instances(context)
2014-12-08 11:08:33.985 47264 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2451, in _execute_and_instances
2014-12-08 11:08:33.985 47264 TRACE neutron.api.v2.resource close_with_result=True)
2014-12-08 11:08:33.985 47264 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2442, in _connection_from_session
2014-12-08 11:08:33.985 47264 TRACE neutron.api.v2.resource **kw)
2014-12-08 11:08:33.985 47264 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 854, in connection
2014-12-08 11:08:33.985 47264 TRACE neutron.api.v2.resource close_with_result=close_with_result)
2014-12-08 11:08:33.985 47264 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 860, in _connection_for_bind
2014-12-08 11:08:33.985 47264 TRACE neutron.api.v2.resource return engine.contextual_connect(**kwargs)
2014-12-08 11:08:33.985 47264 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1798, ...

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in neutron (Ubuntu):
status: New → Confirmed
Revision history for this message
rick jones (rick-jones2) wrote :

If this is happening on say a neutron API server, shouldn't an lsof against that pid show a pool-sized number of connections/file descriptors? Taking that a bit further, if lsof does not show a pool-sized number of connections, doesn't that suggest the pool code is getting a bit confused as to its inventory?

James Page (james-page)
Changed in neutron (Ubuntu):
importance: Undecided → Medium
Kyle Mestery (mestery)
Changed in neutron:
importance: High → Medium
James Page (james-page)
Changed in neutron (Ubuntu):
importance: Medium → Low
status: Confirmed → Incomplete
Revision history for this message
James Page (james-page) wrote :

Marking won't fix - I've not see this bug for a long time and this bug has been quiet for over 12 months.

Changed in neutron (Ubuntu):
status: Incomplete → Won't Fix
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote : Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which lead to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (INCUBATOR-JUNO, LIBERTY, MITAKA, NEWTON).
  Valid example: CONFIRMED FOR: INCUBATOR-JUNO

Changed in neutron:
assignee: Eugene Nikanorov (enikanorov) → nobody
importance: Medium → Undecided
status: Confirmed → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.