[stable/newton] Deleting heat stack failed due to error "QueuePool limit of size 50 overflow 50 reached, connection timed out, timeout 30"

Bug #1640029 reported by Sujai
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Won't Fix
Undecided
Unassigned
neutron
Won't Fix
Undecided
Unassigned

Bug Description

In my stable/newton setup running on a VMware NSX platform, I brought up 5 heat stacks each having 100 nova instances in the same /16 network.
Deleting those heat stacks failed due to the below error.

"
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions TimeoutError: QueuePool limit of size 50 overflow 50 reached, connection timed out, timeout 30
"

Because of this error, out of 500 instances, deletion of about 67 instances got failed.
With default parameters in neutron.conf, I'm getting the below neutron error.

2016-11-02 20:42:06.557 18058 ERROR neutron.api.v2.resource [req-a0022887-cc01-4f2e-980d-490136524363 admin -] delete failed: No details.
2016-11-02 20:42:06.557 18058 ERROR neutron.api.v2.resource Traceback (most recent call last):
.
.
.
2016-11-02 20:42:06.557 18058 ERROR neutron.api.v2.resource File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2039, in contextual_connect
2016-11-02 20:42:06.557 18058 ERROR neutron.api.v2.resource self._wrap_pool_connect(self.pool.connect, None),
2016-11-02 20:42:06.557 18058 ERROR neutron.api.v2.resource File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2074, in _wrap_pool_connect
2016-11-02 20:42:06.557 18058 ERROR neutron.api.v2.resource return fn()
2016-11-02 20:42:06.557 18058 ERROR neutron.api.v2.resource File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 376, in connect
2016-11-02 20:42:06.557 18058 ERROR neutron.api.v2.resource return _ConnectionFairy._checkout(self)
2016-11-02 20:42:06.557 18058 ERROR neutron.api.v2.resource File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 713, in _checkout
2016-11-02 20:42:06.557 18058 ERROR neutron.api.v2.resource fairy = _ConnectionRecord.checkout(pool)
2016-11-02 20:42:06.557 18058 ERROR neutron.api.v2.resource File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 480, in checkout
2016-11-02 20:42:06.557 18058 ERROR neutron.api.v2.resource rec = pool._do_get()
2016-11-02 20:42:06.557 18058 ERROR neutron.api.v2.resource File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 1053, in _do_get
2016-11-02 20:42:06.557 18058 ERROR neutron.api.v2.resource (self.size(), self.overflow(), self._timeout))
2016-11-02 20:42:06.557 18058 ERROR neutron.api.v2.resource TimeoutError: QueuePool limit of size 10 overflow 20 reached, connection timed out, timeout 10
2016-11-02 20:42:06.557 18058 ERROR neutron.api.v2.resource

After changing the below parameters in /etc/neutron/neutron.conf and

max_pool_size = 50
retry_interval = 10
max_overflow = 50
pool_max_size = 50
pool_max_overflow = 50
pool_timeout = 30

below parameters in nova.conf and restarted the services and re-executed the testcase.Still deleting heat stack is failing with the below error

max_pool_size = 50
max_overflow = 50

n-api.log:

2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions [req-db3d6d66-9508-4eb8-be65-964f05ff50f8 admin admin] Unexpected exception in API method
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions Traceback (most recent call last):
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/api/openstack/extensions.py", line 338, in wrapped
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions return f(*args, **kwargs)
.
.
.
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 480, in checkout
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions rec = pool._do_get()
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 1053, in _do_get
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions (self.size(), self.overflow(), self._timeout))
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions TimeoutError: QueuePool limit of size 50 overflow 50 reached, connection timed out, timeout 30
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions
2016-11-03 17:27:34.148 2399 INFO nova.api.openstack.wsgi [req-db3d6d66-9508-4eb8-be65-964f05ff50f8 admin admin] HTTP exception thrown: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'sqlalchemy.exc.TimeoutError'>
2016-11-03 17:27:34.148 2399 DEBUG nova.api.openstack.wsgi [req-db3d6d66-9508-4eb8-be65-964f05ff50f8 admin admin] Returning 500 to user: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'sqlalchemy.exc.TimeoutError'> __call__ /opt/stack/nova/nova/api/openstack/wsgi.py:1044

Please look into this.

Sujai (sujai)
summary: - [stable/newton] Deleting heat stack failed due to neutron error
- "QueuePool limit of size 50 overflow 50 reached, connection timed out,
- timeout 30"
+ [stable/newton] Deleting heat stack failed due to error "QueuePool limit
+ of size 50 overflow 50 reached, connection timed out, timeout 30"
Revision history for this message
Andreas Scheuring (andreas-scheuring) wrote :

Hi Sujai, thanks for reporting this. The issue is not yet clear to me.
Can you please provide:
- max_pool_size, retry_interval, max_overflow settings of your first run
- the nova error message of the first and the second run
- the neutron error message of the first and second run
At the moment you provided a neutron message for the first run and a nova message for the second. So I can't see what changed after updating the settings. In general it looks like that for deleting 500 instances in parallel a lot of sql queries are required. It seems like for such a use case you need to tune your parameters..

But let's compare your error message of the first and second run first. If they are equal, I tend to set this to invalid

Changed in nova:
status: New → Incomplete
Changed in neutron:
status: New → Incomplete
Revision history for this message
Sujai (sujai) wrote :
Download full text (17.5 KiB)

Hi,

Please find the information below.

For the first run:
------------------

In /etc/neutron/neutron.conf:

The parameters are commented as below.
#max_pool_size = 5
#retry_interval = 10
#max_overflow = 50
#pool_max_size = 30
#pool_max_overflow = 0
#pool_timeout = 30

And in nova.conf, I didn't set max_pool_size and max_overflow.

Neutron Logs:

2016-11-02 20:42:00.247 18058 ERROR neutron.api.v2.resource [req-65a024cc-dad4-4484-98f4-349be67cb7aa admin -] index failed: No details.
2016-11-02 20:42:00.247 18058 ERROR neutron.api.v2.resource Traceback (most recent call last):
2016-11-02 20:42:00.247 18058 ERROR neutron.api.v2.resource File "/opt/stack/neutron/neutron/api/v2/resource.py", line 79, in resource
2016-11-02 20:42:00.247 18058 ERROR neutron.api.v2.resource result = method(request=request, **args)
2016-11-02 20:42:00.247 18058 ERROR neutron.api.v2.resource File "/opt/stack/neutron/neutron/db/api.py", line 88, in wrapped
2016-11-02 20:42:00.247 18058 ERROR neutron.api.v2.resource setattr(e, '_RETRY_EXCEEDED', True)
2016-11-02 20:42:00.247 18058 ERROR neutron.api.v2.resource File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2016-11-02 20:42:00.247 18058 ERROR neutron.api.v2.resource self.force_reraise()
2016-11-02 20:42:00.247 18058 ERROR neutron.api.v2.resource File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2016-11-02 20:42:00.247 18058 ERROR neutron.api.v2.resource six.reraise(self.type_, self.value, self.tb)
2016-11-02 20:42:00.247 18058 ERROR neutron.api.v2.resource File "/opt/stack/neutron/neutron/db/api.py", line 84, in wrapped
2016-11-02 20:42:00.247 18058 ERROR neutron.api.v2.resource return f(*args, **kwargs)
2016-11-02 20:42:00.247 18058 ERROR neutron.api.v2.resource File "/usr/local/lib/python2.7/dist-packages/oslo_db/api.py", line 151, in wrapper
2016-11-02 20:42:00.247 18058 ERROR neutron.api.v2.resource ectxt.value = e.inner_exc
2016-11-02 20:42:00.247 18058 ERROR neutron.api.v2.resource File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2016-11-02 20:42:00.247 18058 ERROR neutron.api.v2.resource self.force_reraise()
2016-11-02 20:42:00.247 18058 ERROR neutron.api.v2.resource File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2016-11-02 20:42:00.247 18058 ERROR neutron.api.v2.resource six.reraise(self.type_, self.value, self.tb)
2016-11-02 20:42:00.247 18058 ERROR neutron.api.v2.resource File "/usr/local/lib/python2.7/dist-packages/oslo_db/api.py", line 139, in wrapper
2016-11-02 20:42:00.247 18058 ERROR neutron.api.v2.resource return f(*args, **kwargs)
2016-11-02 20:42:00.247 18058 ERROR neutron.api.v2.resource File "/opt/stack/neutron/neutron/db/api.py", line 124, in wrapped
2016-11-02 20:42:00.247 18058 ERROR neutron.api.v2.resource traceback.format_exc())
2016-11-02 20:42:00.247 18058 ERROR neutron.api.v2.resource File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2016-11-02 20:42:00.247 18058 ERROR neutron.api.v2.resource self.force_re...

Revision history for this message
Sujai (sujai) wrote :
Download full text (9.0 KiB)

For the second run:
-------------------

I have set the below parameters in /etc/neutron/neutron.conf.

max_pool_size = 50
retry_interval = 10
max_overflow = 50
pool_max_size = 50
pool_max_overflow = 50
pool_timeout = 30

And also I have set the below parameters in nova.conf.

max_pool_size = 50
max_overflow = 50

For the second run, I didn't get any ERROR messages in the neutron log. Only traces are found in nova log.

Nova Logs:

2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions [req-db3d6d66-9508-4eb8-be65-964f05ff50f8 admin admin] Unexpected exception in API method
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions Traceback (most recent call last):
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/api/openstack/extensions.py", line 338, in wrapped
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions return f(*args, **kwargs)
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/api/openstack/compute/servers.py", line 914, in delete
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions self._delete(req.environ['nova.context'], req, id)
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/api/openstack/compute/servers.py", line 755, in _delete
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions instance = self._get_server(context, req, instance_uuid)
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/api/openstack/compute/servers.py", line 382, in _get_server
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions expected_attrs=expected_attrs)
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/api/openstack/common.py", line 496, in get_instance
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions expected_attrs=expected_attrs)
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/compute/api.py", line 2215, in get
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions expected_attrs)
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/compute/api.py", line 2198, in _get_instance
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions context, instance_uuid, expected_attrs=expected_attrs)
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 184, in wrapper
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions result = fn(cls, context, *args, **kwargs)
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/objects/instance.py", line 444, in get_by_uuid
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions use_slave=use_slave)
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/db/sqlalchemy/api.py", line 225, in wrapper
2016-11-03 17:27:34.146 2399 ERROR nova.api.openstack.extensions return f(*args, **kwargs)
2016-...

Read more...

Sujai (sujai)
description: updated
Changed in nova:
status: Incomplete → New
Changed in neutron:
status: Incomplete → New
Revision history for this message
jichenjc (jichenjc) wrote :

not an expert in sqlalchema, but seems it's related to sqlalchema, instead of nova/neutron
have you tried to increase the value to even bigger? the error reported in sqlalchema and
nova side really can't handle this 500 error, as it IS internal error because of connection timeout

Revision history for this message
Sean Dague (sdague) wrote :

I kind of feel this is in the class of bugs of "I hit everything with a sledge hammer really hard without tuning, and it fell over".

Falling over is what things do with out of the box defaults that are pushed way beyond their limits.

Changed in nova:
status: New → Won't Fix
Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Bug closed due to lack of activity, please feel free to reopen if needed.

Changed in neutron:
status: New → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.