CentOS+HA Neutron/Quantum server crashes after running for a few hours

Bug #1264012 reported by Pavel Vaylov
This bug affects 1 person
Affects: Fuel for OpenStack
Status: New
Importance: Medium
Assigned to: Fuel Library (Deprecated)

Bug Description

Environment: Fuel 3.2.1, CentOS, HA, 2 compute nodes (Neutron GRE).
After about 15-24 hours of running and provisioning/deleting instances, the Neutron/Quantum server on the primary controller crashed with the following error:

DBError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30
2013-12-24 18:01:27 ERROR [quantum.db.api] DB exception wrapped.
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/quantum/db/api.py", line 260, in _wrap_db_error
    return f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py", line 2115, in all
    return list(self)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py", line 2227, in __iter__
    return self._execute_and_instances(context)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py", line 2240, in _execute_and_instances
    close_with_result=True)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py", line 2231, in _connection_from_session
    **kw)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 777, in connection
    close_with_result=close_with_result)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 781, in _connection_for_bind
    return self.transaction._connection_for_bind(engine)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 306, in _connection_for_bind
    conn = bind.contextual_connect()
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py", line 2489, in contextual_connect
    self.pool.connect(),
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/pool.py", line 236, in connect
    return _ConnectionFairy(self).checkout()
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/pool.py", line 401, in __init__
    rec = self._connection_record = pool._do_get()
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/pool.py", line 738, in _do_get
    (self.size(), self.overflow(), self._timeout))
TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30
2013-12-24 18:01:27 ERROR [quantum.api.v2.resource] index failed
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/quantum/api/v2/resource.py", line 82, in resource
    result = method(request=request, **args)
  File "/usr/lib/python2.6/site-packages/quantum/api/v2/base.py", line 239, in index
    return self._items(request, True, parent_id)
  File "/usr/lib/python2.6/site-packages/quantum/api/v2/base.py", line 192, in _items
    obj_list = obj_getter(request.context, **kwargs)
  File "/usr/lib/python2.6/site-packages/quantum/plugins/openvswitch/ovs_quantum_plugin.py", line 597, in get_ports
    page_reverse)
  File "/usr/lib/python2.6/site-packages/quantum/db/db_base_plugin_v2.py", line 1433, in get_ports
    items = [self._make_port_dict(c, fields) for c in query.all()]
  File "/usr/lib/python2.6/site-packages/quantum/db/api.py", line 311, in _wrap_db_error
    raise DBError(e)
DBError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30

I worked around it by restarting the quantum server and the OVS agents, but the same behavior appeared again later.

It looks like we need to increase the SQL connection pool size: https://bugs.launchpad.net/tripleo/+bug/1184484
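For context, the numbers in the error message mean the pool may hand out at most 5 + 10 = 15 connections at once, and a 16th request waits up to 30 seconds before failing. The following is a minimal stdlib-only sketch of that behavior. It is a toy model, not SQLAlchemy's actual QueuePool code, and the class name BoundedPool and its methods are hypothetical. It also shows why a larger pool only helps if connections are eventually returned; this report does not establish whether the root cause is a leak or simply too small a pool.

```python
import threading


class BoundedPool:
    """Toy model of a size + overflow connection pool (hypothetical, not SQLAlchemy)."""

    def __init__(self, size=5, max_overflow=10, timeout=30.0):
        self.size = size
        self.max_overflow = max_overflow
        self.timeout = timeout
        # One slot per connection the pool is allowed to have checked out.
        self._slots = threading.BoundedSemaphore(size + max_overflow)

    def checkout(self):
        # Wait up to `timeout` seconds for a free slot, then give up.
        if not self._slots.acquire(timeout=self.timeout):
            raise TimeoutError(
                "QueuePool limit of size {} overflow {} reached, "
                "connection timed out, timeout {}".format(
                    self.size, self.max_overflow, self.timeout))
        return object()  # stand-in for a real DB connection

    def checkin(self, conn):
        # Returning a connection frees its slot for the next caller.
        self._slots.release()


if __name__ == "__main__":
    # Short timeout so the demo finishes quickly; the report shows 30 s.
    pool = BoundedPool(size=5, max_overflow=10, timeout=0.05)
    held = [pool.checkout() for _ in range(15)]  # all 15 slots in use
    try:
        pool.checkout()  # 16th request cannot be served
    except TimeoutError as exc:
        print(exc)
    pool.checkin(held.pop())  # freeing one slot unblocks the next caller
    pool.checkout()
```

If connections are never checked back in, raising size or max_overflow in this model only delays the timeout rather than preventing it.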

Tags: neutron
Pavel Vaylov (pvaylov) wrote :
Pavel Vaylov (pvaylov)
description: updated
Mike Scherbakov (mihgen)
Changed in fuel:
milestone: none → 4.0
Changed in fuel:
milestone: 4.0 → 4.1
Sergey Vasilenko (xenolog) wrote :

Pavel, can you give me access to the environment where this bug is currently reproduced?

Pavel Vaylov (pvaylov) wrote :

Sergey, that's not possible right now. I've installed version 4.0 and it's running now. I can boot VMs with 3.2.1 tomorrow, or I can boot 3.2.1 this evening, but you would have to restore all the services. How much time do you need with the environment?

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
importance: Undecided → Medium
assignee: nobody → Fuel Library Team (fuel-library)
Dmitry Borodaenko (angdraug) wrote :

If the problem is confirmed to be related to the SQL connection pool size, please mark it as a duplicate of https://bugs.launchpad.net/fuel/+bug/1274784
