quantum server consumes all of sqlalchemy sessions

Bug #1136064 reported by yong sheng gong
34
This bug affects 5 people
Affects Status Importance Assigned to Milestone
neutron
Expired
Undecided
Unassigned

Bug Description

2013-03-01 01:19:44.356 17363 DEBUG quantum.openstack.common.rpc.amqp [-] unpacked context: {'user_id': None, 'roles': [u'admin'], 'tenant_id': None, 'is_admin': True, 'timestamp': u'2013-02-28 17:19:35.268564', 'project_id': None, 'read_deleted': u'no'} _safe_log /mnt/data/git/quantum/quantum/openstack/common/rpc/common.py:254
2013-03-01 01:19:44.450 17363 DEBUG quantum.db.dhcp_rpc_base [-] Network list requested from HostC get_active_networks /mnt/data/git/quantum/quantum/db/dhcp_rpc_base.py:35
2013-03-01 01:19:44.454 17363 DEBUG quantum.scheduler.dhcp_agent_scheduler [-] No non-hosted networks auto_schedule_networks /mnt/data/git/quantum/quantum/scheduler/dhcp_agent_scheduler.py:101
2013-03-01 01:19:44.491 17363 DEBUG quantum.plugins.openvswitch.ovs_db_v2 [-] get_port_with_securitygroups() called:port_id=ce9af4bb-970c-4869-81fd-1dd70d754cba get_port_from_device /mnt/data/git/quantum/quantum/plugins/openvswitch/ovs_db_v2.py:309
2013-03-01 01:20:14.914 17363 ERROR quantum.openstack.common.rpc.amqp [-] Exception during message handling
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp Traceback (most recent call last):
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp File "/mnt/data/git/quantum/quantum/openstack/common/rpc/amqp.py", line 429, in _process_data
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp rval = self.proxy.dispatch(ctxt, version, method, **args)
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp File "/mnt/data/git/quantum/quantum/common/rpc.py", line 43, in dispatch
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp quantum_ctxt, version, method, **kwargs)
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp File "/mnt/data/git/quantum/quantum/openstack/common/rpc/dispatcher.py", line 133, in dispatch
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp return getattr(proxyobj, method)(ctxt, **kwargs)
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp File "/mnt/data/git/quantum/quantum/db/agents_db.py", line 167, in report_state
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp plugin.create_or_update_agent(context, agent_state)
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp File "/mnt/data/git/quantum/quantum/db/agents_db.py", line 145, in create_or_update_agent
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp context, agent['agent_type'], agent['host'])
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp File "/mnt/data/git/quantum/quantum/db/agents_db.py", line 121, in _get_agent_by_type_and_host
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp Agent.host == host).one()
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2184, in one
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp ret = list(self)
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2227, in __iter__
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp return self._execute_and_instances(context)
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2240, in _execute_and_instances
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp close_with_result=True)
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2231, in _connection_from_session
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp **kw)
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 777, in connection
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp close_with_result=close_with_result)
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 781, in _connection_for_bind
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp return self.transaction._connection_for_bind(engine)
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 306, in _connection_for_bind
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp conn = bind.contextual_connect()
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2489, in contextual_connect
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp self.pool.connect(),
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 236, in connect
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp return _ConnectionFairy(self).checkout()
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 401, in __init__
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp rec = self._connection_record = pool._do_get()
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 738, in _do_get
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp (self.size(), self.overflow(), self._timeout))
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30
2013-03-01 01:20:14.914 17363 TRACE quantum.openstack.common.rpc.amqp

Tags: neutron-core
Revision history for this message
yong sheng gong (gongysh) wrote :

reproduce step:
1.start quantum server and then stop to create the topic queue
2.start dhcp agent which will report_state
3. start l3 agent which will report-state
4. start ovs agent which will report-state

wait for a while
5. start quantum server

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

This also happened to me a few times today.
I had the impression the root cause was messages being left in the queue and waiting for quantum to come up and be delivered.

Quantum was then flooded with RPC messages, each of which executed one or more queries.

I relaxed the default period from 4 seconds to 60 as a stop-gap measure.

Changed in quantum:
status: New → Confirmed
Revision history for this message
yong sheng gong (gongysh) wrote :
Download full text (10.5 KiB)

with the stpes:
start quantum server,
start my testing program at dhcp_agent.py:
   def main():
    eventlet.monkey_patch()
    cfg.CONF.register_opts(DhcpAgent.OPTS)
    config.register_agent_state_opts_helper(cfg.CONF)
    config.register_root_helper(cfg.CONF)
    cfg.CONF.register_opts(DeviceManager.OPTS)
    cfg.CONF.register_opts(DhcpLeaseRelay.OPTS)
    cfg.CONF.register_opts(dhcp.OPTS)
    cfg.CONF.register_opts(interface.OPTS)
    cfg.CONF(project='quantum')
    plugin_api = DhcpPluginApi(topics.PLUGIN,
                               context.get_admin_context_without_session())
    import time
    while True:
        network_id = '6e636155-41da-4518-8acd-e99d4c7171c4'
        ip_address = '10.0.1.3'
        lease_remaining = 20
        plugin_api.update_lease_expiration(network_id,
                                           ip_address,
                                           lease_remaining)
        time.sleep(0.5)

stop the quantum server for a while
start the quantum server, then the bug is here:

2013-03-05 16:17:44.007 23712 DEBUG quantum.openstack.common.rpc.amqp [-] received {u'_context_roles': [u'admin'], u'_context_read_deleted': u'no', u'_context_tenant_id': None, u'args': {u'network_id': u'6e636155-41da-4518-8acd-e99d4c7171c4', u'lease_remaining': 20, u'host': u'HostC', u'ip_address': u'10.0.1.3'}, u'_unique_id': u'41ce04b51899448bad166dbf5f89f6ab', u'_context_is_admin': True, u'version': u'1.0', u'_context_project_id': None, u'_context_timestamp': u'2013-03-05 08:16:43.107613', u'_context_user_id': None, u'method': u'update_lease_expiration'} _safe_log /mnt/data/git/quantum/quantum/openstack/common/rpc/common.py:254
2013-03-05 16:17:44.007 23712 DEBUG quantum.openstack.common.rpc.amqp [-] unpacked context: {'user_id': None, 'roles': [u'admin'], 'tenant_id': None, 'is_admin': True, 'timestamp': u'2013-03-05 08:16:43.107613', 'project_id': None, 'read_deleted': u'no'} _safe_log /mnt/data/git/quantum/quantum/openstack/common/rpc/common.py:254
2013-03-05 16:17:44.007 23712 DEBUG quantum.db.dhcp_rpc_base [-] Updating lease expiration for 10.0.1.3 on network 6e636155-41da-4518-8acd-e99d4c7171c4 from HostC. update_lease_expiration /mnt/data/git/quantum/quantum/db/dhcp_rpc_base.py:181
2013-03-05 16:17:44.067 23712 DEBUG quantum.db.dhcp_rpc_base [-] Updating lease expiration for 10.0.1.3 on network 6e636155-41da-4518-8acd-e99d4c7171c4 from HostC. update_lease_expiration /mnt/data/git/quantum/quantum/db/dhcp_rpc_base.py:181
2013-03-05 16:17:44.109 23712 DEBUG quantum.db.dhcp_rpc_base [-] Updating lease expiration for 10.0.1.3 on network 6e636155-41da-4518-8acd-e99d4c7171c4 from HostC. update_lease_expiration /mnt/data/git/quantum/quantum/db/dhcp_rpc_base.py:181
2013-03-05 16:17:44.150 23712 DEBUG quantum.db.dhcp_rpc_base [-] Updating lease expiration for 10.0.1.3 on network 6e636155-41da-4518-8acd-e99d4c7171c4 from HostC. update_lease_expiration /mnt/data/git/quantum/quantum/db/dhcp_rpc_base.py:181
2013-03-05 16:17:44.192 23712 DEBUG quantum.db.dhcp_rpc_base [-] Updating lease expiration for 10.0.1.3 on network 6e636155-41da-4518-8acd-e99d4c7171c4 from HostC. update_lease_expiration /mnt/data/git/quantum/quantum/db/dhcp_rpc_base.p...

Revision history for this message
yong sheng gong (gongysh) wrote :

if I change time.sleep(0.5) into time.sleep(0.1),
after running a while the quantum server will raise this exception too.

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

With comment #4, do you mean that with sleep intervals of 0.1 seconds, we exhaust the connection pool even without stopping the quantum server?

Have you tried adding some connection pool logging to see why sessions are not being disposed of as expected?

Revision history for this message
yong sheng gong (gongysh) wrote :

My fault, I modified the quantum server bin:
eventlet.monkey_patch(os=False, thread=False).

without this modification, I cannot reproduce it.

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

I was observing quantum server being flooded with messages from the dhcp agent.
This is however not happening anymore because:

        return self.call(context,
                         self.make_msg('report_state',
                                       agent_state={'agent_state':
                                                    agent_state}),
                         topic=self.topic)

has replaced:

        return self.cast(context,
                         self.make_msg('report_state',
                                       agent_state={'agent_state':
                                                    agent_state}),
                         topic=self.topic)

in a recent commit from Yong.
From my perspective, this bug can be marked as invalid.

Revision history for this message
dan wendlandt (danwent) wrote :

Ok, i'm moving this to incomplete, leaving it up to yong to make the final call on whether we can move this to invalid.

Changed in quantum:
status: Confirmed → Incomplete
Revision history for this message
dan wendlandt (danwent) wrote :

Yong, can you confirm that we can remove this from RC1 and mark it as invalid?

Revision history for this message
yong sheng gong (gongysh) wrote :

salvatore:
the cast just slows down one of the agent. If we have hundreds of agents, it will be disastrous. Since hundreds of agents are not practical, I was trying to simulate with plugin_api.update_lease_expiration(network_id,
                                           ip_address,
                                           lease_remaining)

which is cast and will be sent to server for each VM every 30 seconds (default configuration), and can be used to send many msgs as I wish.

according to my testing, we can move it out of RC1, but I will go on keeping an eye on it.

Changed in quantum:
status: Incomplete → Invalid
status: Invalid → Incomplete
milestone: grizzly-rc1 → none
tags: added: quantum-core
Changed in quantum:
importance: High → Undecided
tags: added: neutron-core
removed: quantum-core
Revision history for this message
Hua Zhang (zhhuabj) wrote :
Download full text (16.2 KiB)

I also met this issue using latest the code today.

2013-07-08 22:14:28.904 24770 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2489, in contextual_connect
2013-07-08 22:14:28.904 24770 TRACE neutron.openstack.common.rpc.amqp self.pool.connect(),
2013-07-08 22:14:28.904 24770 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 236, in connect
2013-07-08 22:14:28.904 24770 TRACE neutron.openstack.common.rpc.amqp return _ConnectionFairy(self).checkout()
2013-07-08 22:14:28.904 24770 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 401, in __init__
2013-07-08 22:14:28.904 24770 TRACE neutron.openstack.common.rpc.amqp rec = self._connection_record = pool._do_get()
2013-07-08 22:14:28.904 24770 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 738, in _do_get
2013-07-08 22:14:28.904 24770 TRACE neutron.openstack.common.rpc.amqp (self.size(), self.overflow(), self._timeout))
2013-07-08 22:14:28.904 24770 TRACE neutron.openstack.common.rpc.amqp TimeoutError: QueuePool limit of size 10 overflow 20 reached, connection timed out, timeout 10
2013-07-08 22:14:28.904 24770 TRACE neutron.openstack.common.rpc.amqp
2013-07-08 22:14:38.910 24770 ERROR neutron.openstack.common.rpc.amqp [-] Exception during message handling
2013-07-08 22:14:38.910 24770 TRACE neutron.openstack.common.rpc.amqp Traceback (most recent call last):
2013-07-08 22:14:38.910 24770 TRACE neutron.openstack.common.rpc.amqp File "/bak/openstack/neutron/neutron/openstack/common/rpc/amqp.py", line 424, in _process_data
2013-07-08 22:14:38.910 24770 TRACE neutron.openstack.common.rpc.amqp **args)
2013-07-08 22:14:38.910 24770 TRACE neutron.openstack.common.rpc.amqp File "/bak/openstack/neutron/neutron/common/rpc.py", line 44, in dispatch
2013-07-08 22:14:38.910 24770 TRACE neutron.openstack.common.rpc.amqp neutron_ctxt, version, method, namespace, **kwargs)
2013-07-08 22:14:38.910 24770 TRACE neutron.openstack.common.rpc.amqp File "/bak/openstack/neutron/neutron/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2013-07-08 22:14:38.910 24770 TRACE neutron.openstack.common.rpc.amqp result = getattr(proxyobj, method)(ctxt, **kwargs)
2013-07-08 22:14:38.910 24770 TRACE neutron.openstack.common.rpc.amqp File "/bak/openstack/neutron/neutron/db/agents_db.py", line 174, in report_state
2013-07-08 22:14:38.910 24770 TRACE neutron.openstack.common.rpc.amqp plugin.create_or_update_agent(context, agent_state)
2013-07-08 22:14:38.910 24770 TRACE neutron.openstack.common.rpc.amqp File "/bak/openstack/neutron/neutron/db/agents_db.py", line 145, in create_or_update_agent
2013-07-08 22:14:38.910 24770 TRACE neutron.openstack.common.rpc.amqp context, agent['agent_type'], agent['host'])
2013-07-08 22:14:38.910 24770 TRACE neutron.openstack.common.rpc.amqp File "/bak/openstack/neutron/neutron/db/agents_db.py", line 121, in _get_agent_by_type_and_host
2013-07-08 22:14:38.910 24770 TRACE neutron.openstack.common.rpc...

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

The last comment seems actually Bug 1184484

I think we should mark this as invalid/duplicated.

Changed in neutron:
assignee: yong sheng gong (gongysh) → nobody
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.