DB error causes router rescheduling loop to fail

Bug #1546110 reported by Oleg Bondarev
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
neutron   Fix Released   Medium       Brian Haley

Bug Description

In the router rescheduling looping task, the DB call that fetches down bindings is made outside the try/except block, so a DB error can cause the whole task to fail (see traceback below). The DB operation needs to be moved inside the try/except.

2016-02-15T10:44:44.259995+00:00 err: 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall [req-79bce4c3-2e81-446c-8b37-6d30e3a964e2 - - - - -] Fixed interval looping call 'neutron.services.l3_router.l3_router_plugin.L3RouterPlugin.reschedule_routers_from_down_agents' failed
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall Traceback (most recent call last):
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/oslo_service/loopingcall.py", line 113, in _run_loop
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall result = func(*self.args, **self.kw)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/neutron/db/l3_agentschedulers_db.py", line 101, in reschedule_routers_from_down_agents
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall down_bindings = self._get_down_bindings(context, cutoff)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/neutron/db/l3_dvrscheduler_db.py", line 460, in _get_down_bindings
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall context, cutoff)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/neutron/db/l3_agentschedulers_db.py", line 149, in _get_down_bindings
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall return query.all()
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2399, in all
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall return list(self)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2516, in __iter__
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall return self._execute_and_instances(context)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2529, in _execute_and_instances
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall close_with_result=True)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2520, in _connection_from_session
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall **kw)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 882, in connection
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall execution_options=execution_options)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 889, in _connection_for_bind
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall conn = engine.contextual_connect(**kw)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2039, in contextual_connect
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall self._wrap_pool_connect(self.pool.connect, None),
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2078, in _wrap_pool_connect
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall e, dialect, self)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1401, in _handle_dbapi_exception_noconnection
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall util.raise_from_cause(newraise, exc_info)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall reraise(type(exception), exception, tb=exc_tb)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2074, in _wrap_pool_connect
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall return fn()
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 376, in connect
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall return _ConnectionFairy._checkout(self)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 713, in _checkout
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall fairy = _ConnectionRecord.checkout(pool)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 485, in checkout
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall rec.checkin()
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall compat.reraise(exc_type, exc_value, exc_tb)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 482, in checkout
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall dbapi_connection = rec.get_connection()
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 594, in get_connection
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall self.connection = self.__connect()
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 607, in __connect
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall connection = self.__pool._invoke_creator(self)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/strategies.py", line 97, in connect
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall return dialect.connect(*cargs, **cparams)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 385, in connect
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall return self.dbapi.connect(*cargs, **cparams)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/MySQLdb/__init__.py", line 81, in Connect
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall return Connection(*args, **kwargs)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 206, in __init__
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall super(Connection, self).__init__(*args, **kwargs2)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall DBConnectionError: (_mysql_exceptions.OperationalError) (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0")
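
The change requested above can be illustrated with a minimal, self-contained sketch. It is not the actual Neutron patch: the `DBConnectionError` class, the `FlakyDB` helper and the `reschedule_from_down_agents` function below are hypothetical stand-ins that only show the pattern of fetching the down bindings inside the try/except, so that a transient DB failure is logged and the next periodic run can retry.

```python
import logging

LOG = logging.getLogger(__name__)


class DBConnectionError(Exception):
    """Hypothetical stand-in for the DB connection error in the traceback."""


def reschedule_from_down_agents(get_down_bindings, reschedule):
    """Run one iteration of a periodic rescheduling task.

    Returns the number of bindings handled, or 0 if the DB was unreachable.
    """
    try:
        # The query is inside the try block, so a DB failure is caught
        # here instead of propagating out of the periodic task.
        down_bindings = get_down_bindings()
        for binding in down_bindings:
            reschedule(binding)
        return len(down_bindings)
    except DBConnectionError:
        LOG.exception("DB error during rescheduling; retrying on next run")
        return 0


class FlakyDB:
    """Hypothetical DB layer that fails on the first call only."""

    def __init__(self):
        self.calls = 0

    def get_down_bindings(self):
        self.calls += 1
        if self.calls == 1:
            raise DBConnectionError("Lost connection to MySQL server")
        return ["router-1", "router-2"]


db = FlakyDB()
moved = []
first = reschedule_from_down_agents(db.get_down_bindings, moved.append)
second = reschedule_from_down_agents(db.get_down_bindings, moved.append)
```

In this sketch the first iteration hits the simulated connection error and returns 0 without raising, and the second iteration proceeds normally once the DB is reachable again.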

tags: added: liberty-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/280753

Changed in neutron:
status: New → In Progress
Changed in neutron:
milestone: none → mitaka-rc1
Changed in neutron:
milestone: mitaka-rc1 → newton-1
Changed in neutron:
assignee: Oleg Bondarev (obondarev) → Brian Haley (brian-haley)
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/280753
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b6ec40cbf754de9d189f843cbddfca67d4103ee3
Submitter: Jenkins
Branch: master

commit b6ec40cbf754de9d189f843cbddfca67d4103ee3
Author: Oleg Bondarev <email address hidden>
Date: Tue Feb 16 18:03:52 2016 +0300

    Move db query to fetch down bindings under try/except

    In case of intermittent DB failures router and network auto-rescheduling
    tasks may fail due to error on fetching down bindings from db.
    Need to put this queries under try/except to prevent unexpected exit.

    Closes-Bug: #1546110
    Change-Id: Id48e899a5b3d906c6d1da4d03923bdda2681cd92

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/296533

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/296534

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/297067

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/mitaka)

Reviewed: https://review.openstack.org/296534
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=eb8ddb95bbb56bbdc658e15feebbf7f91d5ddf13
Submitter: Jenkins
Branch: stable/mitaka

commit eb8ddb95bbb56bbdc658e15feebbf7f91d5ddf13
Author: Oleg Bondarev <email address hidden>
Date: Tue Feb 16 18:03:52 2016 +0300

    Move db query to fetch down bindings under try/except

    In case of intermittent DB failures router and network auto-rescheduling
    tasks may fail due to error on fetching down bindings from db.
    Need to put this queries under try/except to prevent unexpected exit.

    Closes-Bug: #1546110
    Change-Id: Id48e899a5b3d906c6d1da4d03923bdda2681cd92
    (cherry picked from commit b6ec40cbf754de9d189f843cbddfca67d4103ee3)

tags: added: in-stable-mitaka
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/liberty)

Reviewed: https://review.openstack.org/296533
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=48a61967184a94d12a22319789b42b3ea6bebf5e
Submitter: Jenkins
Branch: stable/liberty

commit 48a61967184a94d12a22319789b42b3ea6bebf5e
Author: Oleg Bondarev <email address hidden>
Date: Tue Feb 16 18:03:52 2016 +0300

    Move db query to fetch down bindings under try/except

    In case of intermittent DB failures router and network auto-rescheduling
    tasks may fail due to error on fetching down bindings from db.
    Need to put this queries under try/except to prevent unexpected exit.

    Closes-Bug: #1546110
    Change-Id: Id48e899a5b3d906c6d1da4d03923bdda2681cd92
    (cherry picked from commit b6ec40cbf754de9d189f843cbddfca67d4103ee3)

tags: added: in-stable-liberty
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/kilo)

Reviewed: https://review.openstack.org/297067
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=2b6c5f0eba65a247c35e07b2c6b614577daf3af2
Submitter: Jenkins
Branch: stable/kilo

commit 2b6c5f0eba65a247c35e07b2c6b614577daf3af2
Author: Oleg Bondarev <email address hidden>
Date: Tue Feb 16 18:03:52 2016 +0300

    Move db query to fetch down bindings under try/except

    In case of intermittent DB failures router and network auto-rescheduling
    tasks may fail due to error on fetching down bindings from db.
    Need to put this queries under try/except to prevent unexpected exit.

    Conflicts:
     neutron/db/agentschedulers_db.py
     neutron/db/l3_agentschedulers_db.py
     neutron/tests/unit/scheduler/test_dhcp_agent_scheduler.py

    Closes-Bug: #1546110
    Change-Id: Id48e899a5b3d906c6d1da4d03923bdda2681cd92
    (cherry picked from commit b6ec40cbf754de9d189f843cbddfca67d4103ee3)
    (cherry picked from commit 48a61967184a94d12a22319789b42b3ea6bebf5e)

tags: added: in-stable-kilo
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/314250

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 8.1.0

This issue was fixed in the openstack/neutron 8.1.0 release.

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 2015.1.4

This issue was fixed in the openstack/neutron 2015.1.4 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/314250
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=3bf73801df169de40d365e6240e045266392ca63
Submitter: Jenkins
Branch: master

commit a323769143001d67fd1b3b4ba294e59accd09e0e
Author: Ryan Moats <email address hidden>
Date: Tue Oct 20 15:51:37 2015 +0000

    Revert "Improve performance of ensure_namespace"

    This reverts commit 81823e86328e62850a89aef9f0b609bfc0a6dacd.

    Unneeded optimization: this commit only improves execution
    time on the order of milliseconds, which is less than 1% of
    the total router update execution time at the network node.

    This also

    Closes-bug: #1574881

    Change-Id: Icbcdf4725ba7d2e743bb6761c9799ae436bd953b

commit 7fcf0253246832300f13b0aa4cea397215700572
Author: OpenStack Proposal Bot <email address hidden>
Date: Thu Apr 21 07:05:16 2016 +0000

    Imported Translations from Zanata

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: I9e930750dde85a9beb0b6f85eeea8a0962d3e020

commit 643b4431606421b09d05eb0ccde130adbf88df64
Author: OpenStack Proposal Bot <email address hidden>
Date: Tue Apr 19 06:52:48 2016 +0000

    Imported Translations from Zanata

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: I52d7460b3265b5460b9089e1cc58624640dc7230

commit 1ffea42ccdc14b7a6162c1895bd8f2aae48d5dae
Author: OpenStack Proposal Bot <email address hidden>
Date: Mon Apr 18 15:03:30 2016 +0000

    Updated from global requirements

    Change-Id: Icb27945b3f222af1d9ab2b62bf2169d82b6ae26c

commit b970ed5bdac60c0fa227f2fddaa9b842ba4f51a7
Author: Kevin Benton <email address hidden>
Date: Fri Apr 8 17:52:14 2016 -0700

    Clear DVR MAC on last agent deletion from host

    Once all agents are deleted from a host, the DVR MAC generated
    for that host should be deleted as well to prevent a buildup of
    pointless flows generated in the OVS agent for hosts that don't
    exist.

    Closes-Bug: #1568206
    Change-Id: I51e736aa0431980a595ecf810f148ca62d990d20
    (cherry picked from commit 92527c2de2afaf4862fddc101143e4d02858924d)

commit eee9e58ed258a48c69effef121f55fdaa5b68bd6
Author: Mike Bayer <email address hidden>
Date: Tue Feb 9 13:10:57 2016 -0500

    Add an option for WSGI pool size

    Neutron currently hardcodes the number of
    greenlets used to process requests in a process to 1000.
    As detailed in
    http://lists.openstack.org/pipermail/openstack-dev/2015-December/082717.html

    this can cause requests to wait within one process
    for available database connection while other processes
    remain available.

    By adding a wsgi_default_pool_size option functionally
    identical to that of Nova, we can lower the number of
    greenlets per process to be more in line with a typical
    max database connection pool size.

    DocImpact: a previously unused configuration value
               wsgi_default_pool_size is now used to a...

tags: added: neutron-proactive-backport-potential
Thierry Carrez (ttx) wrote : Fix included in openstack/neutron 7.1.0

This issue was fixed in the openstack/neutron 7.1.0 release.

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 9.0.0.0b1

This issue was fixed in the openstack/neutron 9.0.0.0b1 development milestone.

tags: removed: neutron-proactive-backport-potential
no longer affects: neutron/kilo
tags: removed: liberty-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 2015.1.4

This issue was fixed in the openstack/neutron 2015.1.4 release.
