Functional tests randomly failing due to Neutron DB errors

Bug #1808146 reported by Daniel Alvarez
Affects         Status        Importance  Assigned to  Milestone
networking-ovn  Fix Released  High        Unassigned

Bug Description

The error below occurs very frequently in the functional tests.
It looks like a race condition involving the SqlFixtures, which share a single in-memory database [0]. This needs investigation, as it has been causing many failures in our gate lately.
Also, oddly, the offending table is always 'qos_fip_policy_bindings'.

[0] http://git.openstack.org/cgit/openstack/neutron/tree/neutron/tests/unit/testlib_api.py#n146

ft2.1: networking_ovn.tests.functional.test_router.TestRouter.test_gateway_chassis_least_loaded_scheduler_StringException: Traceback (most recent call last):
  File "/opt/stack/new/networking-ovn/.tox/dsvm-functional-py35/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/opt/stack/new/networking-ovn/.tox/dsvm-functional-py35/lib/python3.5/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
sqlite3.OperationalError: no such table: qos_fip_policy_bindings

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/stack/new/networking-ovn/.tox/dsvm-functional-py35/lib/python3.5/site-packages/fixtures/fixture.py", line 125, in cleanUp
    return self._cleanups(raise_errors=raise_first)
  File "/opt/stack/new/networking-ovn/.tox/dsvm-functional-py35/lib/python3.5/site-packages/fixtures/callmany.py", line 89, in __call__
    reraise(error[0], error[1], error[2])
  File "/opt/stack/new/networking-ovn/.tox/dsvm-functional-py35/lib/python3.5/site-packages/testtools/_compat3x.py", line 16, in reraise
    raise exc_obj.with_traceback(exc_tb)
  File "/opt/stack/new/networking-ovn/.tox/dsvm-functional-py35/lib/python3.5/site-packages/fixtures/callmany.py", line 83, in __call__
    cleanup(*args, **kwargs)
  File "/opt/stack/new/networking-ovn/.tox/dsvm-functional-py35/lib/python3.5/site-packages/neutron/tests/unit/testlib_api.py", line 100, in <lambda>
    self.addCleanup(lambda: self._delete_from_schema(engine))
  File "/opt/stack/new/networking-ovn/.tox/dsvm-functional-py35/lib/python3.5/site-packages/neutron/tests/unit/testlib_api.py", line 85, in _delete_from_schema
    conn.execute(table.delete())
  File "/opt/stack/new/networking-ovn/.tox/dsvm-functional-py35/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "/opt/stack/new/networking-ovn/.tox/dsvm-functional-py35/lib/python3.5/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/opt/stack/new/networking-ovn/.tox/dsvm-functional-py35/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/opt/stack/new/networking-ovn/.tox/dsvm-functional-py35/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/opt/stack/new/networking-ovn/.tox/dsvm-functional-py35/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1409, in _handle_dbapi_exception
    util.raise_from_cause(newraise, exc_info)
  File "/opt/stack/new/networking-ovn/.tox/dsvm-functional-py35/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 265, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/opt/stack/new/networking-ovn/.tox/dsvm-functional-py35/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 248, in reraise
    raise value.with_traceback(tb)
  File "/opt/stack/new/networking-ovn/.tox/dsvm-functional-py35/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/opt/stack/new/networking-ovn/.tox/dsvm-functional-py35/lib/python3.5/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
oslo_db.exception.DBNonExistentTable: (sqlite3.OperationalError) no such table: qos_fip_policy_bindings [SQL: 'DELETE FROM qos_fip_policy_bindings'] (Background on this error at: http://sqlalche.me/e/e3q8)

Daniel Alvarez (dalvarezs) wrote :

Sorry, it's not always the same table. Sometimes other tables, such as ml2_geneve_allocations, are found missing.

OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (master)

Reviewed: https://review.openstack.org/627191
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=649c7d7af9a23da46b44db344ad16b1b186ef3ca
Submitter: Zuul
Branch: master

commit 649c7d7af9a23da46b44db344ad16b1b186ef3ca
Author: Daniel Alvarez <email address hidden>
Date: Wed Dec 26 19:55:40 2018 +0100

    functional: Do not inherit from SqlTestCaseLight

    Inheriting from SqlTestCaseLight in some tests and from
    SqlTestCase in some others at the same time may cause errors
    in functional tests. According to comments in SqlTestCaseLight
    implementation in Neutron, that's intended for unit tests
    only. This patch is changing it so that all tests inherit
    from the same Sql Base class.

    For a long time, we've been hitting failures in gate due to
    race conditions in functional tests caused by this issue.

    Change-Id: Id9c917f604024f6c82b1ff638ba58bce1f2b306b
    Closes-Bug: #1808146
    Signed-off-by: Daniel Alvarez <email address hidden>

Changed in networking-ovn:
status: New → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/627543

OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-ovn (master)

Change abandoned by Lucas Alvares Gomes (<email address hidden>) on branch: master
Review: https://review.openstack.org/626589

Lucas Alvares Gomes (lucasagomes) wrote :

Re-opening; the error still happens occasionally in the gate.

Changed in networking-ovn:
status: Fix Released → Confirmed
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to networking-ovn (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/628157

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/628254

tags: added: networking-ovn-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-ovn (master)

Change abandoned by Lucas Alvares Gomes (<email address hidden>) on branch: master
Review: https://review.openstack.org/627190
Reason: abandoning, cleaning the gate queue

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Lucas Alvares Gomes (<email address hidden>) on branch: master
Review: https://review.openstack.org/628157
Reason: abandoning, cleaning queue

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Lucas Alvares Gomes (<email address hidden>) on branch: master
Review: https://review.openstack.org/630658

OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (master)

Reviewed: https://review.openstack.org/629248
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=29d07bec1694b05287aa175bc4de10a5dc04965e
Submitter: Zuul
Branch: master

commit 29d07bec1694b05287aa175bc4de10a5dc04965e
Author: Lucas Alvares Gomes <email address hidden>
Date: Tue Jan 8 16:04:31 2019 +0000

    Functional: Workaround database failures

    This patch inherited and modified the method that creates the database
    for the functional tests to:

    1. Make it atomic by using a lock

    2. Removed the singleton nature of it. Prior to this patch, the tests
    kept a single shared in-memory database. Now, each test will have its
    own in-memory database.

    3. Handle the DBNonExistentTable for failures in the cleanUp() method
    while deleting the db tables.

    Partial-Bug: #1808146
    Change-Id: I90a50fe1284078f5a5719da54599d4cddc5bd06f
    Signed-off-by: Lucas Alvares Gomes <email address hidden>
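
The three points in the commit message above can be illustrated with a minimal sketch. This is not the code from the patch: it uses only the standard-library sqlite3 and threading modules, the names _setup_lock, make_test_database and delete_from_tables are hypothetical, and catching sqlite3.OperationalError stands in for handling oslo.db's DBNonExistentTable.

```python
import sqlite3
import threading

# Hypothetical module-level lock so database creation is atomic (point 1).
_setup_lock = threading.Lock()


def make_test_database(schema_statements):
    """Create a fresh, private in-memory database for one test (point 2)."""
    with _setup_lock:
        conn = sqlite3.connect(":memory:")
        for statement in schema_statements:
            conn.execute(statement)
        conn.commit()
    return conn


def delete_from_tables(conn, table_names):
    """Empty each table, tolerating tables that do not exist (point 3).

    Returns the names of the tables that were missing.
    """
    missing = []
    for name in table_names:
        try:
            conn.execute('DELETE FROM "%s"' % name)
        except sqlite3.OperationalError as exc:
            # Only swallow the "no such table" case; re-raise anything else.
            if "no such table" not in str(exc):
                raise
            missing.append(name)
    conn.commit()
    return missing


# Usage: two tests get independent databases, and cleanup survives a
# table that is absent from the database being cleaned.
schema = ["CREATE TABLE ports (id INTEGER PRIMARY KEY)"]
db_a = make_test_database(schema)
db_b = make_test_database(schema)
db_a.execute("INSERT INTO ports (id) VALUES (1)")
count_b = db_b.execute("SELECT COUNT(*) FROM ports").fetchone()[0]
missing = delete_from_tables(db_a, ["ports", "qos_fip_policy_bindings"])
count_a = db_a.execute("SELECT COUNT(*) FROM ports").fetchone()[0]
```

Giving each test its own database removes the shared state that the tests were racing on, while the tolerant cleanup only hides the symptom, which is consistent with the patch being tagged Partial-Bug rather than Closes-Bug.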

OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-ovn (master)

Change abandoned by Lucas Alvares Gomes (<email address hidden>) on branch: master
Review: https://review.openstack.org/628254

OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/632677

OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (stable/rocky)

Reviewed: https://review.openstack.org/632677
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=abcf76499e056b848a96db85ec3931e8b61a3b56
Submitter: Zuul
Branch: stable/rocky

commit abcf76499e056b848a96db85ec3931e8b61a3b56
Author: Lucas Alvares Gomes <email address hidden>
Date: Tue Jan 8 16:04:31 2019 +0000

    Functional: Workaround database failures

    This patch inherited and modified the method that creates the database
    for the functional tests to:

    1. Make it atomic by using a lock

    2. Removed the singleton nature of it. Prior to this patch, the tests
    kept a single shared in-memory database. Now, each test will have its
    own in-memory database.

    3. Handle the DBNonExistentTable for failures in the cleanUp() method
    while deleting the db tables.

    Partial-Bug: #1808146
    Change-Id: I90a50fe1284078f5a5719da54599d4cddc5bd06f
    Signed-off-by: Lucas Alvares Gomes <email address hidden>
    (cherry picked from commit 29d07bec1694b05287aa175bc4de10a5dc04965e)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-ovn (stable/rocky)

Change abandoned by Daniel Alvarez (<email address hidden>) on branch: stable/rocky
Review: https://review.openstack.org/627543

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 6.0.0.0b1

This issue was fixed in the openstack/networking-ovn 6.0.0.0b1 development milestone.

tags: removed: networking-ovn-proactive-backport-potential
Changed in networking-ovn:
status: Confirmed → Fix Released
yatin (yatinkarel) wrote :

For the record, the same issue was seen in the unit test job, where it was root-caused to the connection pool being recycled after 1 hour: https://bugs.launchpad.net/neutron/+bug/2024674.

The workaround for the functional tests (https://review.openstack.org/632677) has already been removed in master by https://review.opendev.org/c/openstack/neutron/+/874669
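
As background for the pool-recycle root cause mentioned above, here is a minimal sketch of the general SQLite behaviour that can produce a "no such table" error. It assumes plain standard-library sqlite3 rather than the SQLAlchemy/oslo.db stack used by Neutron, and it only simulates a recycled connection by opening a second one; the exact mechanism in the linked bug is described in that report.

```python
import sqlite3

# Each plain ":memory:" connection gets its own private, empty database.
first = sqlite3.connect(":memory:")
first.execute("CREATE TABLE qos_fip_policy_bindings (id INTEGER PRIMARY KEY)")
first.execute("DELETE FROM qos_fip_policy_bindings")  # fine: the table exists here

# Simulate a pool discarding the old connection and opening a new one.
second = sqlite3.connect(":memory:")
try:
    second.execute("DELETE FROM qos_fip_policy_bindings")
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

print(error_message)  # no such table: qos_fip_policy_bindings
```

The schema lives only in the connection that created it, so any code path that silently swaps the connection under a long-running test run ends up talking to an empty database.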
