nova unit tests occasionally fail migration tests for mysql and postgres

Bug #1216851 reported by Mate Lakat
28
This bug affects 5 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
High
Roman Podoliaka
OpenStack Core Infrastructure
Invalid
Low
Unassigned

Bug Description

http://logs.openstack.org/80/43580/1/check/gate-nova-python27/ae0cd74/console.html

013-08-26 09:18:25.324 | ======================================================================
2013-08-26 09:18:25.324 | FAIL: nova.tests.db.test_migrations.TestNovaMigrations.test_postgresql_opportunistically
2013-08-26 09:18:25.325 | tags: worker-0
2013-08-26 09:18:25.325 | ----------------------------------------------------------------------
...
2013-08-26 09:18:25.338 | 172 -> 173...
2013-08-26 09:18:25.339 | Deleted duplicated row with id: 2 from table: key_pairs
2013-08-26 09:18:25.339 | Deleted duplicated row with id: 1 from table: key_pairs
2013-08-26 09:18:25.339 | done
2013-08-26 09:18:25.339 | Failed to migrate to version 173 on engine Engine(postgresql+psycopg2://openstack_citest:openstack_citest@localhost/openstack_citest)
2013-08-26 09:18:25.339 | }}}
2013-08-26 09:18:25.339 |
2013-08-26 09:18:25.340 | Traceback (most recent call last):
2013-08-26 09:18:25.340 | File "/home/jenkins/workspace/gate-nova-python27/nova/tests/db/test_migrations.py", line 175, in test_postgresql_opportunistically
2013-08-26 09:18:25.340 | self._test_postgresql_opportunistically()
2013-08-26 09:18:25.340 | File "/home/jenkins/workspace/gate-nova-python27/nova/tests/db/test_migrations.py", line 358, in _test_postgresql_opportunistically
2013-08-26 09:18:25.340 | self._walk_versions(engine, False, False)
2013-08-26 09:18:25.340 | File "/home/jenkins/workspace/gate-nova-python27/nova/tests/db/test_migrations.py", line 379, in _walk_versions
2013-08-26 09:18:25.341 | self._migrate_up(engine, version, with_data=True)
2013-08-26 09:18:25.341 | File "/home/jenkins/workspace/gate-nova-python27/nova/tests/db/test_migrations.py", line 444, in _migrate_up
2013-08-26 09:18:25.341 | check(engine, data)
2013-08-26 09:18:25.341 | File "/home/jenkins/workspace/gate-nova-python27/nova/tests/db/test_migrations.py", line 1233, in _check_173
2013-08-26 09:18:25.341 | insert.execute(duplicate_keypair)
2013-08-26 09:18:25.341 | File "/home/jenkins/workspace/gate-nova-python27/.tox/py27/local/lib/python2.7/site-packages/sqlalchemy/sql/expression.py", line 2841, in execute
2013-08-26 09:18:25.341 | return e._execute_clauseelement(self, multiparams, params)
2013-08-26 09:18:25.342 | File "/home/jenkins/workspace/gate-nova-python27/.tox/py27/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2453, in _execute_clauseelement
2013-08-26 09:18:25.342 | return connection._execute_clauseelement(elem, multiparams, params)
2013-08-26 09:23:49.057 | File "/home/jenkins/workspace/gate-nova-python27/.tox/py27/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1584, in _execute_clauseelement
2013-08-26 09:23:49.057 | compiled_sql, distilled_params
2013-08-26 09:23:49.057 | File "/home/jenkins/workspace/gate-nova-python27/.tox/py27/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1732, in _execute_context
2013-08-26 09:23:49.057 | self._commit_impl()
2013-08-26 09:23:49.057 | File "/home/jenkins/workspace/gate-nova-python27/.tox/py27/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1280, in _commit_impl
2013-08-26 09:23:49.057 | self._handle_dbapi_exception(e, None, None, None, None)
2013-08-26 09:23:49.058 | File "/home/jenkins/workspace/gate-nova-python27/.tox/py27/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1828, in _handle_dbapi_exception
2013-08-26 09:23:49.058 | self._autorollback()
2013-08-26 09:23:49.058 | File "/home/jenkins/workspace/gate-nova-python27/.tox/py27/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1345, in _autorollback
2013-08-26 09:23:49.058 | self._rollback_impl()
2013-08-26 09:23:49.058 | File "/home/jenkins/workspace/gate-nova-python27/.tox/py27/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1265, in _rollback_impl
2013-08-26 09:23:49.058 | self._handle_dbapi_exception(e, None, None, None, None)
2013-08-26 09:23:49.058 | File "/home/jenkins/workspace/gate-nova-python27/.tox/py27/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1262, in _rollback_impl
2013-08-26 09:23:49.059 | self.engine.dialect.do_rollback(self.connection)
2013-08-26 09:23:49.059 | File "/home/jenkins/workspace/gate-nova-python27/.tox/py27/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 300, in do_rollback
2013-08-26 09:23:49.059 | connection.rollback()
2013-08-26 09:23:49.059 | InterfaceError: (InterfaceError) connection already closed None None

Tags: db
Revision history for this message
Clark Boylan (cboylan) wrote :

Without more info about what this test was trying to accomplish and whether or not the change that caused the failure could be at fault it is hard to debug this further. It looks like during a database migration test the connection to postgres was closed prematurely. Is it possible that one of the migrations was flawed causing this to stop early or is it more likely that there was a network hickup on the test slave? In any case more info necessary. I have also added nova to the bug to hopefully get eyes on this from people that understand the database migration tests.

Changed in openstack-ci:
status: New → Incomplete
importance: Undecided → Low
melanie witt (melwitt)
tags: added: db
Revision history for this message
Matt Riedemann (mriedem) wrote :
Changed in nova:
status: New → Confirmed
Joe Gordon (jogo)
summary: - postgresql migration fail: connection already closed
+ nova unit tests occasionally fail migration tests for mysql and postgres
Revision history for this message
Joe Gordon (jogo) wrote :

Although we are definitely seeing this, the logstash query for this is a bit misleading, because some of the failures are in checks where the unit tests are working as expected.

Revision history for this message
Jeremy Stanley (fungi) wrote :

Some investigation was performed to rule out issues with the machines where these tests run. One machine where a failure had been witnessed for the nova py27 tests in the gate pipeline (timing out both mysql and postgres migrations) showed nothing out of the ordinary with the system. When running a subsequent instance of the same job, it succeeded and system load was less than the number of vCPUs on that VM. The database logs are littered with all manner of critical/fatal errors, but I have no idea which ones are considered "normal" for these tests and which could indicate the test accidentally doing something wrong (rather than purposefully).

Revision history for this message
Joe Gordon (jogo) wrote :

We are seeing this in the stable branch of nova as well http://logs.openstack.org/02/56202/1/gate/gate-nova-python27/648edba/console.html

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Hmm, this seems to be an ordinary timeout exception that is raised if a test is executed for more than OS_TEST_TIMEOUT seconds (https://github.com/openstack/nova/blob/master/.testr.conf). And in case of migrations tests error messages can be very-very misleading. E.g. I can't run migrations tests in Nova on my machine without specifying OS_TEST_TIMEOUT=0, because it takes more than 160 seconds to tun test_walk_versions() for MySQL.

The more migrations we put to Nova, the more often we face this issue (depending on the load of CI nodes of course).

So I would suggest to increase the OS_TEST_TIMEOUT value (or turn it off for migrations tests, if it's possible)

Revision history for this message
Kurt Taylor (krtaylor) wrote :

The migration timeout problems appear to be intermittent, I have had this problem but it went away with a different configuration.

If so, this may be similar to:
https://bugs.launchpad.net/nova/+bug/1105284
and
https://bugs.launchpad.net/nova/+bug/1200289

Revision history for this message
Clark Boylan (cboylan) wrote :

If the test timeout is causing the problems my suggestion would be to bump the timeout in nova's testr.conf then work on having different timeouts for different tests (maybe use multiples of some base timeout?). This should buy enough time to sort out the problem more properly.

The timeouts are important particularly when running on jenkins so that we can fail individual tests as quickly as possibly without waiting for the much longer Jenkins job timeouts.

Revision history for this message
Joe Gordon (jogo) wrote :
Changed in openstack-ci:
status: Incomplete → Invalid
Changed in nova:
assignee: nobody → Roman Podoliaka (rpodolyaka)
Changed in nova:
status: Confirmed → In Progress
Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/60020
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0be2ee14211766dc50d504935e198d3f0a6527a8
Submitter: Jenkins
Branch: master

commit 0be2ee14211766dc50d504935e198d3f0a6527a8
Author: Roman Podoliaka <email address hidden>
Date: Wed Dec 4 18:13:50 2013 +0200

    Make it possible to override test timeout value

    Currently, we are setting the test timeout value globally
    in .testr.conf. It could be useful to be able to override
    it per test case (e.g. for migrations tests which usually
    take much longer to run than any other tests).

    Related-Bug: #1216851

    Change-Id: Icee09609f76f4439f6d981641be521d8d695a77e

Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/60021
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b73c64dbe41881096873f040df1e2e4da3227580
Submitter: Jenkins
Branch: master

commit b73c64dbe41881096873f040df1e2e4da3227580
Author: Roman Podoliaka <email address hidden>
Date: Wed Dec 4 18:14:08 2013 +0200

    Give migrations tests more time to run

    Migrations tests usually take longer to run that most other
    tests. Increase the time out value just for migrations tests,
    so that we don't hit various time out errors running unit tests
    when CI test nodes slow down a bit.

    Closes-Bug: #1216851

    Change-Id: I48d7c125397cd282d52eb8e1a1ccb699188d8f14

Changed in nova:
milestone: none → icehouse-2
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-2 → 2014.1
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/370805

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/370805
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=71d6333f855e139894f497fc120487895a1d66ce
Submitter: Jenkins
Branch: master

commit 71d6333f855e139894f497fc120487895a1d66ce
Author: Roman Podoliaka <email address hidden>
Date: Thu Sep 15 16:10:25 2016 +0300

    Set a bigger TIMEOUT_SCALING_FACTOR value for migration tests

    This should help avoid timeouts on CI.

    Related-Bug: #1216851

    Change-Id: Ic5b2b75e1d8bc73e49b8e7944ae802a64041cb7c

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.