Ceilometer

Jenkins test run should not run parallel

Bug #1213943 reported by Tong Li on 2013-08-19

This bug affects 1 person

Affects      Status   Importance  Assigned to  Milestone
Ceilometer   Invalid  Undecided   Tong Li

Bug Description

Some of the test cases that exercise database operations were not written to run concurrently, so the Jenkins test run fails.
This is easy to spot from the number of workers in the log file. We either need to develop better test cases, so
that test cases sharing one database do not affect each other, or, for now, simply make sure that
only 1 worker is used to test our code. The latter approach obviously slows down the run but should be the easiest.

For some reason, the nova tests are run with only 1 worker, but the other tests are not:
python setup.py testr --slowest '--testr-args=--concurrency=1 --here=nova_tests '
Then we have
python setup.py testr --slowest --testr-args=

The following log shows that there are multiple workers in each run. See worker-0 and worker-1 in the tags.

tags: -worker-0
time: 2013-08-16 22:50:27.332786Z
tags: worker-1
test: tests.alarm.test_threshold_evaluation.TestEvaluate.test_simple_insufficient
time: 2013-08-16 22:50:27.364765Z
successful: tests.alarm.test_threshold_evaluation.TestEvaluate.test_simple_insufficient [ multipart
Content-Type: text/plain;charset="utf8"
pythonlogging:''
106
initiating evaluation cycle on 2 alarms
alarm 5fc6a2cb-e1b1-44eb-9897-7f05a9cc5122 transitioning to insufficient data because 5 datapoints are unknown
alarm 0078ef8d-947b-48d9-a4c1-795c8c12988b transitioning to insufficient data because 4 datapoints are unknown
0
]
tags: -worker-1
time: 2013-08-16 22:50:27.366113Z
tags: worker-0
test: tests.api.v1.test_compute_duration_by_resource_scenarios.TestComputeDurationByResource.test_after_range(mongodb)
time: 2013-08-16 22:50:27.387948Z
skip: tests.api.v1.test_compute_duration_by_resource_scenarios.TestComputeDurationByResource.test_after_range(mongodb) [ multipart
Content-Type: text/plain;charset="utf8"
pythonlogging:''
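
The worker check described above can be automated. The following is a minimal sketch, assuming the log uses the same "tags: worker-N" lines as the excerpt and that each such line carries a single tag; the function name workers_in_log and the sample lines are illustrative only and are not part of Ceilometer or testr.

```python
import re

# Matches a tag line that switches a worker tag on, e.g. "tags: worker-1".
# Lines such as "tags: -worker-1" switch the tag off and are not matched.
# Assumption: one tag per line, as in the excerpt above.
WORKER_TAG = re.compile(r"^tags:\s+(worker-\d+)\s*$")


def workers_in_log(lines):
    """Return the set of worker names that appear in the log lines."""
    workers = set()
    for line in lines:
        match = WORKER_TAG.match(line.strip())
        if match:
            workers.add(match.group(1))
    return workers


# A few lines copied from the excerpt above.
sample_log = [
    "tags: -worker-0",
    "time: 2013-08-16 22:50:27.332786Z",
    "tags: worker-1",
    "test: tests.alarm.test_threshold_evaluation.TestEvaluate.test_simple_insufficient",
    "tags: -worker-1",
    "tags: worker-0",
]

found = workers_in_log(sample_log)
print(sorted(found))
print("parallel run" if len(found) > 1 else "serial run")
```

More than one distinct worker name means the run was split across parallel workers.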

Tong Li (litong01) on 2013-08-19

Changed in ceilometer:
assignee:	nobody → Tong Li (litong01)

Julien Danjou (jdanjou) wrote on 2013-08-19:

The tests pass, so obviously everything's fine. I don't understand what problem you're trying to point out.

Changed in ceilometer:
status:	New → Incomplete

Tong Li (litong01) wrote on 2013-08-19:

Please take a look at the prepare_data method of the DBTestBase class. It records a bunch of meters against the same database. When
the test cases run in parallel, a query on some condition will not return the expected data. For example,
msg0 has the following values:

            'instance',
            sample.TYPE_CUMULATIVE,
            unit='',
            volume=1,
            user_id='user-id',
            project_id='project-id',
            resource_id='resource-id',
            timestamp=datetime.datetime(2012, 7, 2, 10, 39),
            resource_metadata={'display_name': 'test-server',
                               'tag': 'self.counter',
                               },
            source='test-1',

msg1 has the following values:

            'instance',
            sample.TYPE_CUMULATIVE,
            unit='',
            volume=1,
            user_id='user-id',
            project_id='project-id',
            resource_id='resource-id',
            timestamp=datetime.datetime(*timestamp_list[0]),
            resource_metadata={'display_name': 'test-server',
                               'tag': 'self.counter',
                               },
            source='test-1',

Both have the same user_id and resource_id but different timestamps. Now look at this test case in storage/base.py,
test_get_samples_by_resource:

        f = storage.SampleFilter(user='user-id', resource='resource-id')
        results = list(self.conn.get_samples(f))
        assert results
        meter = results[1]
        assert meter is not None
        self.assertEqual(meter.as_dict(), self.msg0)

The query assumes that at least two documents will be returned and that the second one will match msg0. But when
multiple test cases run at the same time, more documents may have been added to the database, and the second one returned
may not be the one that matches msg0. Since there are indexes on the collection, the test case works fine when it runs on its own.
When several run at the same time, the document returned second in the list is no longer guaranteed to be the expected one,
and the test fails. This problem is not limited to this test case; many test cases make the same assumption.
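
The position-based assumption described above can be reproduced without a real database. This is a minimal sketch, assuming a plain list as a stand-in for the collection; get_samples here is a hypothetical helper, not the Ceilometer storage API, and the extra sample is invented for illustration.

```python
import datetime

# Hypothetical stand-ins for the recorded samples; only the fields needed
# to show the problem are included.
msg0 = {"user_id": "user-id", "resource_id": "resource-id",
        "timestamp": datetime.datetime(2012, 7, 2, 10, 39)}
msg1 = {"user_id": "user-id", "resource_id": "resource-id",
        "timestamp": datetime.datetime(2012, 7, 2, 10, 40)}


def get_samples(store, user, resource):
    """Return matching samples in whatever order the store holds them."""
    return [s for s in store
            if s["user_id"] == user and s["resource_id"] == resource]


# Isolated run: only this test's data is present, so msg0 happens to be
# the second result.
isolated = [msg1, msg0]
results = get_samples(isolated, "user-id", "resource-id")
assert results[1] == msg0

# Shared database: a concurrently running test has recorded one more
# matching sample, which shifts msg0 out of position 1.
extra = {"user_id": "user-id", "resource_id": "resource-id",
         "timestamp": datetime.datetime(2012, 7, 2, 10, 41)}
shared = [msg1, extra, msg0]
results = get_samples(shared, "user-id", "resource-id")

position_based_ok = results[1] == msg0   # depends on result order
membership_ok = msg0 in results          # independent of result order
print(position_based_ok, membership_ok)
```

Checking membership, or sorting the results by a known key before indexing, would remove the dependence on position, though it would not by itself isolate tests that also count the rows they get back.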

OpenStack Infra (hudson-openstack) wrote on 2013-08-19: Fix proposed to ceilometer (master)

Fix proposed to branch: master
Review: https://review.openstack.org/42670

Changed in ceilometer:
status:	Incomplete → In Progress

Julien Danjou (jdanjou) wrote on 2013-08-19:

No test ever failed for any of these reasons in Jenkins, so I don't understand how you can state that there's a problem.

Tong Li (litong01) wrote on 2013-08-19:

Test cases have failed on Jenkins for exactly this reason.

Here is the log file; the failure is caused by exactly this problem.

ft91.13: tests.storage.test_impl_db2.RawSampleTest.test_get_samples_by_resource_StringException: pythonlogging:'': {{{connecting to DB2 on mongodb://localhost:29000/ceilometer}}}

Traceback (most recent call last):
  File "/home/jenkins/workspace/gate-ceilometer-python26/tests/storage/base.py", line 508, in test_get_samples_by_resource
    self.assertEqual(meter.as_dict(), self.msg0)
  File "/home/jenkins/workspace/gate-ceilometer-python26/.tox/py26/lib/python2.6/site-packages/testtools/testcase.py", line 322, in assertEqual
    self.assertThat(observed, matcher, message)
  File "/home/jenkins/workspace/gate-ceilometer-python26/.tox/py26/lib/python2.6/site-packages/testtools/testcase.py", line 417, in assertThat
    raise MismatchError(matchee, matcher, mismatch, verbose)
MismatchError: !=:
reference = {'counter_name': u'instance',
'counter_type': u'cumulative',
'counter_unit': u'',
'counter_volume': 1,
'message_id': u'b3b81d04-06b4-11e3-96d5-bc764e045cea',
'message_signature': u'8c272b94f588d9097785a10f4b92bff661a55863594b0887187b4784c786a8ec',
'project_id': u'project-id',
'resource_id': u'resource-id',
'resource_metadata': {u'display_name': u'test-server',
                       u'tag': u'self.counter'},
'source': u'test-1',
'timestamp': datetime.datetime(2012, 7, 2, 10, 40),
'user_id': u'user-id'}
actual = {'counter_name': 'instance',
'counter_type': 'cumulative',
'counter_unit': '',
'counter_volume': 1,
'message_id': 'b3b7bce2-06b4-11e3-846a-bc764e045cea',
'message_signature': '34b40b5afe554e4070f5dd95bbbe78ebe9f7a6d511a07d745079d8f20bbef983',
'project_id': 'project-id',
'resource_id': 'resource-id',
'resource_metadata': {'display_name': 'test-server', 'tag': 'self.counter'},
'source': 'test-1',
'timestamp': datetime.datetime(2012, 7, 2, 10, 39),
'user_id': 'user-id'}

Tong Li (litong01) wrote on 2013-08-19:

The key point is that the test cases were not written to be multi-process (or multi-thread) safe. When they run at the same time against the same database, the results are not what each test case expects. Why is this so difficult to understand? Please see my previous post for where and why it fails.

Tong Li (litong01) wrote on 2013-08-19:

The mongodb tests use a fake connection object that creates a database instance per test case, so each test case already uses a different
database instance and there is no isolation problem.

Changed in ceilometer:
status:	In Progress → Invalid
