"Race in testr accounting" problem in some setups

Bug #1538941 reported by Viktor Tikkanen
This bug affects 5 people
Affects          Status  Importance  Assigned to  Milestone
Testrepository   New     Undecided   Unassigned

Bug Description

We have a list of 217 tempest test cases (https://git.opnfv.org/cgit/functest/tree/testcases/VIM/OpenStack/CI/custom_tests/test_list.txt). Running it in two different environments produces different results:

Environment 1: 32 workers in total, all of them OK; the number of executed test cases (210) is stable from one run to another:

======
Totals
======
Ran: 210 tests in 988.0000 sec.
- Passed: 197
- Skipped: 2
- Expected Fail: 0
- Unexpected Success: 0
- Failed: 11
Sum of execute time for each test: 4818.5592 sec.

==============
Worker Balance
==============
- Worker 0 (18 tests) => 0:11:52.268517
- Worker 1 (5 tests) => 0:01:59.785352
- Worker 2 (12 tests) => 0:02:43.456338
- Worker 3 (6 tests) => 0:03:02.244103
- Worker 4 (19 tests) => 0:02:50.514449
- Worker 5 (5 tests) => 0:02:04.843871
- Worker 6 (11 tests) => 0:15:55.007223
- Worker 7 (11 tests) => 0:07:28.073453
- Worker 8 (6 tests) => 0:12:11.691135
- Worker 9 (3 tests) => 0:01:15.679801
- Worker 10 (3 tests) => 0:01:46.050897
- Worker 11 (1 tests) => 0:00:00.822915
- Worker 12 (5 tests) => 0:00:57.501500
- Worker 13 (9 tests) => 0:00:46.020557
- Worker 14 (3 tests) => 0:00:00.111195
- Worker 15 (5 tests) => 0:00:46.658937
- Worker 16 (2 tests) => 0:00:00.342555
- Worker 17 (13 tests) => 0:00:01.659226
- Worker 18 (3 tests) => 0:01:43.447998
- Worker 19 (2 tests) => 0:01:20.727811
- Worker 20 (3 tests) => 0:00:46.915088
- Worker 21 (7 tests) => 0:07:34.499020
- Worker 22 (22 tests) => 0:07:24.567170
- Worker 23 (3 tests) => 0:01:09.975961
- Worker 24 (2 tests) => 0:01:24.426870
- Worker 25 (5 tests) => 0:01:18.959722
- Worker 26 (6 tests) => 0:02:38.261145
- Worker 27 (3 tests) => 0:07:36.964679
- Worker 28 (4 tests) => 0:01:47.150957
- Worker 29 (4 tests) => 0:01:33.964308
- Worker 30 (3 tests) => 0:01:36.616182
- Worker 31 (6 tests) => 0:00:16.525006
...
2016-01-24 23:24:21,157 - run_tempest - INFO - Results: {'timestart': '2016-01-2423:07:45.730074', 'duration': 995, 'tests': 210, 'failures': 11}

Environment 2: Number of executed test cases differs from one run to another:

run 1:
==============
Worker Balance
==============
- WARNING: missing Worker 0! Race in testr accounting.
- WARNING: missing Worker 1! Race in testr accounting.
- Worker 2 (15 tests) => 0:01:09.591629
- Worker 3 (8 tests) => 0:01:07.315242
- Worker 4 (22 tests) => 0:00:56.157448
- WARNING: missing Worker 5! Race in testr accounting.
- WARNING: missing Worker 6! Race in testr accounting.
- WARNING: missing Worker 7! Race in testr accounting.
- WARNING: missing Worker 8! Race in testr accounting.
- WARNING: missing Worker 9! Race in testr accounting.
- Worker 10 (9 tests) => 0:01:06.690923
- WARNING: missing Worker 11! Race in testr accounting.
- WARNING: missing Worker 12! Race in testr accounting.
- Worker 13 (13 tests) => 0:00:30.699308
- Worker 14 (6 tests) => 0:00:22.981585
...
2016-01-25 01:42:08,144 - run_tempest - INFO - Results: {'timestart': '2016-01-2501:40:38.411388', 'duration': 89, 'tests': 73, 'failures': 2}

(here the number of executed test cases (73) is ~35% of 210, and the number of "good" workers (6 of the 15 listed) is ~40%)

run 2:
==============
Worker Balance
==============
- Worker 0 (16 tests) => 0:00:36.391436
- Worker 1 (18 tests) => 0:00:39.390862
- Worker 2 (15 tests) => 0:01:19.611528
- Worker 3 (8 tests) => 0:01:12.879155
- Worker 4 (10 tests) => 0:00:43.166414
- Worker 5 (12 tests) => 0:00:59.633944
- WARNING: missing Worker 6! Race in testr accounting.
- WARNING: missing Worker 7! Race in testr accounting.
- Worker 8 (4 tests) => 0:00:40.581921
- Worker 9 (8 tests) => 0:00:54.930262
- WARNING: missing Worker 10! Race in testr accounting.
- Worker 11 (4 tests) => 0:02:35.655297
- WARNING: missing Worker 12! Race in testr accounting.
- WARNING: missing Worker 13! Race in testr accounting.
- Worker 14 (6 tests) => 0:00:33.211228
...
2016-01-23 06:37:29,371 - run_tempest - INFO - Results: {'timestart': '2016-01-2306:34:25.922772', 'duration': 183, 'tests': 121, 'failures': 29}

(here the number of executed test cases (121) is ~58% of 210, and the number of "good" workers (10 of the 15 listed) is ~67%)
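
The per-run counts above can be re-derived from the pasted output. Below is a minimal Python sketch, assuming only the two line shapes visible in the "Worker Balance" sections of this report; the function name and regular expressions are illustrative and not part of testr. It is fed the run 1 excerpt as quoted (the lines hidden behind "..." are not included).

```python
import re

# Line shapes as they appear in the "Worker Balance" output quoted in this report.
WORKER_RE = re.compile(r"^\s*- Worker (\d+) \((\d+) tests\)")
MISSING_RE = re.compile(r"^\s*- WARNING: missing Worker (\d+)!")


def summarize_worker_balance(text):
    """Return (reported_ids, missing_ids, tests_counted) for an excerpt."""
    reported, missing, tests = [], [], 0
    for line in text.splitlines():
        m = WORKER_RE.match(line)
        if m:
            reported.append(int(m.group(1)))
            tests += int(m.group(2))
            continue
        m = MISSING_RE.match(line)
        if m:
            missing.append(int(m.group(1)))
    return reported, missing, tests


# Run 1 excerpt from Environment 2, copied from the report (truncated there by "...").
RUN1 = """\
- WARNING: missing Worker 0! Race in testr accounting.
- WARNING: missing Worker 1! Race in testr accounting.
- Worker 2 (15 tests) => 0:01:09.591629
- Worker 3 (8 tests) => 0:01:07.315242
- Worker 4 (22 tests) => 0:00:56.157448
- WARNING: missing Worker 5! Race in testr accounting.
- WARNING: missing Worker 6! Race in testr accounting.
- WARNING: missing Worker 7! Race in testr accounting.
- WARNING: missing Worker 8! Race in testr accounting.
- WARNING: missing Worker 9! Race in testr accounting.
- Worker 10 (9 tests) => 0:01:06.690923
- WARNING: missing Worker 11! Race in testr accounting.
- WARNING: missing Worker 12! Race in testr accounting.
- Worker 13 (13 tests) => 0:00:30.699308
- Worker 14 (6 tests) => 0:00:22.981585
"""

reported, missing, tests = summarize_worker_balance(RUN1)
listed = len(reported) + len(missing)
print(f"{len(reported)} of {listed} listed workers reported; "
      f"{tests} tests counted in the excerpt")
```

Running the same function over complete output from several runs would show whether the set of missing workers is stable or changes from run to run.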

Is there a bug in testr that causes these "Race in testr accounting" problems? Do you know of any workarounds (other than dropping the --parallel option)?

The version used is 0.0.20.final.0
