"Race in testr accounting" problem in some setups

Bug #1538941 reported by Viktor Tikkanen on 2016-01-28
This bug affects 5 people
Affects: Testrepository
Importance: Undecided
Assigned to: Unassigned

Bug Description

We have a list of 217 tempest cases (https://git.opnfv.org/cgit/functest/tree/testcases/VIM/OpenStack/CI/custom_tests/test_list.txt) and running it in two different environments produces different results:

Environment 1: 32 workers in total, all of them OK; the number of executed test cases (210) is stable from one run to another:

======
Totals
======
Ran: 210 tests in 988.0000 sec.
- Passed: 197
- Skipped: 2
- Expected Fail: 0
- Unexpected Success: 0
- Failed: 11
Sum of execute time for each test: 4818.5592 sec.

==============
Worker Balance
==============
- Worker 0 (18 tests) => 0:11:52.268517
- Worker 1 (5 tests) => 0:01:59.785352
- Worker 2 (12 tests) => 0:02:43.456338
- Worker 3 (6 tests) => 0:03:02.244103
- Worker 4 (19 tests) => 0:02:50.514449
- Worker 5 (5 tests) => 0:02:04.843871
- Worker 6 (11 tests) => 0:15:55.007223
- Worker 7 (11 tests) => 0:07:28.073453
- Worker 8 (6 tests) => 0:12:11.691135
- Worker 9 (3 tests) => 0:01:15.679801
- Worker 10 (3 tests) => 0:01:46.050897
- Worker 11 (1 tests) => 0:00:00.822915
- Worker 12 (5 tests) => 0:00:57.501500
- Worker 13 (9 tests) => 0:00:46.020557
- Worker 14 (3 tests) => 0:00:00.111195
- Worker 15 (5 tests) => 0:00:46.658937
- Worker 16 (2 tests) => 0:00:00.342555
- Worker 17 (13 tests) => 0:00:01.659226
- Worker 18 (3 tests) => 0:01:43.447998
- Worker 19 (2 tests) => 0:01:20.727811
- Worker 20 (3 tests) => 0:00:46.915088
- Worker 21 (7 tests) => 0:07:34.499020
- Worker 22 (22 tests) => 0:07:24.567170
- Worker 23 (3 tests) => 0:01:09.975961
- Worker 24 (2 tests) => 0:01:24.426870
- Worker 25 (5 tests) => 0:01:18.959722
- Worker 26 (6 tests) => 0:02:38.261145
- Worker 27 (3 tests) => 0:07:36.964679
- Worker 28 (4 tests) => 0:01:47.150957
- Worker 29 (4 tests) => 0:01:33.964308
- Worker 30 (3 tests) => 0:01:36.616182
- Worker 31 (6 tests) => 0:00:16.525006
...
2016-01-24 23:24:21,157 - run_tempest - INFO - Results: {'timestart': '2016-01-2423:07:45.730074', 'duration': 995, 'tests': 210, 'failures': 11}

Environment 2: Number of executed test cases differs from one run to another:

run 1:
==============
Worker Balance
==============
- WARNING: missing Worker 0! Race in testr accounting.
- WARNING: missing Worker 1! Race in testr accounting.
- Worker 2 (15 tests) => 0:01:09.591629
- Worker 3 (8 tests) => 0:01:07.315242
- Worker 4 (22 tests) => 0:00:56.157448
- WARNING: missing Worker 5! Race in testr accounting.
- WARNING: missing Worker 6! Race in testr accounting.
- WARNING: missing Worker 7! Race in testr accounting.
- WARNING: missing Worker 8! Race in testr accounting.
- WARNING: missing Worker 9! Race in testr accounting.
- Worker 10 (9 tests) => 0:01:06.690923
- WARNING: missing Worker 11! Race in testr accounting.
- WARNING: missing Worker 12! Race in testr accounting.
- Worker 13 (13 tests) => 0:00:30.699308
- Worker 14 (6 tests) => 0:00:22.981585
...
2016-01-25 01:42:08,144 - run_tempest - INFO - Results: {'timestart': '2016-01-2501:40:38.411388', 'duration': 89, 'tests': 73, 'failures': 2}

(here the number of executed test cases (73) is ~35% of 210, and the number of "good" workers (6 of the 15 listed) is ~40%)

run 2:
==============
Worker Balance
==============
- Worker 0 (16 tests) => 0:00:36.391436
- Worker 1 (18 tests) => 0:00:39.390862
- Worker 2 (15 tests) => 0:01:19.611528
- Worker 3 (8 tests) => 0:01:12.879155
- Worker 4 (10 tests) => 0:00:43.166414
- Worker 5 (12 tests) => 0:00:59.633944
- WARNING: missing Worker 6! Race in testr accounting.
- WARNING: missing Worker 7! Race in testr accounting.
- Worker 8 (4 tests) => 0:00:40.581921
- Worker 9 (8 tests) => 0:00:54.930262
- WARNING: missing Worker 10! Race in testr accounting.
- Worker 11 (4 tests) => 0:02:35.655297
- WARNING: missing Worker 12! Race in testr accounting.
- WARNING: missing Worker 13! Race in testr accounting.
- Worker 14 (6 tests) => 0:00:33.211228
...
2016-01-23 06:37:29,371 - run_tempest - INFO - Results: {'timestart': '2016-01-2306:34:25.922772', 'duration': 183, 'tests': 121, 'failures': 29}

(here the number of executed test cases (121) is ~58% of 210, and the number of "good" workers (10 of the 15 listed) is ~67%)
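
The run 1 percentages can be re-derived from the pasted Worker Balance listing. The following is only a minimal sketch: it assumes the line format shown in this report, and the regular expressions and helper names (summarize, percent) are made up for illustration and are not part of testr.

```python
import re

# Line shapes copied from the "Worker Balance" output quoted in this report.
WORKER_RE = re.compile(r"^- Worker (\d+) \((\d+) tests\)")
MISSING_RE = re.compile(r"^- WARNING: missing Worker (\d+)!")

# Worker Balance listing of run 1 in environment 2, as pasted above.
RUN1 = """\
- WARNING: missing Worker 0! Race in testr accounting.
- WARNING: missing Worker 1! Race in testr accounting.
- Worker 2 (15 tests) => 0:01:09.591629
- Worker 3 (8 tests) => 0:01:07.315242
- Worker 4 (22 tests) => 0:00:56.157448
- WARNING: missing Worker 5! Race in testr accounting.
- WARNING: missing Worker 6! Race in testr accounting.
- WARNING: missing Worker 7! Race in testr accounting.
- WARNING: missing Worker 8! Race in testr accounting.
- WARNING: missing Worker 9! Race in testr accounting.
- Worker 10 (9 tests) => 0:01:06.690923
- WARNING: missing Worker 11! Race in testr accounting.
- WARNING: missing Worker 12! Race in testr accounting.
- Worker 13 (13 tests) => 0:00:30.699308
- Worker 14 (6 tests) => 0:00:22.981585
"""


def summarize(balance_text):
    """Return ({worker id: test count}, [missing worker ids]) for a listing."""
    reporting = {}
    missing = []
    for line in balance_text.splitlines():
        line = line.strip()
        m = WORKER_RE.match(line)
        if m:
            reporting[int(m.group(1))] = int(m.group(2))
            continue
        m = MISSING_RE.match(line)
        if m:
            missing.append(int(m.group(1)))
    return reporting, missing


def percent(part, whole):
    """Percentage of part in whole, rounded to the nearest integer."""
    return round(100.0 * part / whole)


reporting, missing = summarize(RUN1)
listed = len(reporting) + len(missing)
executed = sum(reporting.values())

# 210 is the stable test count observed in environment 1.
print("good workers: %d of %d (%d%%)" % (len(reporting), listed, percent(len(reporting), listed)))
print("tests executed: %d of 210 (%d%%)" % (executed, percent(executed, 210)))
```

For run 1 this prints 6 of 15 workers (40%) and 73 of 210 tests (35%). The per-worker counts add up exactly to the 73 tests in the Results line, so the missing tests would be the ones that had been assigned to the missing workers.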

Is there some bug in testr that causes these "Race in testr accounting" problems? Do you know of any workarounds (other than dropping the --parallel option)?

The version used is 0.0.20.final.0.
