Comment 1 for bug 1092050

Lars Butler (lars-butler) wrote :

Here are my testing notes. I ran eight different scenarios. In summary, increasing `concurrent_tasks` to twice the number of worker processes and setting `CELERY_ACKS_LATE = True` and `CELERYD_PREFETCH_MULTIPLIER = 1` in celeryconfig.py seems to solve the problem.
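For reference, the two Celery settings named in the summary would look roughly like this in celeryconfig.py. This is only a fragment: everything else in the file (broker settings, imports, and so on) is left out, and the comments are my reading of what each setting does rather than anything measured in these tests.

```python
# celeryconfig.py (fragment): only the two settings named in the summary.
# Old-style uppercase setting names, as written in these notes.

# Acknowledge a task message only after the task has finished,
# rather than as soon as the worker receives it.
CELERY_ACKS_LATE = True

# Each worker process reserves at most one task message at a time,
# instead of prefetching several.
CELERYD_PREFETCH_MULTIPLIER = 1
```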

All tests use the same job configuration unless noted otherwise.
Tasks = 544

----------
Test 1
----------

Test:

All machines (272 cores).
`concurrent_tasks` = 320.

Result:

bs04, gm01, and gm02 were under-utilized from the start of the calculation.

----------
Test 2
----------

Test:

Test run with only bs04 and gm0{1,2}.

Result:

Test shows full core utilization from the start (48, 48, and 48).

----------
Test 3
----------

Test:

All machines again, this time with `CELERYD_PREFETCH_MULTIPLIER = 1`.

Result:

Result was the same as Test 1; the 48-core machines are under-utilized.

----------
Test 4
----------

Test:

`CELERYD_PREFETCH_MULTIPLIER = 1` and also set the `concurrent_tasks` parameter in
openquake.cfg to 272 (down from 320) to match the number of workers.

Result:

This gave similar results to Tests 1 and 3, except the initial utilization of bs04
and gm0{1,2} was even worse: only 39 cores were used.

----------
Test 5
----------

Test:

`CELERYD_PREFETCH_MULTIPLIER = 1`, `concurrent_tasks` set to double the number of cores
(272 * 2 = 544).

Result:

This gave full utilization from the start (48, 48, 48, 32, 32, 32, 32). Distribution of
work was pretty even throughout the entire calculation.
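The arithmetic behind the 544 figure can be sketched as follows. The per-machine core counts are the ones listed in the result above; the helper name `suggested_concurrent_tasks` and the `factor` parameter are made up for illustration and are not part of OpenQuake or Celery.

```python
# Per-machine core counts, as listed in the Test 5 result.
CORES_PER_MACHINE = [48, 48, 48, 32, 32, 32, 32]


def suggested_concurrent_tasks(cores, factor=2):
    """Return factor * total cores.

    Hypothetical helper for illustration only; the factor of 2 is the
    value that gave full utilization in these tests, not a general rule.
    """
    return factor * sum(cores)


total_cores = sum(CORES_PER_MACHINE)
print(total_cores)                                    # 272
print(suggested_concurrent_tasks(CORES_PER_MACHINE))  # 544
```

The resulting value is what goes into the `concurrent_tasks` parameter in openquake.cfg.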

----------
Test 6
----------

Test:

Remove CELERYD_PREFETCH_MULTIPLIER, reset to default.
`concurrent_tasks = 544` (same as Test 5).

Result:

The result was about the same as Test 5. It seems that changing
CELERYD_PREFETCH_MULTIPLIER doesn't make a difference (at least with
the values used thus far).

----------
Test 7
----------

Test:

Increase task count to 3 * 272 = 816.
`concurrent_tasks = 544` (2 * 272)

Result:

The result was basically the same as Tests 5 and 6. I note that
the larger machines (bs04 and gm0{1,2}) still finished their tasks more
quickly and became idle sooner than the gs machines. We would probably
benefit from reducing CELERYD_PREFETCH_MULTIPLIER to 1.

----------
Test 8
----------

Test:

Same as Test 1, but start workers in a different order (first the gs machines, then the
other 3).

Result:

No significant differences from Test 1.