Comment 4 for bug 1354086

Rebecca Palmer (rebecca-palmer) wrote :

There are actually three separate issues here. As (a) is already known and (b) is not a bug, I am treating this bug report as covering only (c).

To understand them, it is necessary to know that OpenCL computations are asynchronous: a clmath expression like "aCL=bCL+cCL" places this operation in a CommandQueue and returns without waiting for it to finish. (This is to allow the CPU to do other work during the GPU computation.)

(a) Running out of memory can hang the entire system, rather than ending just the OpenCL application with CL_OUT_OF_RESOURCES.

This is probably the same long-standing issue (e.g. bug 620074, bug 1504914, bug 1592813) that causes Linux out-of-memory conditions in general to hang the system. (The integrated GPUs supported by beignet share the host's memory.)

(b) In both beignet and pocl (probably all ICDs), a long sequence of allocate/deallocate operations (e.g. clmath creating a new array for each operation) *without* waiting for results uses up memory, whereas regularly waiting for results avoids this.

This is because allocating memory (clCreateBuffer) happens immediately, but the actual computations are queued, and memory can't be freed until the computations using it have finished. Hence, if many operations are queued without waiting for a result, memory allocation can run far ahead of computation, filling up the memory.
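The effect can be illustrated with a toy model in plain Python (no OpenCL involved). This is only a sketch: the function name, the `host_speedup` ratio and the "one buffer per operation" rule are assumptions made for illustration, not measurements of beignet or pocl.

```python
from collections import deque


def peak_pending_buffers(n_ops, host_speedup=10, wait_every=None):
    """Return the peak number of buffers that are allocated but not yet freeable.

    Toy model (all numbers hypothetical):
    - enqueueing an operation allocates its buffer immediately;
    - the device finishes one queued operation for every `host_speedup`
      operations the host enqueues;
    - a buffer becomes freeable only once its operation has finished;
    - every `wait_every` operations the host blocks until the queue is empty.
    """
    pending = deque()  # operations enqueued but not yet finished
    peak = 0
    for i in range(1, n_ops + 1):
        pending.append(i)              # allocate now, compute later
        peak = max(peak, len(pending))
        if i % host_speedup == 0 and pending:
            pending.popleft()          # device finishes one operation
        if wait_every and i % wait_every == 0:
            pending.clear()            # host waits: everything finishes
    return peak


no_waits = peak_pending_buffers(1000)
with_waits = peak_pending_buffers(1000, wait_every=50)
print(no_waits, with_waits)
```

Without waits, the number of live buffers in this model grows with the length of the sequence; with periodic waits it is bounded by the wait interval.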

This is not a bug: don't do that. Either wait for results often enough that this doesn't build up to the point of running out of memory, or (better for performance) re-use existing memory objects instead of allocating/deallocating. (To do the latter with clmath, use pyopencl.tools.MemoryPool.)
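As a rough illustration of why re-use helps, here is a toy pool in plain Python. `ToyPool` and `run` are hypothetical names, and this is not how pyopencl.tools.MemoryPool is implemented; it only shows the idea of handing back released buffers instead of allocating new ones.

```python
class ToyPool:
    """Hypothetical stand-in for a memory pool: hands back previously
    released buffers of the same size instead of allocating new ones."""

    def __init__(self):
        self.free = {}               # size -> list of released buffers
        self.fresh_allocations = 0   # how often we really had to allocate

    def allocate(self, size):
        bucket = self.free.get(size)
        if bucket:
            return bucket.pop()      # re-use a released buffer
        self.fresh_allocations += 1
        return bytearray(size)

    def release(self, buf):
        self.free.setdefault(len(buf), []).append(buf)


def run(n_ops, size=1024):
    """Chain n_ops operations, each producing a new result of the same size."""
    pool = ToyPool()
    result = pool.allocate(size)
    for _ in range(n_ops):
        new_result = pool.allocate(size)  # output of this operation
        pool.release(result)              # previous intermediate is done
        result = new_result
    return pool.fresh_allocations


print(run(1000))
```

In this model the number of real allocations stays constant however long the sequence is. In pyopencl itself, a pool is handed to arrays through their allocator argument.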

While investigating this, I discovered that all beignet queues execute out of order even if the user requested in-order execution. That is a bug, but it is not the cause of this issue.

(c) In beignet but not pocl, a long sequence of clmath operations leaks memory, even with regular waits.

To ensure that intermediate results are calculated before they are used, clmath arrays use Event objects to track dependencies. A beignet event includes references to the event(s) it depends on (https://sources.debian.org/src/beignet/1.3.2-2/src/cl_event.h/?hl=47#L40), and continues to hold these as long as the event object exists, even if it has completed and been waited for. As OpenCL objects are freed by reference counting, this means that as long as the last event in a dependency tree exists, the whole tree of (recursive) dependencies also exists, taking up memory (roughly 20 kB per event).

pocl avoids this by dropping these references after completion ( https://sources.debian.org/src/pocl/1.1-5/lib/CL/devices/common.c/?hl=722#L714 ); the attached patch makes beignet do so. Checking the source suggests mesa is also affected ( https://sources.debian.org/src/mesa/18.1.3-1/src/gallium/state_trackers/clover/core/event.hpp/?hl=84#L34 ), but I don't have the hardware to try it. (The OpenCL part of mesa is AMD/Radeon only.)
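The difference between the two behaviours can be sketched in plain Python, whose CPython implementation also frees objects by reference counting. The `Event` class and `surviving_events` function below are hypothetical stand-ins, not beignet or pocl code, and the result relies on CPython freeing objects as soon as their reference count reaches zero.

```python
import weakref


class Event:
    """Hypothetical stand-in for a reference-counted OpenCL event."""

    def __init__(self, depends_on=(), drop_deps_on_completion=False):
        self.depends_on = list(depends_on)   # strong references
        self.drop_deps_on_completion = drop_deps_on_completion

    def complete(self):
        # Dropping the references here models the pocl behaviour and the
        # patch; keeping them models the unpatched beignet behaviour.
        if self.drop_deps_on_completion:
            self.depends_on.clear()


def surviving_events(n_ops, drop_deps_on_completion):
    """Chain n_ops events, complete each one, keep only the newest,
    and count how many event objects are still alive at the end."""
    trackers = []
    last = None
    for _ in range(n_ops):
        event = Event([last] if last is not None else [],
                      drop_deps_on_completion)
        event.complete()
        trackers.append(weakref.ref(event))  # observe without keeping alive
        last = event                         # the user holds only this one
    return sum(ref() is not None for ref in trackers)


print(surviving_events(100, drop_deps_on_completion=False))
print(surviving_events(100, drop_deps_on_completion=True))
```

When the references are kept, every event in the chain stays alive as long as the newest one does; when they are dropped on completion, only the newest event survives.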