[i5-3230] Tight pyopencl.clmath loops cause out-of-memory system hang

Bug #1354086 reported by Rebecca Palmer on 2014-08-07
This bug affects 1 person
Affects:
beignet (Ubuntu)
mesa (Ubuntu)
pyopencl (Ubuntu)

Bug Description

In beignet (not pocl), tight loops involving OpenCL array creation and destruction, e.g. repeated bCL=aCL+bCL (or other pyopencl.clmath operations) or repeated pyopencl.enqueue_copy(cq0,bCL.data,aCL.data), often hang the whole system after a number of operations consistent with memory exhaustion.

Waiting for queued operations to finish (pyopencl.enqueue_barrier(cq0).wait()) before issuing more avoids the bug, but dependencies between the operations (as in the bCL=aCL+bCL example) do not. This is probably because the "allocate memory" step is separate from, and faster than, the "do the operation" step, so allocation can run ahead of computation until it uses up all the memory.

(Note that while the above wait() can be used as a workaround for this bug, it is usually faster to avoid frequent memory allocation altogether, by reusing existing arrays; for pyopencl.clmath, this means using pyopencl.tools.MemoryPool.)

Rebecca Palmer (rebecca-palmer) wrote :
description: updated
Rebecca Palmer (rebecca-palmer) wrote :

Further testing found that this occurs only after many clmath operations. The number required is fairly, but not exactly, repeatable and is larger for smaller arrays; it is reset by exiting and restarting Python, but not by gc.collect(). This suggests a memory leak, so the problem could be in either pyopencl or beignet. Upstream pyopencl (2014.1) is also affected.

Sometimes a brief hang is followed by a crash with the message
python: /tmp/buildd/beignet-0.9.3git/src/intel/intel_gpgpu.c:567: intel_gpgpu_check_binded_buf_address: Assertion `gpgpu->binded_buf[i]->offset != 0' failed.
but more often it is an indefinite hang with no message.

Rebecca Palmer (rebecca-palmer) wrote :
description: updated
summary: - [i5-3230] Crash/hang/graphical artifacts in pyopencl
+ [i5-3230] Tight pyopencl.clmath loops cause out-of-memory system hang
Rebecca Palmer (rebecca-palmer) wrote :

There are actually three separate issues here, but as (a) is already known and (b) is not a bug, I define this bug to be (c).

To understand them, it is necessary to know that OpenCL computations are asynchronous: a clmath expression like "aCL=bCL+cCL" places this operation in a CommandQueue and returns without waiting for it to finish. (This is to allow the CPU to do other work during the GPU computation.)
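The asynchronous behaviour can be illustrated without a GPU. In the plain-Python analogy below (an illustration only, not pyopencl code), a single-worker `concurrent.futures.ThreadPoolExecutor` plays the role of an in-order CommandQueue: `submit()` returns a future immediately, and only `result()` blocks, much as `.wait()` does on a pyopencl event. `slow_add` is a hypothetical stand-in for a kernel.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_add(a, b):
    """Hypothetical stand-in for a GPU kernel: takes a while to finish."""
    time.sleep(0.2)
    return [x + y for x, y in zip(a, b)]


def demo():
    # One worker thread, so submitted jobs run one at a time, in order.
    with ThreadPoolExecutor(max_workers=1) as queue:
        start = time.perf_counter()
        future = queue.submit(slow_add, [1, 2, 3], [10, 20, 30])
        enqueue_time = time.perf_counter() - start  # submit() did not wait
        result = future.result()                     # blocks until the job is done
        total_time = time.perf_counter() - start
    return enqueue_time, total_time, result


if __name__ == "__main__":
    enq, total, res = demo()
    print(f"enqueue returned after {enq:.3f}s, result after {total:.3f}s: {res}")
```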

(a) Running out of memory can hang the entire system, rather than ending just the OpenCL application with CL_OUT_OF_RESOURCES.

This is probably the same long-standing issue (e.g. bug 620074, bug 1504914, bug 1592813) that makes Linux out-of-memory conditions in general do this. (The integrated GPUs supported by beignet share the host's memory.)

(b) In both beignet and pocl (probably all ICDs), a long sequence of allocate/deallocate operations (e.g. clmath creating a new array each operation) *without* waiting for results uses up memory, but regularly waiting for results avoids this.

This is because allocating memory (clCreateBuffer) happens immediately, but the actual computations are queued, and memory can't be freed until the computations using it have finished. Hence, if many operations are queued without waiting for a result, memory allocation can run far ahead of computation, filling up the memory.
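A toy model makes the run-ahead effect concrete. It is hypothetical pure Python, not a model of any particular ICD: each enqueue takes a buffer immediately, the simulated device finishes only `device_rate` operations per enqueue, and a buffer is freed only once its operation has finished. `wait_every` mimics calling `.wait()` every N operations.

```python
from collections import deque


def peak_memory(n_ops, buf_size, device_rate=0.5, wait_every=None):
    """Return peak memory use for a hypothetical host loop that enqueues
    n_ops operations, each needing one buffer of buf_size units.

    device_rate: operations the simulated device finishes per enqueue
                 (< 1 means the host enqueues faster than the device computes).
    wait_every:  if set, drain the queue every wait_every enqueues, like .wait().
    """
    pending = deque()  # buffers whose operations are queued but not finished
    in_use = 0
    peak = 0
    progress = 0.0     # fractional device work not yet turned into a completion
    for i in range(1, n_ops + 1):
        # Allocation (clCreateBuffer) happens immediately at enqueue time.
        in_use += buf_size
        pending.append(buf_size)
        peak = max(peak, in_use)
        # The device completes queued operations more slowly than they arrive;
        # a buffer can only be released once its operation has finished.
        progress += device_rate
        while progress >= 1 and pending:
            in_use -= pending.popleft()
            progress -= 1
        if wait_every and i % wait_every == 0:
            # Waiting for results lets every queued operation finish and free memory.
            while pending:
                in_use -= pending.popleft()
            progress = 0.0
    return peak


if __name__ == "__main__":
    print("no waits:         ", peak_memory(1000, 1))
    print("wait every 10 ops:", peak_memory(1000, 1, wait_every=10))
```

With these defaults the unbounded loop peaks at 501 buffers and keeps growing with the loop length, while waiting every 10 operations caps it at 6. The absolute numbers are arbitrary; only the contrast matters.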

This is not a bug: don't do that. Either wait for results often enough that this doesn't build up to the point of running out of memory, or (better for performance) re-use existing memory objects instead of allocating/deallocating. (To do the latter with clmath, use pyopencl.tools.MemoryPool.)
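In pyopencl itself, a pool is usually created as `pyopencl.tools.MemoryPool(pyopencl.tools.ImmediateAllocator(queue))` and passed as the `allocator=` argument when creating arrays (check the pyopencl documentation for your version). The sketch below uses a hypothetical `ToyMemoryPool` class, not pyopencl's, to show only why reuse helps: in a `b = a + b` loop, freed result buffers are handed out again, so the number of real allocations stays constant. It deliberately ignores the asynchronous-completion issue.

```python
from collections import defaultdict


class ToyMemoryPool:
    """Hypothetical illustration of the idea behind a memory pool: freed
    blocks are kept and handed out again instead of being released."""

    def __init__(self):
        self.free_blocks = defaultdict(list)  # size -> ids of idle blocks
        self.real_allocations = 0             # calls that would reach the driver

    def allocate(self, size):
        if self.free_blocks[size]:
            return (size, self.free_blocks[size].pop())  # reuse an idle block
        self.real_allocations += 1
        return (size, self.real_allocations)             # a "new" device buffer

    def free(self, block):
        size, block_id = block
        self.free_blocks[size].append(block_id)          # keep it for later


def run_loop(pool, n_iterations, size):
    """Mimic repeated b = a + b: each step allocates a result and frees the old b."""
    a = pool.allocate(size)
    b = pool.allocate(size)
    for _ in range(n_iterations):
        result = pool.allocate(size)
        pool.free(b)
        b = result
    return pool.real_allocations


if __name__ == "__main__":
    print("real allocations for 1000 iterations:", run_loop(ToyMemoryPool(), 1000, 4096))
```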

While investigating this, I discovered that all beignet queues use out-of-order execution even when the user requested in-order; this is a bug, but it is not the cause of this issue.

(c) In beignet but not pocl, a long sequence of clmath operations leaks memory, even with regular waits.

To ensure that intermediate results are calculated before they are used, clmath arrays use Event objects to track dependencies. A beignet event includes references to the event(s) it depends on (https://sources.debian.org/src/beignet/1.3.2-2/src/cl_event.h/?hl=47#L40), and continues to hold these as long as the event object exists, even if it has completed and been waited for. As OpenCL objects are freed by reference counting, this means that as long as the last event in a dependency tree exists, the whole tree of (recursive) dependencies also exists, taking up memory (~20kB per event).
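Python's own reference counting behaves much like OpenCL retain/release, so the leak can be modelled directly. `ToyEvent` below is hypothetical and is not beignet's `cl_event` structure: each event keeps strong references to its dependencies and never drops them, so holding only the newest event keeps the whole chain alive. In this model `gc.collect()` would not help either, because every event is still reachable.

```python
import weakref


class ToyEvent:
    """Hypothetical event that, like the behaviour described above, keeps
    references to the events it depends on for its whole lifetime."""
    alive = weakref.WeakSet()  # every ToyEvent that has not been freed yet

    def __init__(self, deps=()):
        self.deps = list(deps)  # strong references keep dependencies alive
        self.done = False
        ToyEvent.alive.add(self)

    def complete(self):
        self.done = True        # note: self.deps is NOT cleared here


def build_chain(n):
    """Mimic n dependent clmath operations, each waited for, and return
    only the newest event, as a clmath array would hold it."""
    last = ToyEvent()
    last.complete()
    for _ in range(n - 1):
        last = ToyEvent(deps=[last])
        last.complete()         # waited for, yet its ancestors stay reachable
    return last


if __name__ == "__main__":
    head = build_chain(1000)
    print("events still alive:", len(ToyEvent.alive))
```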

pocl avoids this by dropping these references after completion ( https://sources.debian.org/src/pocl/1.1-5/lib/CL/devices/common.c/?hl=722#L714 ); the attached patch makes beignet do so. Checking the source suggests mesa is also affected ( https://sources.debian.org/src/mesa/18.1.3-1/src/gallium/state_trackers/clover/core/event.hpp/?hl=84#L34 ), but I don't have the hardware to try it. (The OpenCL part of mesa is AMD/Radeon only.)
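The idea behind that fix can be shown with a similar hypothetical model, adding a `release_on_complete` flag for comparison. This illustrates the principle only and is not the attached patch: once an event completes, its dependencies have necessarily finished, so it can stop holding them, and a completed chain of any length then leaves only the newest event alive.

```python
import weakref


class ToyEvent:
    """Hypothetical event model; not beignet's, pocl's or mesa's real structure."""
    alive = weakref.WeakSet()  # events that have not been freed yet

    def __init__(self, deps=(), release_on_complete=False):
        self.deps = list(deps)  # strong references, like retained event objects
        self.release_on_complete = release_on_complete
        ToyEvent.alive.add(self)

    def complete(self):
        if self.release_on_complete:
            # Dependencies have necessarily finished: stop keeping them alive.
            self.deps.clear()


def live_events_after_chain(n, release_on_complete):
    """Build and complete a chain of n dependent events, keep only the newest,
    and report how many events are still alive at that point."""
    before = len(ToyEvent.alive)
    last = ToyEvent(release_on_complete=release_on_complete)
    last.complete()
    for _ in range(n - 1):
        last = ToyEvent(deps=[last], release_on_complete=release_on_complete)
        last.complete()
    return len(ToyEvent.alive) - before


if __name__ == "__main__":
    print("without release:", live_events_after_chain(1000, False))
    print("with release:   ", live_events_after_chain(1000, True))
```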

Changed in pyopencl (Ubuntu):
status: New → Invalid
Rebecca Palmer (rebecca-palmer) wrote :
Changed in beignet (Ubuntu):
status: New → In Progress

The attachment "eventchain-memory-leak.patch" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package beignet - 1.3.2-3

beignet (1.3.2-3) unstable; urgency=medium

  * Fix memory leak on long event chains. (LP: #1354086)
  * Fix FTBFS with LLVM 6.
  * Use LLVM 6 on amd64/i386 and 4 on x32. (Closes: #904279)
  * Allow clCreateCommandQueue to create out-of-order queues.
  * Bump Standards-Version to 4.1.5 (no changes needed).
  * Update cl_accelerator_intel.patch.
  * Bump debhelper compat to 11.
  * Add autopkgtests (skipped in standard setup due to hardware
    requirements - see README.source).
  * Reduce error spew on (partly) unsupported hardware.
  * Enable Coffee Lake hardware support.
  * Update documentation.

 -- Rebecca N. Palmer <email address hidden> Wed, 25 Jul 2018 21:17:28 +0100

Changed in beignet (Ubuntu):
status: In Progress → Fix Released