OpenCL-beignet clEnqueueNDRangeKernel fails with error -5 after many executions

Bug #1776867 reported by antal
This bug affects 1 person
Affects: beignet (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

I'm using Ubuntu 18.04 amd64. The following code steadily consumes system memory, and after the 22,000th iteration clEnqueueNDRangeKernel fails with error -5 (CL_OUT_OF_RESOURCES):

cl_int errorcode;
cl_event event;
size_t global_item_size = biases.row;
for(int i = 0; i < 50000; i++)
{
    errorcode = clSetKernelArg(this->testkernel, 0, sizeof(int), (void*)&input.row);
    errorcode |= clSetKernelArg(this->testkernel, 1, sizeof(cl_mem), (void *)&(weights.cl_mem_obj));
    errorcode |= clSetKernelArg(this->testkernel, 2, sizeof(cl_mem), (void *)&(input.cl_mem_obj));
    errorcode |= clSetKernelArg(this->testkernel, 3, sizeof(cl_mem), (void *)&(biases.cl_mem_obj));
    errorcode |= clSetKernelArg(this->testkernel, 4, sizeof(cl_mem), (void *)&(output.cl_mem_obj));
    errorcode |= clEnqueueNDRangeKernel(this->command_queue, this->testkernel, 1, NULL, &global_item_size, NULL, 0, NULL, &event);
    if(errorcode != CL_SUCCESS)
    {
        cerr << "failed to launch the kernel " << errorcode << endl;
        throw exception();
    }
    //clFlush(this->command_queue);
    //clFinish(this->command_queue);
    clWaitForEvents(1, &event);
    cout << i << endl;
}

The OpenCL kernel is:
__kernel void test(const int InpRow, const __global float* weights,
                   const __global float* input, const __global float* biases, __global float *output)
{
    const int globalRow = get_global_id(0);

    float acc = 0.0f;
    for (int k=0; k<InpRow; k++) {
        acc += weights[globalRow*InpRow + k] * input[k];
    }

    output[globalRow] = acc + biases[globalRow];
}
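When checking the kernel's output on the host, a plain CPU version helps. The kernel computes one output element per work-item: output[r] = biases[r] + the sum over k of weights[r*InpRow + k] * input[k]. In other words, it is a matrix-vector product plus bias, with `weights` stored row-major and the global size equal to the number of rows (`biases.row` in the host code). The sketch below assumes that layout. `reference_test` is a hypothetical name, and GPU results may differ from it in the last bits because of floating-point rounding.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for the `test` kernel: the outer loop plays the
// role of the work-items (get_global_id(0) == row), one per output row.
std::vector<float> reference_test(int inpRow,
                                  const std::vector<float>& weights,  // rows * inpRow, row-major
                                  const std::vector<float>& input,    // inpRow elements
                                  const std::vector<float>& biases)   // one per row
{
    assert(inpRow >= 0);
    const std::size_t cols = static_cast<std::size_t>(inpRow);
    const std::size_t rows = biases.size();
    assert(input.size() == cols);
    assert(weights.size() == rows * cols);

    std::vector<float> output(rows);
    for (std::size_t row = 0; row < rows; ++row) {
        float acc = 0.0f;                         // same accumulation order as the kernel
        for (std::size_t k = 0; k < cols; ++k) {
            acc += weights[row * cols + k] * input[k];
        }
        output[row] = acc + biases[row];
    }
    return output;
}
```

For example, a 2x3 weight matrix {1, 2, 3, 4, 5, 6} with input {1, 1, 1} and biases {0.5, -1} gives {6.5, 14}.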

This problem was not present in Ubuntu 16.04.
CPU: i7-4790K
GPU: Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller
OpenCL lib: beignet-opencl-icd:amd64 1.3.2-2

Thank you in advance,
Andrej

Paul White (paulw2u)
affects: ubuntu → beignet (Ubuntu)
tags: added: bionic
removed: ubuntu18.04
Rebecca Palmer (rebecca-palmer) wrote :

Memory consumption rises steadily over a run, reaching about 2 GB by the point of failure. The exact number of iterations before failure depends on the beignet version, but not on the matrix size.

git bisect finds that this was introduced by

https://cgit.freedesktop.org/beignet/commit/?id=7ae1517cfc373847f168ffb3e41b635861af19c7

(between 1.2.x and 1.3.x). However, this is a large commit and trying to revert it on 1.3.2 fails with conflicts, so this doesn't provide an immediate way to fix the problem.

As a workaround, it is probably possible to compile beignet 1.2.x from source in Ubuntu 16.04 (use LLVM/Clang 3.8), but I haven't tried this.

Changed in beignet (Ubuntu):
status: New → In Progress
Rebecca Palmer (rebecca-palmer) wrote :

The bug is technically in your code: as you never call clReleaseEvent (waiting on an event doesn't release it), you leak an event object per iteration. I suspect the beignet change exposed this bug by greatly increasing the effective size of an event object, but don't yet know how.
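The ownership rule can be shown with a toy host-side model. This is not beignet code and makes no OpenCL calls. Each "enqueue" hands back an event holding one reference owned by the caller, "waiting" only observes completion, and only a release drops the reference. `FakeRuntime`, `enqueue`, `wait` and `release` are hypothetical stand-ins for clEnqueueNDRangeKernel, clWaitForEvents and clReleaseEvent. In the reporter's loop, the real fix is a single `clReleaseEvent(event);` after `clWaitForEvents(1, &event);`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of event ownership (hypothetical; no OpenCL involved).
struct FakeEvent {
    int refcount = 1;      // the caller owns one reference after enqueue
    bool complete = false;
};

struct FakeRuntime {
    std::size_t live_events = 0;

    FakeEvent* enqueue() {                 // ~ clEnqueueNDRangeKernel(..., &event)
        ++live_events;
        return new FakeEvent{};
    }
    void wait(FakeEvent* ev) {             // ~ clWaitForEvents(1, &event)
        ev->complete = true;               // refcount is left untouched
    }
    void release(FakeEvent* ev) {          // ~ clReleaseEvent(event)
        assert(ev->refcount > 0);
        if (--ev->refcount == 0) {
            delete ev;
            --live_events;
        }
    }
};

// Runs the reporter's loop shape and reports how many events are still
// alive when the loop ends.
std::size_t live_after_loop(int iterations, bool release_each_event) {
    FakeRuntime rt;
    std::vector<FakeEvent*> forgotten;     // kept only so the demo can clean up
    for (int i = 0; i < iterations; ++i) {
        FakeEvent* ev = rt.enqueue();
        rt.wait(ev);
        if (release_each_event) {
            rt.release(ev);                // the call missing from the report
        } else {
            forgotten.push_back(ev);       // original loop: handle is just overwritten
        }
    }
    const std::size_t live = rt.live_events;
    for (FakeEvent* ev : forgotten) {
        rt.release(ev);                    // tidy up the demo's own leak
    }
    return live;
}
```

With `release_each_event == false`, every iteration leaves one more event alive, which matches the per-iteration growth described in this thread. With `true`, the count stays at zero.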

There is a C++ interface to OpenCL (opencl-clhpp-headers) which may make this easier.

antal (andrej1991) wrote :

Thank you, Rebecca! That fixed it. But what changed since Ubuntu 16.04? On 16.04 I didn't need to call clReleaseEvent in my code.

Br,
Andrej

Rebecca Palmer (rebecca-palmer) wrote :

It would have failed there too, given enough iterations: each iteration leaks ~450 bytes before https://cgit.freedesktop.org/beignet/commit/?id=7ae1517cfc373847f168ffb3e41b635861af19c7 but ~20,000 bytes after. I suspect the reason is that events no longer release exec_data.gpgpu on completion (the cl_enqueue.c part of that commit), so that profiling timestamps can still be read from it.

If you want a RAII interface, use opencl-clhpp-headers. (Or in theory pyopencl, but bug 1354086 also looks like an event object leak...)
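The C++ bindings handle this reference counting for you. If you would rather stay on the C API, a minimal RAII sketch is to wrap the handle in `std::unique_ptr` with the release function as its deleter, so the event is released when the wrapper leaves scope, including on the `throw` path in the original loop. To keep the sketch compilable without an OpenCL SDK, it is exercised here with hypothetical stand-ins (`FakeEvent`, `fake_enqueue`, `fake_release`). With real OpenCL, the assumption is that `make_owned(raw_event, clReleaseEvent)` deduces `Object = _cl_event` and `Ret = cl_int`, since `cl_event` is a pointer typedef. That instantiation is not tested here.

```cpp
#include <cassert>
#include <memory>

// Owning wrapper for a C-API handle: the release function runs exactly once,
// when the wrapper is destroyed (normal scope exit, early return or throw).
template <typename Object, typename Ret>
using Owned = std::unique_ptr<Object, Ret (*)(Object*)>;

template <typename Object, typename Ret>
Owned<Object, Ret> make_owned(Object* raw, Ret (*release)(Object*)) {
    return Owned<Object, Ret>(raw, release);
}

// ---- Demo stand-ins (hypothetical, not part of any OpenCL API) ----
struct FakeEvent {};
static int fake_live = 0;

FakeEvent* fake_enqueue() {          // plays the role of clEnqueueNDRangeKernel(..., &event)
    ++fake_live;
    return new FakeEvent{};
}

int fake_release(FakeEvent* ev) {    // plays the role of clReleaseEvent(event)
    assert(fake_live > 0);
    --fake_live;
    delete ev;
    return 0;
}

// Same loop shape as the report; `event` is destroyed, and therefore
// released, at the end of every iteration.
int live_events_after_loop(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        auto event = make_owned(fake_enqueue(), fake_release);
        // Real code would wait here, e.g.
        //   cl_event raw = event.get(); clWaitForEvents(1, &raw);
    }
    return fake_live;
}
```

A `std::unique_ptr` deleter keeps the guard to a few lines and ignores the release function's return value. If release errors matter, a small hand-written wrapper class that checks the returned code would be the alternative.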

Changed in beignet (Ubuntu):
status: In Progress → Invalid