Replicating a for-loop with barriers and an update of an out-of-loop variable produces invalid code
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
pocl | Fix Released | Critical | Unassigned | |
Bug Description
__attribute_
kernel
void test_kernel(
    int gid = get_global_id(0);
    float global_sum = 0.0f;
    int i;
    for (i=0; i < 512; ++i) {
        float value = input[gid+i];
        float product = value * i;
        global_sum += product;
        barrier(
    }
    global_sum *= 123;
    result[gid] = global_sum;
}
This test case updates a function-scope private variable in a for-loop. After work-item replication, the work items other than the first do not update their own replicated copy of global_sum. Instead, they always add to the constant 0.0, which is the initializer of global_sum.
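The difference between the intended and the observed behavior can be modelled on the host. The following is a minimal sketch in plain C, not pocl code: `expected_result` and `replicated_result_observed` are hypothetical names, the iteration count is a parameter instead of the fixed 512, and the barrier is left out because a single work item is modelled at a time. The second function assumes the replica behaves as the IR in this report reads, with both `i` and `global_sum` replaced by their initial constants.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side model of the kernel body for ONE work item.
 * Intended semantics: the work item carries its own i and global_sum
 * from one loop iteration to the next. */
float expected_result(const float *input, size_t len, int gid, int iters)
{
    assert(gid >= 0 && iters >= 0 && (size_t)gid + (size_t)iters <= len);
    float global_sum = 0.0f;
    for (int i = 0; i < iters; ++i) {
        float value = input[gid + i];
        float product = value * (float)i;
        global_sum += product;
    }
    global_sum *= 123;
    return global_sum;
}

/* Hypothetical model of the miscompiled replica of a non-first work item.
 * Assumption: i is replaced by the constant 0 and the accumulator by the
 * constant 0.0, so every iteration recomputes the same value. */
float replicated_result_observed(const float *input, size_t len, int gid,
                                 int iters)
{
    assert(gid >= 0 && iters >= 0 && (size_t)gid + (size_t)iters <= len);
    float global_sum = 0.0f;
    for (int n = 0; n < iters; ++n) {
        float value = input[gid + 0];   /* index uses constant 0, not i   */
        float product = value * 0.0f;   /* i converted to float is 0.0    */
        global_sum = 0.0f + product;    /* adds to the constant, not to
                                           the value of the previous
                                           iteration                      */
    }
    global_sum *= 123;
    return global_sum;
}
```

For the input {1, 2, 3, 4, 5} with gid 1 and three iterations, the first function returns 1353 (11 * 123), while the second returns 0 regardless of the iteration count.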
pocl-standalone -h header.h -t tce -o pocl-output.bc test_case.cl
(I use TCE target only because reqd_work_
The replicated content of the for-loop is as follows:
for.body.
store i32 0, i32* @_local_id_x
store i32 0, i32* @_local_id_y
store i32 0, i32* @_local_id_z
%add = add nsw i32 %i.06, %add1.i
%arrayidx = getelementptr inbounds float addrspace(3)* %input, i32 %add
%8 = load float addrspace(3)* %arrayidx, align 4, !tbaa !2
%conv = sitofp i32 %i.06 to float
%mul = fmul float %8, %conv
%add1 = fadd float %global_sum.05, %mul
store i32 1, i32* @_local_id_x
store i32 0, i32* @_local_id_y
store i32 0, i32* @_local_id_z
%add.wi_1_0_0 = add nsw i32 0, %add1.i.wi_1_0_0
%arrayidx.
%9 = load float addrspace(3)* %arrayidx.wi_1_0_0, align 4, !tbaa !2
%conv.wi_1_0_0 = sitofp i32 0 to float
%mul.wi_1_0_0 = fmul float %9, %conv.wi_1_0_0
%add1.wi_1_0_0 = fadd float 0.000000e+00, %mul.wi_1_0_0
br label %for.body.
After running -O3, the code for the 2nd WI is treated as loop-invariant and is moved outside the loop.
Changed in pocl:
status: New → Fix Released
Forcing the initialization of global_sum to happen outside the loop (to avoid what I thought was "constant propagation") doesn't help. The 2nd WI still refers directly to the global_sum defined outside the loop, not to a phi-defined variable, so it doesn't accumulate the sum but overwrites it on every iteration. It seems the phi node for 'global_sum' is not replicated correctly per WI.
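The role of the phi node can be sketched in plain C. This is only an illustration under stated assumptions, not pocl or LLVM code: `phi`, `sum_with_phi` and `sum_without_phi` are hypothetical names, the per-iteration products are passed in as an array, and only the accumulator is assumed to be affected.

```c
#include <assert.h>
#include <stddef.h>

/* Models a phi node: the value depends on which edge entered the loop
 * header. On the first iteration it is the value from before the loop,
 * on later iterations it is the value produced by the previous one. */
static float phi(int from_entry, float entry_value, float backedge_value)
{
    return from_entry ? entry_value : backedge_value;
}

/* Correct replica: the accumulator is read through its own phi. */
float sum_with_phi(const float *products, size_t n, float init)
{
    assert(products != NULL || n == 0);
    float next = init;
    for (size_t k = 0; k < n; ++k) {
        float cur = phi(k == 0, init, next);
        next = cur + products[k];
    }
    return next;
}

/* Replica as described in the comment above: every iteration reads the
 * value defined outside the loop, so the result is overwritten instead
 * of accumulated. */
float sum_without_phi(const float *products, size_t n, float init)
{
    assert(products != NULL || n == 0);
    float next = init;
    for (size_t k = 0; k < n; ++k) {
        next = init + products[k];
    }
    return next;
}
```

With the products {1, 2, 3} and an initial value of 0, `sum_with_phi` returns 6, while `sum_without_phi` returns 3, which is only the contribution of the last iteration.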