Comment 2 for bug 905846

Erik Schnetter (schnetter) wrote :

One of the issues I am fighting in my application (solving the Einstein equations) is the limited instruction cache size. We have to split our kernel into two or more kernels (incurring some code duplication), and we also explicitly mark some functions "noinline" to reduce code size.
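For illustration, here is a minimal sketch of that pattern in C++. It is not our actual code: the names rhs_term, kernel_part1 and kernel_part2 are made up, and the attribute spelling assumes GCC or Clang.

```cpp
#include <cstddef>
#include <vector>

// Assumption: GCC or Clang. MSVC spells this __declspec(noinline).
#define NOINLINE __attribute__((noinline))

// Hypothetical stand-in for a large expression used by both kernel halves.
// Marking it noinline asks the compiler to emit its body once and call it,
// rather than copying the body into every call site.
NOINLINE double rhs_term(double u, double du) {
    return u * u + 0.5 * du;
}

// First half of the split kernel: writes an intermediate array.
void kernel_part1(const std::vector<double>& u, std::vector<double>& tmp) {
    for (std::size_t i = 0; i < u.size(); ++i)
        tmp[i] = rhs_term(u[i], 1.0);
}

// Second half of the split kernel: consumes the intermediate array.
void kernel_part2(const std::vector<double>& tmp, std::vector<double>& out) {
    for (std::size_t i = 0; i < tmp.size(); ++i)
        out[i] = rhs_term(tmp[i], 2.0);
}
```

Each half has a smaller loop body than the fused kernel would have, and both halves call the same out-of-line copy of rhs_term. The cost is the extra pass over the intermediate array.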

We currently do this in C++, and I want to port this code to OpenCL. Unconditional inlining of all functions would not be good for this application. Would it be possible to skip functions that don't call a get_*() function, or to skip inlining functions marked "noinline"?

Instead of privatizing the code for each thread, is it possible to privatize only the variables on which the get_*() functions are based? With hyperthreading or modern AMD processors, it can be beneficial to have several threads executing the same code, even if some expressions then cannot be evaluated at build time.
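To make the question concrete, here is a rough sketch in plain C++ of what I mean by privatizing the variables rather than the code. It says nothing about how the implementation works internally. The names current_global_id, get_global_id_sketch, kernel_body and run are hypothetical, and thread_local stands in for whatever per-thread storage would really be used.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-thread state standing in for what a get_*() call returns.
thread_local std::size_t current_global_id = 0;

// Hypothetical analogue of get_global_id(0): reads the thread-private variable.
std::size_t get_global_id_sketch() { return current_global_id; }

// A single copy of the kernel body, executed by every thread.
void kernel_body(std::vector<int>& out) {
    std::size_t i = get_global_id_sketch();
    out[i] = static_cast<int>(i) * 10;
}

// Runs n work-items spread over num_threads threads (num_threads > 0).
std::vector<int> run(std::size_t n, std::size_t num_threads) {
    std::vector<int> out(n, -1);
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < num_threads; ++t) {
        pool.emplace_back([&out, n, num_threads, t] {
            // Each thread handles work-items t, t + num_threads, ...
            for (std::size_t i = t; i < n; i += num_threads) {
                current_global_id = i;  // only this variable is private
                kernel_body(out);       // the code itself is shared
            }
        });
    }
    for (auto& th : pool) th.join();
    return out;
}
```

All threads run the same instructions for kernel_body, so they can share instruction cache lines. The index is now a run-time load instead of a constant known at build time.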