Wednesday, 23 September 2015

Setting kernel constants using CUDA driver API

CUDA uses special qualifiers (namely, __constant__) to tell the compiler that some variables should be placed in constant memory, which is cached on the device. The content of such variables (e.g., numeric values, constant arrays) can be set by the host before a kernel is executed. With the CUDA runtime API, this is done with the following statement:

cudaMemcpyToSymbol(symbol, &variable, sizeof(variable));
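For context, here is a minimal sketch of how that statement might sit in a complete runtime API program. The kernel name `scale`, the `float` type of `symbol`, and the array size are assumptions made for illustration; only the cudaMemcpyToSymbol call comes from the statement above.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Assumption: the constant is a single float; the post does not fix its type.
__constant__ float symbol;

// Hypothetical kernel that multiplies every element by the constant.
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= symbol;
}

int main(void)
{
    const int n = 256;
    float host[256];
    for (int i = 0; i < n; ++i)
        host[i] = 1.0f;

    // Set the constant from the host before launching the kernel.
    float variable = 2.0f;
    cudaError_t err = cudaMemcpyToSymbol(symbol, &variable, sizeof(variable));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpyToSymbol: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float *dev = NULL;
    cudaMalloc((void **)&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    scale<<<(n + 127) / 128, 128>>>(dev, n);

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("host[0] = %f\n", host[0]);
    return 0;
}
```

Note that the runtime API takes the symbol itself, not a string with its name, because the host code and the device code are compiled together.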

Piece of cake. But how does it work with the CUDA driver API? Well, it is not clearly explained in the programming guide (at least, I couldn't find any examples). It turns out that the process requires two steps:
  1. get a reference to the __constant__ variable on the device;
  2. initialize it using a "Host to Device" memcpy.
In practice, it works as follows:

CUdeviceptr constDevPtr;
cuModuleGetGlobal(&constDevPtr, NULL, module, "symbol");

cuMemcpyHtoD(constDevPtr, &variable, sizeof(variable));
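The two calls above assume that a context is current and that `module` has already been loaded. The following is only a sketch of that surrounding boilerplate, not a definitive implementation: the file name "kernels.ptx", the `float` type of `variable`, and the `CHECK` macro are hypothetical, and the module is assumed to contain a declaration `__constant__ float symbol;` at global scope. It also passes a size pointer instead of NULL, so that the size reported by the module can be compared with the host variable before copying.

```cuda
#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: abort with a readable message if a driver call fails. */
#define CHECK(call)                                                \
    do {                                                           \
        CUresult err_ = (call);                                    \
        if (err_ != CUDA_SUCCESS) {                                \
            const char *msg_ = NULL;                               \
            cuGetErrorString(err_, &msg_);                         \
            fprintf(stderr, "%s failed: %s\n", #call,              \
                    msg_ ? msg_ : "unknown error");                \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

int main(void)
{
    CUdevice device;
    CUcontext context;
    CUmodule module;

    /* Initialize the driver API and make the primary context current. */
    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&device, 0));
    CHECK(cuDevicePrimaryCtxRetain(&context, device));
    CHECK(cuCtxSetCurrent(context));

    /* Assumption: "kernels.ptx" declares  __constant__ float symbol;  */
    CHECK(cuModuleLoad(&module, "kernels.ptx"));

    /* Step 1: get the device address (and size) of the constant. */
    float variable = 3.14f;
    CUdeviceptr constDevPtr;
    size_t constSize = 0;
    CHECK(cuModuleGetGlobal(&constDevPtr, &constSize, module, "symbol"));

    /* Guard against a type mismatch between host and device declarations. */
    if (constSize != sizeof(variable)) {
        fprintf(stderr, "size mismatch: device %zu bytes, host %zu bytes\n",
                constSize, sizeof(variable));
        return EXIT_FAILURE;
    }

    /* Step 2: initialize it with a host-to-device copy. */
    CHECK(cuMemcpyHtoD(constDevPtr, &variable, sizeof(variable)));

    /* Kernels launched from this module would now see the new value. */

    CHECK(cuModuleUnload(module));
    CHECK(cuDevicePrimaryCtxRelease(device));
    return 0;
}
```

Unlike the runtime API, the driver API looks the variable up by its name as a string, so the name passed to cuModuleGetGlobal has to match the symbol name in the compiled module. If the variable is declared inside a C++ namespace, that name will be mangled.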

I tested the two driver API calls shown above (cuModuleGetGlobal followed by cuMemcpyHtoD) and they work seamlessly. I hope this will save CUDA developers some headaches.