The naive solution of generating the random numbers on the host and pouring them into the GPU's global memory is a bad idea, because of the bandwidth it wastes. What is needed is a device-side generator. Nvidia's CURAND library makes it easy to generate random numbers directly inside CUDA kernels.
A bit of theory...
Obviously, it is impossible to generate truly random numbers on a deterministic machine. A random function just applies some transformation to the previous number, producing a sequence that looks random. Consequently, two sequences that start from the same number (the seed) are completely identical. If all our threads started from the same state, they would all produce the same results, whatever the algorithm, so each thread needs its own distinct starting point. Moreover, you have to store each generator's current state (let's call them global states) in order to produce the next numbers the following time the kernel runs. Fortunately, CUDA lets us store and update these states directly in the GPU's memory.
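To make the point about seeds concrete, here is a minimal host-side C++ sketch. It uses a toy linear congruential generator, which is not the algorithm CURAND uses; `ToyLcg` and `first_values` are hypothetical names introduced only for this illustration. It shows that the seed fully determines the sequence.

```cpp
#include <cstdint>
#include <vector>

// Toy linear congruential generator, for illustration only.
// This is NOT CURAND's algorithm; the constants are the classic
// "Numerical Recipes" LCG parameters.
struct ToyLcg {
    std::uint32_t state;
    explicit ToyLcg(std::uint32_t seed) : state(seed) {}

    // Advance the state and return it as the next "random" number.
    std::uint32_t next() {
        state = 1664525u * state + 1013904223u;  // wraps modulo 2^32
        return state;
    }
};

// Collect the first `count` numbers produced from a given seed.
std::vector<std::uint32_t> first_values(std::uint32_t seed, int count) {
    ToyLcg gen(seed);
    std::vector<std::uint32_t> out;
    for (int i = 0; i < count; ++i) {
        out.push_back(gen.next());
    }
    return out;
}
```

Calling first_values(42, 5) twice gives the same five numbers, while first_values(43, 5) gives different ones. This is why threads that share one starting state would all compute the same "random" values.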
...and some sample code
Here is the example code; comments follow.
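#include <ctime>
#include <cuda_runtime.h>
#include <curand_kernel.h>

#define N 64   // number of threads; assumed value, the original leaves N undefined

// Give every thread its own state: same seed, different sequence number.
__global__ void setup_kernel ( curandState * state, unsigned long seed )
{
    int id = threadIdx.x;
    curand_init ( seed, id, 0, &state[id] );
}

__global__ void generate( curandState* globalState )
{
    int ind = threadIdx.x;
    curandState localState = globalState[ind];     // work on a local copy
    float RANDOM = curand_uniform( &localState );  // uniform float in (0, 1]
    // ... use RANDOM here ...
    globalState[ind] = localState;                 // save the updated state
}

int main( int argc, char** argv)
{
    dim3 tpb(N,1,1);
    curandState* devStates;
    cudaMalloc ( &devStates, N*sizeof( curandState ) );
    // setup seeds
    setup_kernel <<< 1, tpb >>> ( devStates, time(NULL) );
    // generate random numbers
    generate <<< 1, tpb >>> ( devStates );
    cudaDeviceSynchronize();   // wait for both kernels to finish
    cudaFree( devStates );
    return 0;
}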
Obviously, you have to include not only the CUDA headers and libraries but also CURAND's device header (curand_kernel.h). Let's see what happens in the code, starting from main.
First, we create a curandState pointer that will point to our global states, and allocate room for N of them on the device.
The setup_kernel function invokes curand_init(), which takes a seed (I used the seconds since the Epoch, but the choice is up to the user), a sequence number and an offset, and initializes one global state. Every thread passes the same seed but its own thread index as the sequence number, so each thread gets a different sequence.
generate() is launched with N threads. Each thread gets its own random floating-point number by working on a local copy of its global state. In this case the number is sampled uniformly from (0, 1] (curand_uniform excludes 0 and includes 1), but CURAND offers other distributions as well.
Finally, we store the updated state back into global memory, and return.
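The load / draw / store-back pattern of generate() can be mimicked on the CPU. The following host-side C++ sketch is only an analogy: `ToyState`, `toy_init`, `toy_next`, `make_states` and `fake_launch` are hypothetical stand-ins (not CURAND functions), the generator is a plain xorshift32 step, and an ordinary loop plays the role of the N threads. It shows why writing the local copy back matters.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for curandState: one small state per "thread".
struct ToyState { std::uint32_t s; };

// Stand-in for curand_init: mix the seed with the thread id so every
// thread starts from a different state (toy mixing, not CURAND's).
ToyState toy_init(std::uint32_t seed, std::uint32_t id) {
    std::uint32_t s = seed ^ (0x9E3779B9u * (id + 1u));
    if (s == 0u) s = 1u;  // xorshift32 must never start from zero
    return ToyState{s};
}

// Stand-in for drawing a number: one xorshift32 step that updates the state.
std::uint32_t toy_next(ToyState* st) {
    std::uint32_t x = st->s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    st->s = x;
    return x;
}

// Build the array of "global states", one per thread.
std::vector<ToyState> make_states(std::uint32_t seed, int n) {
    std::vector<ToyState> states;
    for (int id = 0; id < n; ++id) {
        states.push_back(toy_init(seed, static_cast<std::uint32_t>(id)));
    }
    return states;
}

// One "kernel launch": each thread copies its state, draws a number,
// and optionally writes the updated state back.
std::vector<std::uint32_t> fake_launch(std::vector<ToyState>& global,
                                       bool store_back) {
    std::vector<std::uint32_t> out(global.size());
    for (std::size_t id = 0; id < global.size(); ++id) {
        ToyState local = global[id];         // like: localState = globalState[ind]
        out[id] = toy_next(&local);          // like: curand_uniform(&localState)
        if (store_back) global[id] = local;  // like: globalState[ind] = localState
    }
    return out;
}
```

With store_back set to true, two consecutive launches return different numbers. With store_back set to false, every launch restarts from the same states and returns the same numbers again, which is exactly what the last line of generate() prevents.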
9 comments:
Very nice this blog!
visit our blog at http://uhooi.blogspot.com/
Hi - I am certainly glad to find this. cool job!
I need to generate random INTEGERS using curand, so how can we do that? Also, what if I need to generate random integers within a specific range? Is there any sample code for that? I appreciate your help.
I think that you have two possible ways to do it:
1) use curand_uniform to obtain a random floating point number from a uniform distribution, then map it to your integer interval. Pseudocode:
float rnd_number = curand_uniform(&localState);   // in (0, 1]
int rnd_integer_from_A_to_B = A + rnd_number * (B-A);   // B itself only if rnd_number == 1.0f
I use something similar for choosing reactions inside my biochemical simulations.
2) calling curand() should return the next 32-bit unsigned integer in the sequence (pseudorandom for the curandState used here); I never tried it, but I think you can do something like this:
int rnd_integer_from_A_to_B = A + curand(&localState) % (B-A);   // A..B-1; use (B-A+1) to include B
let me know :)
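The two mappings above can be written as small helper functions. This is a sketch under a few assumptions: `uniform_to_int` and `bits_to_int` are hypothetical names (not part of CURAND), A <= B, the range B-A+1 fits in an int, and both ends of the interval should be reachable. The `HOST_DEVICE` macro is only there so the same code compiles as plain C++ and inside a .cu file.

```cpp
#include <cstdint>

// Lets the helpers compile both as plain C++ and under nvcc.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Map u in (0, 1] (the range documented for curand_uniform) to an
// integer in [a, b], both ends included. Assumes a <= b.
HOST_DEVICE inline int uniform_to_int(float u, int a, int b) {
    int span = b - a + 1;                // number of possible results
    int k = static_cast<int>(u * span);  // truncates; can reach span when u == 1.0f
    if (k >= span) k = span - 1;         // fold that edge case back into range
    return a + k;
}

// Map a raw 32-bit value (e.g. the result of curand()) to [a, b] by modulo.
// Slightly biased unless (b - a + 1) divides 2^32 evenly.
HOST_DEVICE inline int bits_to_int(std::uint32_t x, int a, int b) {
    std::uint32_t span = static_cast<std::uint32_t>(b - a + 1);
    return a + static_cast<int>(x % span);
}
```

Inside a kernel the calls would look like uniform_to_int(curand_uniform(&localState), A, B) or bits_to_int(curand(&localState), A, B). The modulo version avoids floating point, at the cost of a small bias for ranges that do not divide 2^32.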
Hey ... Thanks for the quick reply. I have tried both, and I think I will go ahead with the second option.
Thanks again.
Hello! Thanks for the tutorial!
Don't you have any performance issues generating the curandState vector? Calling curand_init so many times seems very slow here, and I'm looking for a solution for this.
Thank you!!!
Hi Pedro,
you don't have to call curand_init() several times - just once, at the start. It's like the srand() function: you call it in order to initialize the seeds; after that, you fetch new pseudo-random numbers inside kernels by means of curand_uniform().
I hope I answered your question. Feel free to ask otherwise. :)
Wow, thx a lot about the example!
It helps me a lot about my gpu implementation~!