Friday, 27 May 2011


I recently had to develop a kind of "widget" that queries a remote server through an asynchronous JavaScript call. The response is a JavaScript object written in JSON syntax. The question was: how do I process it? Many tutorials use JSONP techniques (weird yet powerful for cross-site execution), but I have no control over the remote service. Others suggest an external framework (like jQuery), but I don't like that: I want to build my own wheel.

So, here are my snippets. First, make the asynchronous call:

var xmlhttp = window.XMLHttpRequest ?
   new XMLHttpRequest() : new ActiveXObject("Microsoft.XMLHTTP");  // old IE fallback
xmlhttp.onreadystatechange = parse_function;
xmlhttp.open( "GET", service_url, true );  // true = asynchronous request
xmlhttp.send( null );

Then, you handle the response (inside parse_function):

function parse_function() {
   // readyState 4 = request complete, status 200 = HTTP OK
   if ( xmlhttp.readyState == 4 && xmlhttp.status == 200 ) {
      json_data = eval( "(" + xmlhttp.responseText + ")" );
   }
}

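This way you get an object (json_data) filled with usable data. Note the eval function and the parentheses around the response text: they do the trick, because without them a leading { would be parsed as the start of a statement block rather than as an object literal.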
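Browsers implementing ECMAScript 5 also provide JSON.parse, which accepts only valid JSON and never executes code, so a broken or malicious response cannot run scripts the way it could through eval. A minimal sketch, assuming a hypothetical helper called parse_response that falls back to the eval trick only when JSON is unavailable:

```javascript
// Hypothetical helper: turn the raw response text into an object.
// JSON.parse (ECMAScript 5) only accepts data, never executes code,
// and throws a SyntaxError on malformed input.
function parse_response( text ) {
   if ( typeof JSON !== "undefined" && typeof JSON.parse === "function" ) {
      return JSON.parse( text );
   }
   // Legacy fallback for very old browsers: the eval trick described above.
   return eval( "(" + text + ")" );
}

// Usage inside parse_function:
//    json_data = parse_response( xmlhttp.responseText );

var sample = parse_response( '{"name": "widget", "values": [1, 2, 3]}' );
console.log( sample.name, sample.values.length );  // prints "widget 3"
```

A side benefit of JSON.parse is that a malformed response fails loudly with a SyntaxError instead of silently evaluating whatever the server sent.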

Saturday, 21 May 2011

CUDA, random numbers inside kernels

Evolutionary algorithms are intrinsically stochastic, so they make heavy use of random number generators such as the C/C++ rand() function. When GPGPU comes into play, however, using random numbers can be tricky.

The naive solution, generating the numbers on the host and copying them into the GPU's global memory, is a bad idea because of the huge amount of bandwidth it wastes. What we need is a generator that runs on the device. Nvidia's CURAND library makes it easy to generate random numbers directly inside CUDA kernels.

A bit of theory...

Obviously, it is impossible to generate truly random numbers on a deterministic machine. A pseudorandom generator just applies some transformation to its current state, producing a sequence that only looks random. As a consequence, two sequences starting from the same number (the seed) are identical: if all our threads started from the same state, they would produce the same results regardless of the algorithm, so their states must differ. Moreover, each thread has to keep the generator's current state (let's call it its global state) in order to continue the sequence during the next execution of the kernel. Fortunately, CUDA allows us to store and update these states directly in the GPU's memory.
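Both properties (same seed, same sequence; an explicit state to continue from) can be checked on the CPU with a tiny sketch. It uses Marsaglia's xorshift32 as a stand-in generator, which is an assumption for illustration only: it is not the XORWOW generator behind curandState, and make_state and next_uniform are hypothetical names.

```javascript
// Toy pseudorandom generator (Marsaglia's xorshift32), for illustration only:
// NOT the XORWOW generator behind curandState. The whole state is one uint32.
function make_state( seed ) {
   var s = seed >>> 0;
   return { x: s === 0 ? 1 : s };   // xorshift must never hold state 0
}

// Advance the state in place and return a float in (0,1).
function next_uniform( state ) {
   var x = state.x;
   x ^= x << 13;
   x ^= x >>> 17;
   x ^= x << 5;
   state.x = x >>> 0;               // store back as an unsigned 32-bit value
   return state.x / 4294967296;     // state is never 0, so the result is > 0
}

var a = make_state( 42 ), b = make_state( 42 ), c = make_state( 43 );
var seqA = [], seqB = [], seqC = [];
for ( var i = 0; i < 5; i++ ) {
   seqA.push( next_uniform( a ) );
   seqB.push( next_uniform( b ) );
   seqC.push( next_uniform( c ) );
}
console.log( seqA.join() === seqB.join() );  // true: same seed, same sequence
console.log( seqA.join() === seqC.join() );  // false: different seed, different sequence
```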

...and some sample code
Here is the example code; comments follow.

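#include <curand_kernel.h>
#include <ctime>

#define N 64   // number of threads (example value)

// One CURAND state per thread: same seed, thread index as subsequence
__global__ void setup_kernel ( curandState * state, unsigned long seed )
{
    int id = threadIdx.x;
    curand_init ( seed, id, 0, &state[id] );
}

// Draw one uniform float per thread and save the advanced state
__global__ void generate( curandState* globalState, float* out )
{
    int ind = threadIdx.x;
    curandState localState = globalState[ind];    // work on a local copy
    float RANDOM = curand_uniform( &localState );  // uniform in (0,1]
    out[ind] = RANDOM;                             // keep the result
    globalState[ind] = localState;                 // store the updated state
}

int main( int argc, char** argv)
{
    dim3 tpb(N,1,1);
    curandState* devStates;
    float* devRandoms;
    cudaMalloc ( &devStates, N*sizeof( curandState ) );
    cudaMalloc ( &devRandoms, N*sizeof( float ) );

    // setup seeds
    setup_kernel <<< 1, tpb >>> ( devStates, time(NULL) );

    // generate random numbers
    generate <<< 1, tpb >>> ( devStates, devRandoms );
    cudaDeviceSynchronize();

    cudaFree ( devRandoms );
    cudaFree ( devStates );
    return 0;
}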

Besides the usual CUDA headers and libraries, you have to include CURAND's device-side header as well (curand_kernel.h). Let's see what happens in the code, starting from main.

First, we declare a curandState pointer and use cudaMalloc() to allocate N states in global memory, one per thread: these are our global states.

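setup_kernel then calls curand_init() in every thread. Its arguments are a seed (I used the seconds since the Epoch, but the choice is up to you), a subsequence number, an offset and a pointer to the state to initialise. All threads share the same seed but pass their own thread index as the subsequence number, so each thread gets a different, uncorrelated sequence.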

The generate() kernel is launched with N threads. Each thread draws its own random floating point number, working on a local copy of its global state, since operating on a copy in local memory is cheaper than repeatedly reading and writing global memory. Here the number is sampled from a uniform distribution over (0,1] with curand_uniform() (0 excluded, 1 included), but CURAND can also sample from a standard normal distribution with curand_normal().

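Finally, each thread writes its updated state (not a new seed) back to global memory, so that the next launch of generate() continues the sequence instead of repeating the same numbers.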
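Why the final store matters can be seen with a CPU mock of the two kernels, where the "threads" are just loop iterations over an array of states. This is a sketch under stated assumptions: a toy xorshift32 step stands in for CURAND's generator, the thread index is added to the seed (CURAND instead keeps one seed and uses the index as subsequence), and step, setup and generate are hypothetical names mirroring the kernels above.

```javascript
// Toy xorshift32 step (stand-in for CURAND's generator): returns the next state.
function step( x ) {
   x ^= x << 13;
   x ^= x >>> 17;
   x ^= x << 5;
   return x >>> 0;
}

// Mirrors setup_kernel: each "thread" gets its own non-zero starting state.
function setup( states, seed ) {
   for ( var id = 0; id < states.length; id++ ) {
      states[id] = ( ( seed + id ) >>> 0 ) || 1;
   }
}

// Mirrors generate(): copy the state locally, draw, optionally write it back.
function generate( states, out, write_back ) {
   for ( var ind = 0; ind < states.length; ind++ ) {
      var local = states[ind];                  // localState = globalState[ind]
      local = step( local );
      out[ind] = local / 4294967296;            // uniform in (0,1)
      if ( write_back ) states[ind] = local;    // globalState[ind] = localState
   }
}

var N = 4;
var states = new Uint32Array( N ), first = [], second = [];
setup( states, 12345 );
generate( states, first, true );
generate( states, second, true );
console.log( first[0] !== second[0] );  // true: the second launch continues the sequence

var frozen = new Uint32Array( N ), a = [], b = [];
setup( frozen, 12345 );
generate( frozen, a, false );
generate( frozen, b, false );
console.log( a[0] === b[0] );           // true: without the store, every launch repeats
```

Skipping the write-back is the mistake the last line of the generate() kernel avoids: every launch would restart from the initial states and return the very same numbers.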

Friday, 6 May 2011

On bioinformatics, or: computer science's pillars

One of the most interesting facets of bioinformatics is that it pushes computer science to its limits. When you deal with huge data sets, a poorly implemented algorithm or the wrong choice of data structure inevitably leads to failure.

The screenshot above shows some figures about the resource costs of a piece of software I'm developing for Prof. Besozzi. It runs a slight variation of the Aho-Corasick algorithm on the genome of Saccharomyces cerevisiae, looking for a set of relevant nucleotide sequences. The data set isn't impressive, just 36 MB, and loading it into main memory takes about 10 MB. The remaining 128 MB are due to the tables built by the algorithm and to the huge vectors that store the results (more than 7 million matches).
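For readers who have not met Aho-Corasick: it builds a trie of all the patterns, adds a failure link to each node (pointing to the node for the longest proper suffix of the current match that is also a prefix in the trie), and then scans the text once, reporting every occurrence of every pattern in time linear in the text length plus the number of matches. Below is a minimal textbook sketch; it is not the "slight variation" used in the software above, and build_automaton and find_all are hypothetical names.

```javascript
// Textbook Aho-Corasick: trie + failure links + inherited outputs.
function build_automaton( patterns ) {
   var nodes = [ { next: new Map(), fail: 0, out: [] } ];  // node 0 is the root
   patterns.forEach( function ( p, idx ) {
      var cur = 0;
      for ( var i = 0; i < p.length; i++ ) {
         if ( !nodes[cur].next.has( p[i] ) ) {
            nodes.push( { next: new Map(), fail: 0, out: [] } );
            nodes[cur].next.set( p[i], nodes.length - 1 );
         }
         cur = nodes[cur].next.get( p[i] );
      }
      nodes[cur].out.push( idx );            // pattern idx ends at this node
   } );
   // Breadth-first pass: depth-1 nodes fail to the root, deeper ones are derived.
   var queue = Array.from( nodes[0].next.values() );
   while ( queue.length > 0 ) {
      var u = queue.shift();
      nodes[u].next.forEach( function ( v, c ) {
         var f = nodes[u].fail;
         while ( f !== 0 && !nodes[f].next.has( c ) ) f = nodes[f].fail;
         nodes[v].fail = nodes[f].next.has( c ) ? nodes[f].next.get( c ) : 0;
         // Also report the patterns that end at the failure node.
         nodes[v].out = nodes[v].out.concat( nodes[nodes[v].fail].out );
         queue.push( v );
      } );
   }
   return nodes;
}

// Scan the text once; report [startPosition, patternIndex] for every match.
function find_all( nodes, patterns, text ) {
   var matches = [], state = 0;
   for ( var i = 0; i < text.length; i++ ) {
      var c = text[i];
      while ( state !== 0 && !nodes[state].next.has( c ) ) state = nodes[state].fail;
      state = nodes[state].next.has( c ) ? nodes[state].next.get( c ) : 0;
      nodes[state].out.forEach( function ( idx ) {
         matches.push( [ i - patterns[idx].length + 1, idx ] );
      } );
   }
   return matches;
}

var patterns = [ "ACG", "CG", "GTA" ];
var automaton = build_automaton( patterns );
// five matches: ACG at 1 and 5, CG at 2 and 6, GTA at 3
console.log( find_all( automaton, patterns, "TACGTACG" ) );
```

The per-node Map and the out arrays are exactly where the memory goes: the automaton's tables and the match vector both grow with the input, which is the space problem discussed below.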

Given this ratio, one can easily foresee that bigger data sets (e.g. the human genome) could quickly get out of hand, requiring more resources than one can afford! I'm more than satisfied with the time complexity (the algorithm is linear, hence optimal), but some further optimization can, and has to, be done on the space side.

Monday, 2 May 2011

Fascinating complexity

Here we go, another little step towards my Master's Degree! I passed "Complex systems" with full marks, thanks to a crowd simulation based on Craig Reynolds' BOIDS and rendered by my own 3D engine, "PLT".

I love emergent models (as one can easily guess from the name of this blog), and BOIDS is no exception. A few rules and parameters, two constraints and a bit of randomness, and we have a hell of an algorithm: almost chaotic, and showing many unexpected, non-designed behaviours such as the formation of queues, spontaneous groups, and so on. Michele and I carried out a validation study based on the previous work of the Crystals project, a joint effort in crowd simulation. Our model worked well, matching the expected timings and behaviours, at least with reasonable agent densities. However, since BOIDS is a simple reactive multi-agent system, our pedestrians do not embed enough "intelligence" and can easily get stuck in many subtle dead ends. It's a good result anyway, and it leaves a lot of room for further investigation.
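Reynolds' original BOIDS steers each agent with three local rules: separation (avoid crowding neighbours), alignment (match their average velocity) and cohesion (move towards their average position). A minimal 2D sketch of one synchronous update step follows; the neighbourhood radius, the weights and the speed cap are hypothetical values, and neither the two constraints nor the randomness of our crowd model are reproduced here.

```javascript
// One BOIDS update step in 2D with Reynolds' three steering rules.
// RADIUS, the weights and MAX_SPEED are illustrative, hypothetical parameters.
var RADIUS = 5, W_SEP = 1.0, W_ALI = 0.1, W_COH = 0.05, MAX_SPEED = 2;

function step_boids( boids, dt ) {
   // Every agent reads the OLD flock, so the update is synchronous.
   return boids.map( function ( b ) {
      var sepX = 0, sepY = 0, aliX = 0, aliY = 0, cohX = 0, cohY = 0, n = 0;
      boids.forEach( function ( o ) {
         if ( o === b ) return;
         var dx = o.x - b.x, dy = o.y - b.y, d = Math.hypot( dx, dy );
         if ( d === 0 || d > RADIUS ) return;  // ignore coincident or distant agents
         sepX -= dx / ( d * d ); sepY -= dy / ( d * d );  // push away, harder when close
         aliX += o.vx; aliY += o.vy;                       // sum of neighbour velocities
         cohX += o.x;  cohY += o.y;                        // sum of neighbour positions
         n++;
      } );
      var vx = b.vx, vy = b.vy;
      if ( n > 0 ) {
         vx += W_SEP * sepX + W_ALI * ( aliX / n - b.vx ) + W_COH * ( cohX / n - b.x );
         vy += W_SEP * sepY + W_ALI * ( aliY / n - b.vy ) + W_COH * ( cohY / n - b.y );
      }
      var speed = Math.hypot( vx, vy );
      if ( speed > MAX_SPEED ) { vx *= MAX_SPEED / speed; vy *= MAX_SPEED / speed; }
      return { x: b.x + vx * dt, y: b.y + vy * dt, vx: vx, vy: vy };
   } );
}

var flock = [
   { x: 0, y: 0, vx: 1,  vy: 0 },
   { x: 1, y: 0, vx: 0,  vy: 1 },
   { x: 0, y: 1, vx: -1, vy: 0 }
];
for ( var t = 0; t < 10; t++ ) flock = step_boids( flock, 0.1 );
console.log( flock );
```

Nothing in these rules mentions queues or groups; such patterns can only emerge from many agents applying them at once, which is what makes the model fascinating (and, with purely reactive agents, prone to getting stuck).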