Wednesday, 26 March 2008

Hello World with CUDA

Writing a true Hello World in CUDA is a bit difficult, since CUDA device code does not support strings.

The following vector addition program may be a better starting point.

Simple code to add two vectors:

#include <stdio.h>
#include <stdlib.h>

/* Kernel: each thread adds one pair of elements */
__global__ void add_arrays_gpu( float *in1, float *in2, float *out, int Ntot)
{
    int idx=blockIdx.x*blockDim.x+threadIdx.x;
    /* Threads past the end of the arrays do nothing */
    if ( idx < Ntot )
        out[idx]=in1[idx]+in2[idx];
}

int main()
{
    /* pointers to host memory */
    float *a, *b, *c;
    /* pointers to device memory */
    float *a_d, *b_d, *c_d;
    int N=18;
    int i;

    /* Allocate arrays a, b and c on host */
    a = (float*) malloc(N*sizeof(float));
    b = (float*) malloc(N*sizeof(float));
    c = (float*) malloc(N*sizeof(float));

    /* Allocate arrays a_d, b_d and c_d on device */
    cudaMalloc ((void **) &a_d, sizeof(float)*N);
    cudaMalloc ((void **) &b_d, sizeof(float)*N);
    cudaMalloc ((void **) &c_d, sizeof(float)*N);

    /* Initialize arrays a and b */
    for (i=0; i<N; i++)
    {
        a[i]= (float) i;
        b[i]=-(float) i;
    }

    /* Copy data from host memory to device memory */
    cudaMemcpy(a_d, a, sizeof(float)*N, cudaMemcpyHostToDevice);
    cudaMemcpy(b_d, b, sizeof(float)*N, cudaMemcpyHostToDevice);

    /* Compute the execution configuration: round the block count up */
    int block_size=8;
    dim3 dimBlock(block_size);
    dim3 dimGrid ( (N/dimBlock.x) + (!(N%dimBlock.x)?0:1) );

    /* Add arrays a and b, store result in c */
    add_arrays_gpu<<<dimGrid,dimBlock>>>(a_d, b_d, c_d, N);

    /* Copy data from device memory to host memory */
    cudaMemcpy(c, c_d, sizeof(float)*N, cudaMemcpyDeviceToHost);

    /* Print c */
    for (i=0; i<N; i++)
        printf(" c[%d]=%f\n",i,c[i]);

    /* Free the memory */
    free(a); free(b); free(c);
    cudaFree(a_d); cudaFree(b_d); cudaFree(c_d);

    return 0;
}


Running the Code

Copy the code into a file with a .cu extension, for example add_vector.cu
Compile it with nvcc: nvcc add_vector.cu -o add_vector
Run it: ./add_vector

If you don't have a CUDA-capable GPU, compile it in emulation mode:
nvcc -deviceemu add_vector.cu -o add_vector_emu
Run it: ./add_vector_emu


  1. Thank you for the prompt reply after a long time. But it is not long for me, since I have not been able to install it until now.
    I asked some experts in that area, and they told me that installing CUDA is difficult on some operating systems. I have Fedora Linux, and on this OS it is very difficult to install.
    I still do not know what the problem is.
    Thank you for the reply
    Best Wishes :)

  2. Have you tried the following link?

  3. I got the following error on executing the code in device emulation mode:

    error: expected an expression at this line
    add_arrays_gpu<< >>(a_d, b_d, c_d, N);

  4. Is this parallel programming?

  5. You can say that.


  6. I don't use VS2008, as I program on Linux, but refer to the following link, if that helps.


  7. Where should I copy the code to compile it?

  8. How do we compile CUDA Fortran code? I believe that nvcc is just for C code. Is PGI Fortran the only compiler available?

  9. I tried to compile and got the message:

    Visual Studio configuration file 'vsvars32.bat' could not be found for installation at './../../..'

    Does anybody know how to proceed in this case?

  10. Hi,
    I've installed the CUDA SDK and toolkit on Ubuntu, but I still don't know how to compile a program. Can you provide a complete description of where to write programs, how to compile them, and how to use CUDA in emulation mode? Thank you.

  11. Extremely helpful. Your strength is exactly what is most needed and lacking on the net: Objectivity.

    The example was simple and complete enough for an introduction and the execution instructions were as simple as possible.

    Most users on the net are desperate for attention and always write far more than needed, making useless tutorials. Most content on the net is totally useless actually, but certainly not yours!

  12. Thanks for the compliments.

  13. Hi, this link doesn't seem to work now. Where can I find a copy?


  14. A couple of errors in the code:

    The for loops should read:
    for (i=0; i<N; i++)

    The kernel launch should read:
    add_arrays_gpu<<<dimGrid, dimBlock>>>(a_d, b_d, c_d, N);

    And you need a ';' at the end of the last cudaFree(c_d);

    I think that is it.