I have an Nvidia GeForce 410M, a CUDA-capable graphics card. I suppose it is one of the older, entry-level GPUs in the Nvidia family, and I feel lucky that it shipped in my laptop by sheer serendipity. I happen to know a little bit of game programming, so I would like to try my luck at developing with CUDA. CUDA is a parallel computing platform and programming model; when Nvidia introduced it, the name stood for Compute Unified Device Architecture.
I researched installing Nvidia CUDA a bit and found that the Nvidia drivers need to be installed first (they weren't on my machine). Then I installed the CUDA 5.5 toolkit; I chose version 5.5 because my Ubuntu 14.04 LTS system with the GeForce 410M would only support that version of the toolkit.
After much helter-skelter I found that you cannot do CUDA programming in plain Eclipse; you need Nsight Eclipse. I downloaded all of this and finally I was ready to run a CUDA program. I also found that C programs in CUDA use the .cu extension.
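To get a feel for what goes into a .cu file, here is a tiny sketch I put together (the file name hello.cu and the kernel name hello_kernel are just my own made-up examples, not part of any template): a .cu source mixes ordinary C/C++ code that runs on the CPU with kernels marked __global__ that run on the GPU, and nvcc compiles both.

// hello.cu -- a minimal, hypothetical example
#include <cstdio>

// A kernel: this function is compiled for, and executed on, the GPU.
__global__ void hello_kernel(void)
{
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

int main(void)
{
    // Launch the kernel with 1 block of 4 threads.
    hello_kernel<<<1, 4>>>();

    // Wait for the GPU to finish so its printf output actually appears.
    cudaDeviceSynchronize();
    return 0;
}

Something like nvcc -arch=sm_21 hello.cu -o hello should build it (printf from device code needs a Fermi-class or newer GPU, and as far as I can tell the 410M is one).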
Went to File -> New Project -> CUDA C/C++ Project.
Gave a name to the project.
In the next screen I left the defaults as they were.
In the next screen I clicked Finish.
Then I right-clicked on the project name -> New -> Source File.
Then in the dialog I gave the file a name with the .cu extension. You may choose any template of your choice; as I am new to CUDA I just kept the default (the bit-reverse demo template). With that I had my first experience of running a bit-reverse program, whose output looks like:
Input value: 0, device output: 0, host output: 0
Input value: 1, device output: 128, host output: 128
Input value: 2, device output: 64, host output: 64
Input value: 3, device output: 192, host output: 192
Input value: 4, device output: 32, host output: 32
Input value: 5, device output: 160, host output: 160
Input value: 6, device output: 96, host output: 96
Input value: 7, device output: 224, host output: 224
Input value: 8, device output: 16, host output: 16
Input value: 9, device output: 144, host output: 144
Input value: 10, device output: 80, host output: 80
Input value: 11, device output: 208, host output: 208
Input value: 12, device output: 48, host output: 48
Input value: 13, device output: 176, host output: 176
Input value: 14, device output: 112, host output: 112
.
.
.
(and so on, up to 255)
I was a bit encouraged to try and develop fast, parallel games using CUDA.
I am wondering what this device and host mean. To hazard a guess, the device is the graphics card and the host is the main CPU. I am feeling too tired to dig into the code right now, but I hope you understood what the program does.
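For the curious, here is a rough sketch of what a bit-reverse program of this kind looks like. I am writing it from memory, so the names and details below are my own and may differ from the actual Nsight template. The __global__ function is the device part that runs on the graphics card; main and the plain C function are the host part that runs on the CPU, and the host recomputes every value so the two columns can be compared.

#include <cstdio>
#include <cuda_runtime.h>

#define N 256

// Device code: reverse the 8 bits of each byte, one GPU thread per byte.
__global__ void bitreverse(unsigned char *data)
{
    unsigned int i = threadIdx.x;
    unsigned char v = data[i];
    v = ((v & 0x0F) << 4) | ((v & 0xF0) >> 4);   // swap nibbles
    v = ((v & 0x33) << 2) | ((v & 0xCC) >> 2);   // swap bit pairs
    v = ((v & 0x55) << 1) | ((v & 0xAA) >> 1);   // swap adjacent bits
    data[i] = v;
}

// Host code: the same operation on the CPU, used to verify the GPU result.
static unsigned char bitreverse_host(unsigned char v)
{
    v = ((v & 0x0F) << 4) | ((v & 0xF0) >> 4);
    v = ((v & 0x33) << 2) | ((v & 0xCC) >> 2);
    v = ((v & 0x55) << 1) | ((v & 0xAA) >> 1);
    return v;
}

int main(void)
{
    unsigned char input[N], output[N];
    for (int i = 0; i < N; ++i)
        input[i] = (unsigned char)i;

    unsigned char *d_data;
    cudaMalloc((void **)&d_data, N);                         // allocate memory on the device
    cudaMemcpy(d_data, input, N, cudaMemcpyHostToDevice);    // copy input to the device
    bitreverse<<<1, N>>>(d_data);                            // launch N threads in one block
    cudaMemcpy(output, d_data, N, cudaMemcpyDeviceToHost);   // copy results back to the host
    cudaFree(d_data);

    for (int i = 0; i < N; ++i)
        printf("Input value: %d, device output: %d, host output: %d\n",
               i, output[i], bitreverse_host(input[i]));
    return 0;
}

If the device output and host output columns match for every input, the GPU did its job correctly.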
If interested, you can download the program from here. Thank you :-)
Update:
Something tells me to develop this project further. I am looking to parallelize parts of games that currently run on the CPU so that they run faster. For the present my interest is in rendering the shadow polygons in a scene quickly; I hear this carries significant computational overhead and that existing algorithms are not good enough.