There were two options to get going: OpenCL or CUDA. Although OpenCL is the cross-platform option, I decided to go with CUDA since it seems to have better support. I will run every program on my GeForce GTX 960M.
I will report the execution time for every image. I only measure the kernel execution time on the GPU, since the kernel does basically all of the work. Memory allocations, copying host memory to the GPU's global memory, etc. are therefore not included in the reported times.
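The post does not say how the kernel time was measured. One common way to time only the kernel is with CUDA events, so here is a minimal sketch under that assumption. `renderKernel` and `timeRenderKernel` are hypothetical names, and the kernel body is a trivial stand-in for the real ray tracing work.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the ray tracing kernel: writes one value per pixel.
__global__ void renderKernel(float* image, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)          // guard threads that fall outside the image
        image[y * width + x] = 0.5f;
}

// Returns the kernel-only execution time in milliseconds.
// Allocation and cleanup happen outside the timed region.
float timeRenderKernel(int width, int height)
{
    float* dImage = nullptr;
    cudaMalloc(&dImage, sizeof(float) * width * height);   // not timed

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);                                 // timed region begins
    renderKernel<<<grid, block>>>(dImage, width, height);
    cudaEventRecord(stop);                                  // timed region ends
    cudaEventSynchronize(stop);                             // wait until the kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dImage);                                       // not timed
    return ms;
}
```

Because both events are recorded on the same stream as the kernel, the elapsed time covers the kernel alone and leaves out `cudaMalloc`, `cudaFree`, and any host-to-device copies.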
I designed two different modes to run the ray tracer. The first one is photo mode: it basically takes a photo, saves it and returns. Video mode, on the other hand, presents an interactive real-time ray tracer. It can produce up to 500 frames per second for simple scenes. However, since I haven't implemented an acceleration structure yet, the ray tracer doesn't reach real-time performance in complex scenes.
Enough talking, let's show the outputs and respective execution times.
|output of simple.xml|
kernel execution time: 2.7 milliseconds
|output of simple_shading.xml|
kernel execution time: 3.42 milliseconds
|output of bunny.xml|
kernel execution time: 351 milliseconds
Let's list some lessons learned from this assignment:
- Choose the number of threads per block wisely. Generally 8x8 or 16x16 gives the best performance.
- Be careful when you copy values from host memory to device memory if you are copying an instance of a class that has virtual methods. When you copy an instance of such a class, the vtable pointer is copied along with the member variables. The problem is that this vtable pointer points to locations that reside in host memory, so when you try to access these addresses on the device side, the behaviour is undefined.
- The "kernel execution time limit" took my whole day. It could have been really easy to solve, but I had not implemented anything to check runtime errors from CUDA calls. If your default display graphics card and the card on which your CUDA code executes are the same, you are likely to have this issue. If the GPU cannot finish executing the kernel in 5-6 seconds (I do not know the exact limit), it simply abandons the kernel execution and stops. The workaround to this problem can be found here.
- As I mentioned in the previous point, write a macro (or whatever you prefer) to check the errors returned by CUDA calls.
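For the virtual method pitfall, one possible way around it is to construct the objects on the device, so that the vtable pointer refers to device code and only plain data crosses from host to device. This is a sketch of that idea, not necessarily what the ray tracer itself does. `Shape`, `Square`, and the helper functions are hypothetical names.

```cuda
#include <cuda_runtime.h>

// Hypothetical class hierarchy, not the ray tracer's actual classes.
struct Shape {
    __device__ virtual float area() const = 0;
    __device__ virtual ~Shape() {}
};

struct Square : Shape {
    float side;
    __device__ explicit Square(float s) : side(s) {}
    __device__ float area() const override { return side * side; }
};

// Build the object on the device so its vtable pointer refers to device code.
__global__ void createSquare(Shape** slot, float side)
{
    *slot = new Square(side);
}

// The virtual call is valid here because the object was constructed on the device.
__global__ void useShape(Shape** slot, float* out)
{
    *out = (*slot)->area();
}

__global__ void destroyShape(Shape** slot)
{
    delete *slot;
}

// Host helper: only plain data (the side length) is passed from host to device.
float squareAreaOnDevice(float side)
{
    Shape** dSlot = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dSlot, sizeof(Shape*));
    cudaMalloc(&dOut, sizeof(float));

    createSquare<<<1, 1>>>(dSlot, side);
    useShape<<<1, 1>>>(dSlot, dOut);
    destroyShape<<<1, 1>>>(dSlot);

    float area = 0.0f;
    cudaMemcpy(&area, dOut, sizeof(float), cudaMemcpyDeviceToHost);   // waits for the kernels
    cudaFree(dSlot);
    cudaFree(dOut);
    return area;
}
```

Another option is to avoid virtual methods in device code altogether, for example with a type tag and a `switch`, at the cost of a less object-oriented design.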
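For the last two lessons, an error-checking macro could look roughly like the sketch below. `CUDA_CHECK`, `fillKernel`, and `fillAndReadFirst` are hypothetical names. The runtime functions used (`cudaGetLastError`, `cudaDeviceSynchronize`, `cudaGetErrorName`, `cudaGetErrorString`) are part of the CUDA runtime API.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical macro name; wraps any CUDA runtime call that returns cudaError_t
// and aborts with the file, line, and error description on failure.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "CUDA error %s at %s:%d: %s\n",           \
                         cudaGetErrorName(err_), __FILE__, __LINE__,       \
                         cudaGetErrorString(err_));                        \
            std::exit(EXIT_FAILURE);                                       \
        }                                                                  \
    } while (0)

// Trivial stand-in kernel: fills an array with a constant.
__global__ void fillKernel(int* data, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Fills n integers on the device and returns the first one, checking every call.
int fillAndReadFirst(int n, int value)
{
    int* dData = nullptr;
    CUDA_CHECK(cudaMalloc(&dData, n * sizeof(int)));

    fillKernel<<<(n + 255) / 256, 256>>>(dData, n, value);
    CUDA_CHECK(cudaGetLastError());        // reports launch errors such as a bad configuration
    CUDA_CHECK(cudaDeviceSynchronize());   // reports errors raised while the kernel runs,
                                           // including a launch timeout

    int first = 0;
    CUDA_CHECK(cudaMemcpy(&first, dData, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dData));
    return first;
}
```

Kernel launches do not return an error code themselves, which is why the launch is followed by the two checked calls. With a check like this in place, a kernel killed by the execution time limit shows up as an explicit launch timeout error instead of a silently missing image.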