A GPU-Based Path Tracer: April 2017

Sunday, April 30, 2017

Assignment 6&7: Texture Mapping, Procedural Textures and Bump Mapping

Textures add an important portion of realism to ray tracers. Using just plain colors is not enough to create real world scenes and this is why it is almost a must for every ray tracer.

For these two assigments, I implemented texture mapping, perlin noise textures and bump mapping. CUDA helped me, it does once in a while, for texture operations such as fetching, filtering and setting up many sampling parameters. Figuring out which function to use and how to use it was a bit painful but now all is clear and I will talk about them in the "Lessons learned" section. Also, I implemented the improved perlin noise to produce procedural textures.

Bump mapping is relatively easy when you add the texturing functionality to your ray tracer. However, there is one big problem when dealing with the tangent vectors and the surface normal. How are you going to transform them if you are using instancing? Transforming the normal is well-known but what about others? Of course, we will not transform the tangent vectors in the way we transform the normals. Say, you figured out that you should use the transformation matrix M but not the inverse-transpose of M. However, it will work unless you have a transformation matrix whose determinant is negative. If your transformation matrix has a negative determinant then everything changes. Thanks to one of my friend who gave me this information, I was able to solve the problem by taking tangent vectors from object space to world space by transforming them with M and after that by multiplying one of the tangent vectors by the sign of the determinant of M.

Let us see the outputs of this week.

output of simple_texture.xml
kernel execution time: 9.7 milliseconds

output of the skybox.xml
kernel execution time: 923 milliseconds

output of ellipsoids_texture.xml
kernel execution time: 11.4 milliseconds

output of sphere_texture_blend_bilinear.xml
kernel execution time: 10.7 milliseconds

output of sphere_texture_replace_nearest.xml
kernel execution time: 10.8 milliseconds

output of sphere_texture_replace_bilinear.xml
kernel execution time: 10.9 milliseconds

output of killeroo_diffuse_specular_texture.xml
kernel execution time: 492 milliseconds

output of perlin_types.xml
kernel execution time: 14.1 milliseconds

output of bump_mapping_basic.xml
kernel execution time: 350 milliseconds

output of bump_mapping_transformed.xml
kernel execution time: 335 milliseconds

output of killeroo_bump_walls.xml
kernel execution time: 515 milliseconds

output of sphere_bump_nobump.xml
kernel execution time: 192 milliseconds

Lessons learned:

I stored the random number sequence for the perlin noise in the constant memory. Since the numbers are constant during the execution, constant memory is the fast and appropriate solution.
Texturing with CUDA is really simple. The only confusing part is to use which function and how. First of all, I recommend you to implement a texture manager class which manages all the textures loaded for your scene. Your scene file may use the same texture more than once and you might create space for the same image for many times. Following code snippet summarizes how to create a texture in CUDA.

       
cudaArray* cuda_array = nullptr;

//1-Create and fill a channel description.
cudaChannelFormatDesc channel_desc = cudaCreateChannelDesc(8, 8, 8, 8, cudaChannelFormatKindUnsigned); //RGBA

HANDLE_ERROR(cudaMallocArray(&cuda_array, &channel_desc, image_width, image_height));
HANDLE_ERROR(cudaMemcpyToArray(cuda_array, 0, 0, bits, //"bits" is the pointer to your image on the host memory
             image_width * image_height * 4, cudaMemcpyHostToDevice)); 
        
//2-Create and fill a resource description.
cudaResourceDesc res_desc;
memset(&res_desc, 0, sizeof(res_desc));
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = cuda_array;

//3-Create and fill a texture description.
cudaTextureDesc tex_desc;
memset(&tex_desc, 0, sizeof(tex_desc));
tex_desc.addressMode[0] = cudaAddressModeClamp;
tex_desc.addressMode[1] = cudaAddressModeClamp;
tex_desc.filterMode = cudaFilterModeLinear;
tex_desc.readMode = cudaReadModeNormalizedFloat;
tex_desc.normalizedCoords = 1;

//4-Create the texture object
cudaTextureObject_t texture = 0;
HANDLE_ERROR(cudaCreateTextureObject(&texture, &res_desc, &tex_desc, nullptr));

..
..
..
..


//Fetch a pixel.
float4 pixel = tex2D<float4>(texture , u, v);

Beware that we do not bind any texture before using it since we are using texture objects but not texture references. To understand the difference and which parameter does what, you should see the documentation.

Tuesday, April 18, 2017

Assignment 5: Distributed Ray Tracing

Although we have rendered some nice pictures so far, we didn't concentrate on making them more realistic. By looking at them closely, we can easily see hard shadows and jaggy edges which are the fundamental problem of computer generated images. In this assignment, we concentrate more on this so that we can have softer shadows and less jaggy edges by means of multisampling. Of course there are ways to post-process the final images to have more realistic ones but that would be another topic of discussion. Unfortunately, realism comes at a price in our case, regardless to say it is worth it, since we are now sending more and more rays for a pixel.

Taking more than one sample per pixel does not only help to implement multisampling but it also makes it possible to have area lights, depth of field and some other nice things without a further cost. Our starting point is to choose some random points around a pixel center and consider all the radiance values computed for the rays shot from these points when calculating the final value of this particular pixel. Many strategies can be adopted for choosing these random points and combining radiance values we get. Easiest one is to select all the points just randomly. However, "just randomly" creates quite noisy images. Instead of this, we can adopt jittered sampling, multi-jittered sampling, n-rooks sampling and many more. I implemented jittered sampling and the results are quite satisfying. For the "combining" part, I basically take the average of the radiance values resulted from these rays.

Once you calculated the jittered sample for a pixel, you can use this sample for the area light and the depth of field part as well. For example, say you are going to take 36 samples for your scene. That means you will take 36 jittered sample points around the pixel, 36 different sample points on the area light and 36 different sample points on the aperture of the thin lens camera. Once you take one of this sample, you can use it for the others too. This way, you will have more uniformly distributed samples on each of the planes you are sampling. Of course, this may not apply if you want to take more samples for area lights but not for multisampling or for other combinations that you might want different number of samples.

Ray Tracing from the Ground Up explains these concepts clearly. Also, you can see this to pick a random point on a disk, which you are going to need for disk shaped area lights and for depth of field effect.

Apart from these subjects, I implemented what I've explained in the "lessons learned" part of the previous post. I reduced the rendering time from 1.34 seconds to 0.47 seconds for the image below.

previous approach, 32 samples

mentioned approach, 169 samples

mentioned approach, 32 samples

However, not unexpectedly, the resulting images are a bit noisy. Considering that we are going to take more and more samples when we deal with Monte Carlo method, this approach will be the right choice for sure.

Lastly, let us see what we rendered this week.

output of glass_plates_point.xml
kernel execution time: 1.34 seconds

output of glass_plates_area.xml
kernel execution time: 1.37 seconds

output of dragon_spot_light.xml
kernel execution time: 26.7 milliseconds

output of dragon_spot_light_msaa.xml
kernel execution time: 2.07 seconds

output of spheres_dof.xml
kernel execution time: 331 milliseconds

output of metal_plates_area.xml
kernel execution time: 1.07 seconds

The dragon under an area light

Sunday, April 9, 2017

Assignment 4: Reflections and Refractions

Ray tracing is good at simulating refraction and reflection of light by its nature. One of the first things we have to consider to simulate reflective and refractive materials is that we have to shoot rays beside the primary and shadow rays. After primary rays, if we detect any refractive or reflective surface, we should generate new rays based on the physics of light.

A perfect specular surface reflects the incoming light in a certain direction. Although perfect specular surfaces are hard to find in nature, not sure if it is impossible, they are implemented in every ray tracer since they are the starting point for simulating other reflective surfaces. On the other hand, when the light interacts with a refractive surface, some of the photons are reflected while some others are refracted depending on the incident angle of light. This effect can be easily seen from the picture of a lake below.

As you can see, in the pixels near the right-bottom corner of the image, we can easily see the transparent nature of water. The reason is that the incident angle is smaller. However, as the view direction is getting to be perpendicular to the surface normal, incident angle is getting larger and therefore photons are getting mostly reflected. Fresnel equations explain this behaviour and based on these equations, we can compute which proportion of photons will be reflected or refracted. Furthermore, Schlick’s approximation is a well-known approximation of the Fresnel equations, which can be used to decrease the computation time. Also, Snell's law helps us to compute the direction of a refracted light. Finally, by means of the Beer's law, we can compute the attenuation of the light traveling through the medium. This document clearly explains aforementioned equations and laws except Beer's law.

Let us see the scenes we should render for this assignment.

output of cornellbox_glass.xml
kernel execution time: 31.1 milliseconds

output of horse_and_glass_mug.xml
kernel execution time: 571 milliseconds

output of glass_plates.xml
kernel execution time: 55.8 milliseconds

output of killeroo_glass.xml
kernel execution time: 531 milliseconds

output of killeroo_half_mirror.xml
kernel execution time: 82.5 milliseconds

output of killeroo_mirror.xml
kernel execution time: 82.5 milliseconds

Lessons learned:

Rendering times are higly increased for the scenes which include refractive materials. Of course this is an expected outcome, however, the biggest reason for slowdown is the usage of stack which is used to recursively trace all the rays generated by these materials. Even without recursion, that is, just defining a stack of 2 kb in every thread and not touching it, my ray tracer slows down by 2x. A workaround to this problem is as follows: instead of generating two rays after light-refractive surface interaction, we generate one ray but we choose whether to refract or reflect the ray by means of Fresnel equations or Schlick's approximation and a uniform random number generator. The problem is that if we take just one sample for each pixel, the output might look incorrect. Therefore, we should take as many samples as possible for each pixel since the output converges to the expected output with infinitely many samples. For these reasons, I postponed the implementation of this approach to the next assignment where I will implement multisampling. After implementing this approach, every ray-surface interaction will be able to generate at most one ray so that I get rid of the stack.