CUDA

NVIDIA GPGPU Programming

A note on raytracing

Can you raytrace using shaders? Yes! Delve into the weird world of ShaderToy.

Setup

I have updated code on the examples repo on github. Start by checking the status of your personal examples repo using git status and committing any local changes. Refer to the git status help page for instructions, particularly on handling files that have been modified.

Next, fetch the upstream changes. If you have committed your local changes, merge conflicts should be minimal.

[~]$ cd ~/cs40/examples
[examples]$ git fetch upstream
[examples]$ git merge upstream/master
[examples]$ git push

Open up a terminal and navigate to the w08-cuda-pt1 directory.

In a second terminal, navigate to the build folder and run make -j8 to build the w08-cuda-pt1 examples.

[~]$ cd ~/cs40/examples/build/
[build]$ make -j8
[build]$ cd w08-cuda-pt1
[w08-cuda-pt1]$ 

CUDA overview

CUDA is a framework for running highly parallel compute jobs on NVIDIA GPUs. From our OpenGL experience, you should already be familiar with the idea that we can copy data from the CPU to the GPU using VBOs and textures. We can then run programs (shaders) that we define on the GPU. These shaders transparently run in parallel on individual vertices or fragments. The final output from the fragment shader appears as colors in a screen buffer in our OpenGL context.

In CUDA, we make the notions of data I/O and computation more general. We will copy data between the CPU and GPU using cudaMalloc and cudaMemcpy. CUDA uses the terms host and device to refer to the CPU and GPU, respectively.

To run a program on the GPU (device), we write a special function called a kernel. A kernel is just a normal C function with a few restrictions/additions: it must be declared with the __global__ qualifier, it must have a void return type, and any pointer it receives must point to device memory.

__global__ is not a valid C keyword or token: while CUDA looks a lot like C, it has some extra features and keywords. This pattern should be familiar by now. GLSL is a C-like shading language for OpenGL that adds some extra keywords/features. Qt also adds some extensions to C++ to handle signals and slots. And just as Qt code is compiled by a Meta Object Compiler (MOC) and GLSL shaders are compiled/linked by special OpenGL functions, CUDA is compiled with a special compiler, nvcc, located in /usr/local/cuda/bin.

Simple Kernel

The file simple_kernel_params.cu gives an overview of the mechanics of a CUDA application. This is a silly example. It uses the GPU to add two numbers a and b and stores the result in a GPU buffer c. There is no reason this needs to be a GPU application, but we are using this simple example to illustrate some features of CUDA.

First, since kernels must have a void return type, how can we ever get output from the GPU? We can use a pointer that points to a buffer of memory and write our output to that buffer. We have to be a little careful though, since there are two memory spaces: CPU/host memory and GPU/device memory. Since the kernel runs on the GPU, this pointer should point to device memory. In fact, any pointer passed to a kernel should be a pointer to device memory.
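Putting these pieces together, the kernel in simple_kernel_params.cu looks roughly like this (a sketch consistent with the example described above; the names in the actual file may differ slightly):

```cuda
// Runs on the device. __global__ marks this function as a kernel that
// can be launched from the host. Kernels return void, so the result is
// written through the pointer c, which must point to device memory.
__global__ void add(int a, int b, int *c) {
    *c = a + b;
}
```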

We can allocate memory on the GPU using the function cudaMalloc. We pass cudaMalloc the address of a pointer dev_c and the requested size. The function then allocates space on the GPU and writes the value of the GPU pointer to dev_c. We now have a GPU address stored in a pointer variable on the CPU. You can pass this pointer value around, including passing it to a kernel, but you cannot dereference this pointer on the CPU because it refers to a GPU address.
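In host code, the allocation step sketched above looks like this (dev_c is the name used in this walkthrough; the actual file may differ):

```cuda
int *dev_c;  // lives on the host, but will hold a device address

// Pass the address of the pointer so cudaMalloc can fill it in
// with the location of the new device allocation.
cudaMalloc((void **)&dev_c, sizeof(int));

// dev_c now holds a GPU address: we can pass its value to a kernel,
// but dereferencing it on the CPU is an error.
```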

Calling a kernel
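A kernel is launched from the host using CUDA's triple angle bracket syntax, which specifies the parallel geometry of the launch: the number of blocks and the number of threads per block. For this toy example one block with one thread is enough. A minimal sketch, using the add kernel and dev_c pointer from the discussion above:

```cuda
// Launch the kernel with 1 block of 1 thread. The arguments are
// passed like a normal function call; dev_c must be a device pointer.
add<<<1, 1>>>(2, 7, dev_c);
```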

cudaMemcpy
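Because the result lives in device memory, the host retrieves it with cudaMemcpy, whose last argument names the direction of the copy. A sketch continuing the example above:

```cuda
int c;  // host variable to receive the result

// Copy sizeof(int) bytes from device memory (dev_c) to host memory (&c).
cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost);

printf("2 + 7 = %d\n", c);

cudaFree(dev_c);  // release the device allocation
```

The same function with cudaMemcpyHostToDevice is how input data gets onto the GPU in the first place.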

CUDA demos

NVIDIA releases a bunch of demos of CUDA applications. You can find and run these demos in the samples directory:

[~]$ cd /usr/local/cuda-7.5-samples/NVIDIA_CUDA-7.5_Samples/bin/x86_64/linux/release