Week 3: Linear Algebra Tools

Reading Recap

The Immersive Linear Algebra website has good introductory material on linear algebra topics related to computer graphics

Chapter 1: Introduction
Chapter 2: Vectors (Sections 2.1-2.5)
Chapter 6: The Matrix (6.1-6.4, 6.8)
Chapter 3: Dot Product (Sections 3.1-3.3, Ex 3.6., 3.7)

The Learn OpenGL website has a good summaries using glm and C++ syntax in an OpenGL context. Similar features are available in TWGL.

The term vector gets overused in computer science, but for graphics, it will be important to distinguish at times between something like a generic type vec3 and the geometric concept of vector. Theorem 2.1 in Chapter 2: Vectors covers the basic rules for vector, scalar operations.

With points \(P,Q\) and vector \(\vec{v}\), we allow the following operations

\(Q=P+\vec{v}\) (point displacement)
\(Q-P=\vec{v}\) (a variant of above, \(\vec{v}\) points from \(P\) to \(Q\))
\(1 \cdot P = P\)
\(0 \cdot P = \vec{0}\)

We will typically use the vec[234] GLSL types to store points, colors, and vectors. Even though GLSL will always support vec3 addition, it does not make sense geometrically to add two points.

Change of Basis/Frame

In 3D, a set of three linearly independent vectors, \(\vec{v_1},\vec{v_2}, \vec{v_3}\) form a basis allowing us to express any other vector \(\vec{w}\) as a linear combination of the basis, e.g., \(\vec{w}=a_1\vec{v_1}+a_2\vec{v_2}+a_3\vec{v_3}\). We can write this in matrix form as:

\[\vec{w} = \mathbf{a}^T \cdot \mathbf{v} = \begin{pmatrix} a_1 & a_2 & a_3 \\ \end{pmatrix} \cdot \begin{pmatrix} \vec{v_1} \\ \vec{v_2} \\ \vec{v_3} \\ \end{pmatrix}\]

When the underlying basis is understood, we can express \(\vec{w}\) by just its three coordinates \(\mathbf{a}^T = (a_1, a_2, a_3)\). We will use \(\mathbf{a}^T\) to express coordinates in row form, and the transpose,

\[\mathbf{a} = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}\]

to express coordinates in column form.

By adding an origin point \(Q_0\) to our basis, we can define a frame in which we can assign coordinates to both vectors and points. In 3D, we do this by assigning a fourth coordinate that is set to 0 for vectors and 1 for points.

Our vector \(\vec{w}\) is now:

\[\vec{w} = \mathbf{a}^T \cdot \mathbf{v} = \begin{pmatrix} a_1 & a_2 & a_3 & 0\\ \end{pmatrix} \cdot \begin{pmatrix} \vec{v_1} \\ \vec{v_2} \\ \vec{v_3} \\ Q_0 \end{pmatrix}\]

While a point \(P=b_1\vec{v_1}+b_2\vec{v_2}+b_3\vec{v_3}+Q_0\) is defined as:

\[P = \mathbf{b}^T \cdot \mathbf{v} = \begin{pmatrix} b_1 & b_2 & b_3 & 1\\ \end{pmatrix} \cdot \begin{pmatrix} \vec{v_1} \\ \vec{v_2} \\ \vec{v_3} \\ Q_0 \end{pmatrix}\]

which again can be simplified to \(P=(b_1,b_2,b_3,1)\) if the frame is understood.

A frame is not unique. Any set of linearly independent vectors and and origin point can form a frame and coordinate system for measuring points and vectors. In graphics, this is a feature, not a bug, and it will often be convenient to work in one frame rather than another at various points. But since a frame is just a way of assigning coordinates to the underlying concept (a point location in space, or a vector magnitude and direction), there must be a way to change between two frames.

Let \((\vec{v_1},\vec{v_2}, \vec{v_3}, Q_0)\) be one frame, and let \((\vec{u_1},\vec{u_2}, \vec{u_3}, P_0)\) be another. Since each frame forms a basis, we can write each vector in the other frame as linear combination of the basis vectors, e.g.,

\(\vec{u_1}=m_{11}\vec{v_1}+m_{12}\vec{v_2}+m_{13}\vec{v_3}\)

\(\vec{u_2}=m_{21}\vec{v_1}+m_{22}\vec{v_2}+m_{23}\vec{v_3}\)

\(\vec{u_3}=m_{31}\vec{v_1}+m_{32}\vec{v_2}+m_{33}\vec{v_3}\)

Similarly we can express the origin of one frame in the coordinates of the other.

\(P_0=m_{41}\vec{v_1}+m_{42}\vec{v_2}+m_{43}\vec{v_3}+Q_0\)

In matrix form we can write this as

\[\mathbf{u} = \begin{pmatrix} \vec{u_1} \\ \vec{u_2} \\ \vec{u_3} \\ P_0 \end{pmatrix} = \begin{bmatrix} m_{11} & m_{12} & m_{13} & 0 \\ m_{21} & m_{22} & m_{23} & 0 \\ m_{31} & m_{32} & m_{33} & 0 \\ m_{41} & m_{42} & m_{43} & 1 \\ \end{bmatrix} \cdot \begin{pmatrix} \vec{v_1} \\ \vec{v_2} \\ \vec{v_3} \\ Q_0 \end{pmatrix} = \mathbf{M} \cdot \mathbf{v}\]

If a point/vector is expressed with coordinates \(\mathbf{a}^T=(a_1, a_2, a_3, a_4)\) in the \(\mathbf{v}\) frame and coordinates \(\mathbf{b}^T=(b_1, b_2, b_3, b_4)\) in \(\mathbf{u}\) frame, we can convert back and forth using the matrix \(\mathbf{M}^T\) and its inverse as follows:

\(\mathbf{a}^T\mathbf{v} =\mathbf{b}^T\mathbf{u} = \mathbf{b}^T\mathbf{M}\mathbf{v} \implies \mathbf{a}^T= \mathbf{b}^T\mathbf{M}\)

To convert to \(\mathbf{v}\) coordinates from \(\mathbf{u}\) coordinates , use

\(\mathbf{a} = \mathbf{M}^T\mathbf{b}\).

To convert the other way, use

\(\mathbf{b} = \mathbf{M^{-1}}^T\mathbf{a}\)

Exercise:

Suppose you have coordinates in the blue frame below and you want to convert to coordinates in the red frame. Try rewriting the basis vectors and origin of the blue frame vectors/origin in terms of the red frame to compute the matrix \(\mathbf{M}\) and its transpose.

coords

Now go the other way and compute the matrix to convert from red coordinates to blue coordinates. You can either invert the matrix, or swap the roles of \(\mathbf{u}\) and \(\mathbf{v}\) and repeat the first exercise.

Wednesday

The change of frame we did in class on Monday is similar to the one you will need for lab3. Additionally, you can use the twgl.m4 functions to build your projection matrix. m4.scaling([sx,sy,sz]) and m4.translation([tx,ty,tz]) will build and return scaling and translation matrices. The product of these two matrices can form a correct projection matrix if done in the right order and you can use m4.multiply to compute the product. There are other twgl functions that will apply a scale or translation directly to an existing matrix that you are welcome to use, but I encourage you to use the developer console to set a breakpoint and check the output of any computation to understand how they work. The twgl documentation is a bit sparse. There’s always the twgl source code.

Transforms

In 3D, we will manipulate geometry including points and vectors primarily using 4x4 matrix multiplication. Some common transformations are listed below.

Translation

Translation in three dimensions can be expressed as a matrix-vector (4x1 matrix) multiplication using 4D homogenous coordinates. What happens if you apply a translation to a geometric vector with this transform?

\[T(t_x, t_y, t_x) \cdot P = \begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \cdot \begin{pmatrix} p_x \\ p_y \\ p_z \\ 1 \\ \end{pmatrix} = \begin{pmatrix} p_x + t_x \\ p_y + t_y\\ p_z + t_z\\ 1 \\ \end{pmatrix}\]

Scale

\[S(s_x, s_y, s_x) \cdot P = \begin{bmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \cdot \begin{pmatrix} p_x \\ p_y \\ p_z \\ 1 \\ \end{pmatrix} = \begin{pmatrix} s_x \cdot p_x \\ s_y \cdot p_y \\ s_z \cdot p_z \\ 1 \\ \end{pmatrix}\]

Rotation

There are several ways to do rotation. We will primarily focus on methods that rotate about one of the axes of the basis. For example, a rotation around the \(z\)-axis is helpful, even for 2D applications, as it only modifies the \(x\) and \(y\) coordinates.

\[R_z(\theta) \cdot P = \begin{bmatrix} \cos \theta & -\sin \theta & 0 & 0 \\ \sin \theta & \cos \theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \cdot \begin{pmatrix} p_x \\ p_y \\ p_z \\ 1 \\ \end{pmatrix} = \begin{pmatrix} p_x \cdot \cos \theta - p_y \cdot \sin \theta \\ p_x \cdot \sin \theta + p_y \cdot \cos \theta \\ p_z \\ 1 \\ \end{pmatrix}\]

Consider looking down the \(+z\)-axis. Is the rotation clockwise or counter-clockwise? How can you tell?

The matrices for \(R_x\) and \(R_y\) are given below.

\[R_x(\theta) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos \theta & -\sin \theta & 0 \\ 0 & \sin \theta & \cos \theta & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \ \ R_y (\theta) = \begin{bmatrix} \cos \theta & 0 & \sin \theta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin \theta & 0 & \cos \theta & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}\]

Using matrix transforms

The good news for this course is you will almost never have to create one of these matrices by hand on either the CPU or the GPU. The link: TWGL.m4 module provides helper functions to create and manipulate 4x4 matrices and common graphics transforms.

What you should be able to do in this course is:

Decribe what effect a transform has on some geometry. Sketch a small scene before and after the transform.
Understand that the order in which you apply transforms matters. Matrix multiplication is not commutative, i.e., \(AB \neq BA\) in general for matrices \(A\) and \(B\)
Know what frame/basis you are working in and how to convert between frames/basis.

Moving to 3D

Given the work we have done with the OpenGL pipeline so far, moving to 3D in some ways is relatively simple: we just add an extra coordinate. Creating buffers, setting up shader programs, and setting uniforms is the same in 3D as it is in 2D. I posted a small rotating sphere demo online and in the demos repo.

In 3D, the geometry tends to get more complex. To draw a sphere, we still need to create an array of points and render the sphere as a set of triangles, but now we will need many more triangles. Instead of doing this ourselves which would be tedious and prone to error, twgl (and other 3rd party tools like three.js) can generate geometry for common primitives.

In our demo, we are divide the sphere into 20 longitudinal strips and 20 latitudinal stacks to create 400 quads. Each of these is divided into two triangles. TWGL will automatically create the buffer of points for us with the label position. Additionally, TWGL will create texture coordinates for each vertex so that we can wrap an image around the geometry, as is shown with the Earth texuture. TWGL will also create normal vectors at each vertex, which are unit length vectors perpendicular to the surface. While not used in this demo, normal vectors will be important once we talk about lighting.

Indexed drawing

For complex geometries, it is sometimes difficult to express the shape in a compact array that can be called using gl.drawArrays with a gl.TRIANGLE_STRIP or gl.TRIANGLE_FAN. And repeating vertices to used gl.TRIANGLES could dramatically increase the size of our buffers.

OpenGL supports another drawing method called gl.drawElements where we specify each vertex once, but provide a separate array of indices that index into our vertex array to draw the geometry. We can repeat an index multiple times to refer to the position, normal, and texture coordinate of a particular vertex. For shapes like a sphere where a single vertex is part of multiple triangles, repeating a single integer index is more space efficient than repeating the position, normal, and texture.

The following line in our demo

gl.drawElements(gl.TRIANGLES, bufferInfo.numElements, gl.UNSIGNED_SHORT, 0);

instructs WebGL2 to draw triangles, using all the element indices in the bufferInfo object. Each index is an gl.UNSIGNED_SHORT type. The final parameter is an offset into the index array. A value of 0 means just start at the beginning.

Projection and Model Transforms

Much like lab3, we don’t want to have to model everything in clip coordinates. Our 3D demo uses an orthographic projection matrix m4.ortho to define a rectangular space that is convenient for us. I just picked 10x10x10 as a reasonable default. I also scale the x-direction by the aspect ratio of the canvas so the scale looks uniform in the final image. Without the correction, spheres look squished depending on the canvas size.

With an orthographic projection matrix, we can model our scene in more convenient coordinates, and then project in the vertex shader down to clip space.

It is also common in 3D to apply individual model transforms to objects to position them in the world. In the demo, we apply a time varying rotation matrix to spin the globe, but we will explore model transforms more on Friday and lab4.

Friday

Using model transforms to compose a scene
Dat.gui for tweaking parameters
Z-buffer for object occlusion

Model Transforms

On Wednesday, we introduces twgl primitives to construct 3D geometric models and we used the Orthographic projection to map our world into clip space. Twgl and other third party libraries typically construct the primitives in a standard object frame, e.g., for the sphere, the coordinates are chosen such that the center of the sphere is always at the origin. The primitive objects are then transformed to different locations in the world through model transforms. These are typically compositions of the translations, rotations, and scales we discussed on Wednesdays. We refer to the matrix that transforms the object from the standard object coordinate frame to the world frame as the model matrix. It is usually the first matrix to be applied to the geometry in the vertex shader. In our current demo, we apply the model matrix first, and then the orthographic projection matrix to transform the world coordinates to clip space.

\[\mathbf{v} = \mathbf{PMp}\]

where \(\mathbf{v}\) is the output vertex geometry (in clip coordinates), \(\mathbf{P}\) is the projection matrix, \(\mathbf{M}\) is the model matrix, and \(\mathbf{p}\) is the original vertex geometry of the sphere in object coordinates. Note when we say the model matrix is applied first, we mean it is closest to the \(\mathbf{p}\). In reading from left to right, we see the projection matrix first, but this matrix is applied to the result of the multiplication of \(\mathbf{M}\) and \(\mathbf{p}\), so its effects happen later.

This order of matrix operations is important. Consider a rotation and translation matrix. In the rotating sphere demo, I have modified the code to add some slider controls. Try setting the speed to 0 for a moment and adjusting the rotate and translate amounts. The translation slider translates the sphere in the current \(\vec{x}\) direction by building a translation matrix \(\mathbf{T}\). The rotation slider rotates the sphere around the current \(\vec{y}\) direction by building a rotataion matrix \(\mathbf{R}\). If the rotate_first check box is checked, the model matrix is \(\mathbf{M} = \mathbf{TR}\). Otherwise, the demo computes \(\mathbf{M} = \mathbf{RT}\). Again, the notation of first means which transform is applied first to the object geometry.

Try adjusting the sliders to get a feel for how the \(\mathbf{TR}\) and \(\mathbf{RT}\) transforms are different. You can check the mini_earth box to put a small static globe at the origin that is unaffected by either transform. You can also adjust the speed while tweaking the translate and rotate_first fields. This will animate the rotation amount. With a speed of 0, you can set a fixed rotation in degrees.

Dat.gui

The control bar is another 3rd party library called dat.gui like twgl and the stats bar. I have added it to the cs40lib repo and you should automatically have access to it on the CS network. If you are working on the labs at home without a CS connection, you will need to pull to update your cs40lib repo.

You can import the GUI module in your application using

import {GUI} from "./lib/dat/dat.gui.module.js";

Then make a new GUI somewhere in your code.

const gui = new GUI();

The gui object can be connected to properties of an object and then changes to the sliders in the GUI will automatically change the object properties. See setupControls() for a full example.

function setupControls(){
  controls.rotate = 20;
  controls.rotate_first = false;
  ...
  gui.add(controls,"rotate").min(0).max(360);
  gui.add(controls,"rotate_first");
  ...
}

Now you can use the values of the controls properties in your code to adjust settings as needed, and when you change the sliders in the GUI, your application can respond to the new control values.

Occlusion testing and z-buffers

For 3D modeling, we still project our final output image into a 2D screen. During this final conversion, we only retain color information of what is visible, losing information about where the original objects were in 3D space. To make the scenes look realistic, we must find ways to convey the depth of the scene in a 2D image. We will explore multiple ways of doing this. Perspective transformations and lighting will be important upcoming steps in the next few weeks. But another key feature that is already implemented for us in the OpenGL pipeline is occlusion detection. If we enable the mini Earth in our demo and allow our larger model to rotate around the center of the scene, we note that sometimes the larger Earth is in front, occluding the mini Earth, while at other times, the larger Earth is behind its smaller version and partially occluded.

Because both the vertex shader and fragment shader are highly parallel, we don’t know when a particular vertex or fragment will be processed in the pipeline, and we cannot always order the geometry in a correct back to front rendering order. To detect which object we should draw when multiple objects overlap the same pixel, we use an auxiliary depth or z-buffer to keep track of track of the depth of the the most recent fragment written to the color buffer at a particular pixel location.

When it time to process a fragment, we assign its outcolor, but then check the z-buffer to determine if the depth of this fragment is smaller (closer to the viewer) than the most recently written fragment at this location. If our new fragment is closer, it will be visible, and we update both the color buffer with and the z-buffer with the information of the new fragment. If the new fragment is not closer however, it is occluded by another fragment, and we can safely discard this fragment without updating the buffer.

These depth checks assume our objects are not semi-transparent. For semi-transparent objects, we must be more careful in how we process color fragments. This is an advanced topic.