Week 4: Dot and Cross Product/LookAt

Reading

The Immersive Linear Algebra website has good introductory material on linear algebra topics related to computer graphics.

Chapter 3: Dot Product (Sections 3.1-3.3, Ex 3.6, 3.7)
Chapter 4: Vector Product (Sections 4.1-4.4, Ex 4.2, 4.5)
Chapter 1: Introduction
Chapter 2: Vectors (Sections 2.1-2.5)
Chapter 6: The Matrix (6.1-6.4, 6.8)

Some LookAt references:

gluLookAt() Song Ho Ahn (안성호) good math and animations.
LearnOpenGL LookAt Notes and camera controls. Uses glm, and the variable names are a bit confusing, but otherwise good.

Wednesday

Demo gradescope quiz posted - try it out. The questions are fake, but the setup is real. You have 60 minutes, but I’m guessing it should take no more than 10.
The real quiz will not cover this week’s topics of dot products and cross products, but will cover topics prior to that. You should be familiar with these basic concepts:
- Point
- Vector
- Triangle/Square shapes
- How to combine points/vectors
- How to apply basic transforms (primarily translate and scale, simple rotations). You do not need to know what the full 4x4 matrices look like for e.g. RotateZ, but you should know what effect something like RotateZ would have on a point/vector.
- The basics of shaders
  - difference between fragment/vertex shader
  - in/out/uniform qualifiers
  - where do shaders get input, where do they send output
- An overview of the OpenGL pipeline. What are the primary steps in drawing a single triangle?
- Some basic frames: clip space, screen space, grid space.
The quiz will be open all day Friday. Once you start, you will have 60 minutes to complete the quiz, though the cushion is to allow for technical problems. I aim to design the quiz so that prepared students could complete it in 25 minutes.
Today’s Topics:
- Dot product wrap up
- Cross product definition/applications
- View matrix

Dot Product

The dot product or scalar product is an operation between two vectors that returns a scalar or float quantity. In graphics, we use the dot product primarily for its geometric interpretation.

\[\vec{u}\cdot\vec{v} = \|\vec{u}\|\|\vec{v}\| \cos(\theta)\]

The notation \(\|\vec{u}\|\) means the length or norm of \(\vec{u}\).

Orthonormal Basis

The dot product is well defined for any vectors regardless of basis (a vector can have an abstract representation without a basis), but in many graphics contexts, we specify a vector by its coefficients in a basis. Typically, the bases we choose in graphics are orthonormal, meaning that all vectors forming the basis have unit length and are perpendicular to each other. In this common case, we can express the dot product of two vectors in terms of their coefficients using the formula:

\[\vec{u}\cdot\vec{v}=(u_x,u_y,u_z)\cdot(v_x,v_y,v_z)=u_xv_x+u_yv_y+u_zv_z\]

In the same orthonormal basis, the length of a vector (squared) can be computed using a dot product as well.

\[\vec{u}\cdot\vec{u}=u_x^2+u_y^2+u_z^2 = \|\vec{u}\|^2\]

Applications

We will often use the dot product to

compute the angle between two vectors:

\[\cos(\theta) = \frac{\vec{u}\cdot\vec{v}}{\|\vec{u}\|\|\vec{v}\|}\]
compute the length of a vector:

\[\|\vec{u}\| = \sqrt{\vec{u}\cdot\vec{u}}\]
normalize a vector:

\[\hat{u} = \frac{\vec{u}}{\|\vec{u}\|}, \hat{u} \cdot \hat{u} = 1\]
compute the projection of a vector \(\vec{v}\) onto \(\vec{u}\):

\[\vec{v}_\parallel = \frac{\vec{u}\cdot\vec{v}}{\|\vec{u}\|} \hat{u} = \frac{\vec{u}\cdot\vec{v}}{\|\vec{u}\|^2} \vec{u}\]
compute the portion of a vector \(\vec{v}\) perpendicular to \(\vec{u}\)

\[\vec{v}_\perp = \vec{v} - \vec{v}_\parallel = \vec{v} - \frac{\vec{u}\cdot\vec{v}}{\|\vec{u}\|^2} \vec{u}\]
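
The applications above can be sketched in a few lines of JavaScript. This is a minimal illustration, assuming vectors are plain three-element arrays in an orthonormal basis. The helper names (`vecLength`, `angleBetween`, `projectOnto`, `perpendicularTo`) are made up for this example; in practice a library such as twgl.v3 provides `dot`, `length`, and `normalize` for you.

```javascript
// Plain-array stand-ins for the helpers a vector library would provide.
const dot = (u, v) => u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
const vecLength = (u) => Math.sqrt(dot(u, u));
const scale = (u, s) => u.map((c) => c * s);
const subtract = (u, v) => u.map((c, i) => c - v[i]);

// Unit vector in the direction of u. Assumes u is not the zero vector.
const normalize = (u) => scale(u, 1 / vecLength(u));

// Angle between u and v in radians, from cos(theta) = u.v / (|u||v|).
// The clamp guards against round-off pushing the ratio outside [-1, 1].
function angleBetween(u, v) {
  const c = dot(u, v) / (vecLength(u) * vecLength(v));
  return Math.acos(Math.min(1, Math.max(-1, c)));
}

// Component of v parallel to u: (u.v / |u|^2) u.
const projectOnto = (v, u) => scale(u, dot(u, v) / dot(u, u));

// Component of v perpendicular to u: v - v_parallel.
const perpendicularTo = (v, u) => subtract(v, projectOnto(v, u));

// Example values chosen for illustration only.
const u = [2, 0, 0];
const v = [3, 4, 0];
const vPar = projectOnto(v, u);      // [3, 0, 0]
const vPerp = perpendicularTo(v, u); // [0, 4, 0]
console.log(vecLength(v), angleBetween(u, v), vPar, vPerp);
```

Note that `projectOnto` uses the second form of the projection formula, which avoids a square root because it divides by \(\vec{u}\cdot\vec{u}\) instead of normalizing \(\vec{u}\) first.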

Cross Product

The cross product or vector product is only really defined well for 3D vectors. Given two vectors \(\vec{u}\) and \(\vec{v}\) in 3D, \(\vec{u}\times\vec{v}\) computes a third vector:

\[\vec{u}\times\vec{v} = \|\vec{u}\|\|\vec{v}\|\sin(\theta)\hat{n}\]

where \(\hat{n}\) is a normal unit vector perpendicular to the plane formed by \(\vec{u}\) and \(\vec{v}\). In 3D there are two choices for which way this normal vector could point. The direction of \(\hat{n}\) for the cross product is determined by the right hand rule, even if you are left handed.

Right hand rule

To determine the direction of \(\hat{n}\) in \(\vec{u}\times\vec{v}\), point the fingers of your right hand in the direction of \(\vec{u}\) and curl them towards your palm in the direction of \(\vec{v}\). Your right thumb will point in the direction of \(\hat{n}\). If you find yourself trying to bend your fingers backwards, rotate your wrist, which will rotate your thumb.

Note that with the right hand rule, you can easily verify that \(\vec{u}\times\vec{v} = -\vec{v}\times\vec{u}\). If you want to get meaningful results out of a cross product, you will need to pay attention to the order of the multiplication.

Orthonormal basis computation

Much like the dot product case, the cross product is well defined regardless of basis, but given an orthonormal basis, we can define the components of the cross product as:

\[\vec{u}\times\vec{v}=(u_yv_z-u_zv_y,u_zv_x-u_xv_z,u_xv_y-u_yv_x)\]

You will not need to know this for the course. twgl or some other library will compute this for you.
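
For the curious, here is a small sketch of the component formula in JavaScript, again assuming plain three-element arrays. The `cross` function below is written out by hand for illustration and is not the library's source; it also lets you check the sign flip from swapping the operands.

```javascript
// Component formula for the cross product in an orthonormal basis.
function cross(u, v) {
  return [
    u[1] * v[2] - u[2] * v[1],
    u[2] * v[0] - u[0] * v[2],
    u[0] * v[1] - u[1] * v[0],
  ];
}
const dot = (u, v) => u[0] * v[0] + u[1] * v[1] + u[2] * v[2];

const x = [1, 0, 0];
const y = [0, 1, 0];
const xy = cross(x, y); // [0, 0, 1]: x cross y points along +z
const yx = cross(y, x); // [0, 0, -1]: swapping the order flips the sign

// The result is perpendicular to both inputs, so both dot products are 0.
const a = [1, 2, 3];
const b = [4, 5, 6];
const ab = cross(a, b); // [-3, 6, -3]
console.log(xy, yx, ab, dot(ab, a), dot(ab, b));
```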

Friday

The quiz is posted to gradescope. You may take it at any time prior to 10am Saturday. Once you start you must complete the exam within 60 minutes, though it is designed to take no more than 25 minutes. Unfortunately, given the distributed nature of the exam times, I cannot provide clarifications while you are taking the exam.

The View matrix

Today’s goal is to understand and derive the View matrix transform, a matrix that transforms world coordinates into eye coordinates. On Wednesday, I referred to this matrix as the lookAt matrix because this is what many OpenGL toolkits call this matrix. TWGL has a lookAt function that takes the same parameters as input but computes the inverse matrix, converting from eye coordinates to world coordinates. To avoid confusion since we are using twgl, I will refer to the matrix we want to compute as the View matrix, which is also a commonly used term.

The View matrix in practice

We saw an abstract sketch of the view matrix on Wednesday. We want to describe a viewer’s location (an eye or camera) with three parameters:

eye: the location of the viewer in world coordinates
at: the location of the viewer’s gaze in world coordinates (twgl calls this target, but it’s the same)
up: a vector in world coordinates that is roughly in the vertical direction in the eye frame. This does not have to be terribly precise, but it should not be parallel to the direction of the gaze.

In the Week 04 Demo, I have modified last week’s demo to support multiple viewpoints. By varying the viewpoint slider, the program computes a different view matrix. In the vertex shader, this matrix is a uniform that is multiplied between the projection matrix and the model matrix.

See getView(time) for some examples of how to set the views. In the first case, we set the eye at [0,0,20] and look at the center of our scene with the up vector being the y direction in the world. This is similar to our original setup from last week.

When we use a view matrix with a projection matrix however, we need to make some adjustment to the ortho matrix. The last two parameters of ortho are the distance to the near and far clipping planes respectively. Without the view matrix in place, these values were simply relative to the origin of the world, and we typically just picked -dim and dim. Once we have a view matrix in place, the values of near and far are relative to the transformed eye frame, where the eye location we previously specified in world coordinates now has coordinates [0,0,0] in the eye frame. Since we typically only see things in front of us, we set the near and far values both to positive distances describing the range of depth we want to view. In this application, since we stepped back 20 units, I set the near and far values to 5 and 50. Then we can see from z=15 to z=-30 in our scene.
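
To check the arithmetic, the following sketch walks along the gaze direction from the eye by the near and far distances and reports where those planes sit in world coordinates. The function name `depthRange` and the plain-array helpers are assumptions made for this example and are not part of the demo code.

```javascript
const subtract = (u, v) => u.map((c, i) => c - v[i]);
const add = (u, v) => u.map((c, i) => c + v[i]);
const scale = (u, s) => u.map((c) => c * s);
const vecLength = (u) => Math.sqrt(u[0] * u[0] + u[1] * u[1] + u[2] * u[2]);
const normalize = (u) => u.map((c) => c / vecLength(u));

// World-space points where the line of sight from eye towards at
// crosses the near and far clipping planes.
function depthRange(eye, at, near, far) {
  const gaze = normalize(subtract(at, eye)); // unit vector from eye to at
  return {
    nearPoint: add(eye, scale(gaze, near)),
    farPoint: add(eye, scale(gaze, far)),
  };
}

// Values from the demo description: eye at [0,0,20], near = 5, far = 50.
const range = depthRange([0, 0, 20], [0, 0, 0], 5, 50);
console.log(range.nearPoint, range.farPoint); // z = 15 and z = -30
```

The same helper works for any eye position, which is handy when picking near and far values that bracket the scene after moving the camera.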

If we change the eye to [0,0,-20] and keep everything else the same, we see the same scene, but from the opposite side. Note that this is purely a function of the view transform. We are not changing the ortho or model transform.

A more elaborate change is the top down view. We move the eye to [0,20,0], keeping the at the same. Since we are now looking down the vertical axis in the world, we need to change our up vector to define what is locally up in the eye frame. Any vector that is not parallel to the gaze direction could work here. I chose the +z axis, [0,0,1].
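
The sketch below shows why the world y axis no longer works as the up vector for the top down view. The derivation later in these notes computes the right vector as the cross product of up with the vector from at to eye; when the two are parallel, that cross product has zero length and cannot be normalized. The function name `rightVectorLength` is made up for this illustration.

```javascript
const subtract = (u, v) => u.map((c, i) => c - v[i]);
const cross = (u, v) => [
  u[1] * v[2] - u[2] * v[1],
  u[2] * v[0] - u[0] * v[2],
  u[0] * v[1] - u[1] * v[0],
];
const vecLength = (u) => Math.sqrt(u[0] * u[0] + u[1] * u[1] + u[2] * u[2]);

// Length of up x (eye - at). A value of zero means up is parallel to the gaze.
function rightVectorLength(eye, at, up) {
  return vecLength(cross(up, subtract(eye, at)));
}

const eye = [0, 20, 0];
const at = [0, 0, 0];
console.log(rightVectorLength(eye, at, [0, 1, 0])); // 0: degenerate choice
console.log(rightVectorLength(eye, at, [0, 0, 1])); // 20: usable choice
```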

Deriving the View matrix

The view matrix is a flexible tool for moving about a scene or observing from different viewpoints. In any first person game where you are navigating a world, you typically do the animation by modifying the view matrix and repositioning the camera/viewer in the scene.

We will now look at how to derive the view matrix. This will be a helpful exercise since twgl does not provide a direct way for us to do this. We have to call m4.lookAt followed by m4.inverse. Going through the derivation will show how the twgl lookAt function is related to our work.

Fundamentally, this is a change of frame problem. We want to convert from world coordinates to eye coordinates. The input parameters will allow us to express the basis and origin of the eye frame as a linear combination of the basis vectors and origin of the world frame without too much hassle. This will guide us towards the correct view matrix.

We will define the eye frame in terms of the following basis vectors:

\(\hat{n}\): A unit vector pointing from at towards eye. This is conceptually the local \(+z\) axis in the eye frame.
\(\hat{r}\): A unit vector pointing to the right of the view from the eye to the target. This is conceptually the local \(+x\) axis in the eye frame. We choose \(\hat{r}\) to be perpendicular to \(\hat{n}\).
\(\hat{u}\): A unit vector perpendicular to both \(\hat{r}\) and \(\hat{n}\) and roughly in the same direction as \(\vec{up}\).

The origin \(P_0\) of the eye frame will be, not surprisingly, the point defined by the eye input parameter.

[Figure: eye frame]

Following the outline of the change of basis exercise of Week 03, we will start defining the eye frame basis vectors in terms of the world frame basis vectors. Even though we are provided \(\vec{up}\) as a vector, we will start with \(\hat{n}\).

\[\vec{n} = \texttt{eye}-\texttt{at}, \ \ \hat{n} = \frac{\vec{n}}{\|\vec{n}\|}\]

Since the coordinates of eye and at are in world coordinates, the coefficients computed in this manner are the coefficients of \(\hat{n}\) in world coordinates. What are the coefficients of \(\hat{n}\) in eye coordinates? Recall you can use the dot product to compute the length of a vector. twgl.v3 has length, lengthSq and dot methods to help you.

Next up is the right vector, for which it seems we have no information. But since we have computed \(\hat{n}\) and are given \(\vec{up}\), we can compute a vector perpendicular to both with the cross product.

\[\vec{r} = \vec{up} \times \hat{n}, \ \ \hat{r} = \frac{\vec{r}}{\|\vec{r}\|}\]

At this point, the vectors \(\hat{n}, \hat{r}\), and \(\vec{up}\) form a basis, but not necessarily an orthonormal basis, as it is possible for \(\vec{up}\) to not be perpendicular to \(\hat{n}\). We can create a new vector \(\hat{u}\) which forms an orthonormal basis as follows:

\[\hat{u} = \hat{n} \times \hat{r}\]

Note that unlike the \(\hat{n}\) and \(\hat{r}\), I did not perform an explicit normalization step here. Why not?
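
The three steps so far can be sketched as follows, assuming plain three-element arrays for eye, at, and up. The function name `eyeFrame` and the hand-written helpers are placeholders for this example; twgl.v3 has its own `subtract`, `cross`, and `normalize` that you could use instead.

```javascript
const subtract = (u, v) => u.map((c, i) => c - v[i]);
const cross = (u, v) => [
  u[1] * v[2] - u[2] * v[1],
  u[2] * v[0] - u[0] * v[2],
  u[0] * v[1] - u[1] * v[0],
];
const vecLength = (u) => Math.sqrt(u[0] * u[0] + u[1] * u[1] + u[2] * u[2]);
const normalize = (u) => u.map((c) => c / vecLength(u));

// Basis vectors of the eye frame, expressed in world coordinates.
// Assumes eye differs from at and up is not parallel to eye - at.
function eyeFrame(eye, at, up) {
  const n = normalize(subtract(eye, at)); // points from at towards eye
  const r = normalize(cross(up, n));      // right vector
  const u = cross(n, r);                  // corrected up vector, as in the formula above
  return { r, u, n };
}

// First view from the demo description: eye on the +z axis, y is up.
const frame = eyeFrame([0, 0, 20], [0, 0, 0], [0, 1, 0]);
console.log(frame.r, frame.u, frame.n); // [1,0,0], [0,1,0], [0,0,1]
```

Try a tilted up vector such as [0.2, 1, 0] with the same eye and at, and compare the returned u against the up you passed in.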

To fully transition between the frames, I must also express the eye position \(P_0\) in terms of the world basis and origin \(Q_0\), but this is just the eye coordinates themselves.

\[ P_0 = (e_x, e_y, e_z, 1) \begin{pmatrix} \vec{w_x} \\ \vec{w_y} \\ \vec{w_z} \\ Q_0 \end{pmatrix}\]

Just like our week 3 change of frame, we now have the eye frame expressed as coefficients in the world frame.

\[\mathbf{v} = \begin{pmatrix} \hat{r} \\ \hat{u} \\ \hat{n} \\ P_0 \end{pmatrix} = \begin{bmatrix} r_x & r_y & r_z & 0 \\ u_x & u_y & u_z & 0 \\ n_x & n_y & n_z & 0 \\ e_x & e_y & e_z & 1 \\ \end{bmatrix} \cdot \begin{pmatrix} \vec{w_x} \\ \vec{w_y} \\ \vec{w_z} \\ Q_0 \end{pmatrix} = \mathbf{M} \cdot \mathbf{w}\]

If a point/vector is expressed with coordinates \(\mathbf{a}^T=(a_1, a_2, a_3, a_4)\) in the world frame \(\mathbf{w}\) and coordinates \(\mathbf{b}^T=(b_1, b_2, b_3, b_4)\) in eye frame \(\mathbf{v}\), we can convert back and forth using the matrix \(\mathbf{M}^T\) and its inverse as follows:

\(\mathbf{a}^T\mathbf{w} =\mathbf{b}^T\mathbf{v} = \mathbf{b}^T\mathbf{M}\mathbf{w} \implies \mathbf{a}^T= \mathbf{b}^T\mathbf{M}\)

To convert to world coordinates from view coordinates, use

\(\mathbf{a} = \mathbf{M}^T\mathbf{b}\).

To convert from world coordinates to view coordinates, use

\(\mathbf{b} = \mathbf{M^{-T}}\mathbf{a}\)

In this case, we actually want the second version, \(\mathbf{b} = \mathbf{M^{-T}}\mathbf{a}\), meaning we need the inverse of the transpose of \(\mathbf{M}\).

But let’s look at \(\mathbf{M}^T\) a little more closely.

\[\mathbf{M}^T = \begin{bmatrix} r_x & u_x & n_x & e_x \\ r_y & u_y & n_y & e_y \\ r_z & u_z & n_z & e_z \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & e_x \\ 0 & 1 & 0 & e_y \\ 0 & 0 & 1 & e_z \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} r_x & u_x & n_x & 0 \\ r_y & u_y & n_y & 0 \\ r_z & u_z & n_z & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} = \mathbf{TR}\]

The matrix \(\mathbf{M}^T\) can be expressed as the product of two matrices \(\mathbf{T}\) and \(\mathbf{R}\), where \(\mathbf{T}\) is a translation matrix, and \(\mathbf{R}\) is a generic rotation matrix. A generic rotation matrix has the property that all the columns/rows are orthonormal to each other. These special matrices are known as orthogonal or unitary matrices and have the special property that their inverse is their transpose. The inverse of a translation matrix is a translation in the opposite direction. Using some general linear algebra properties, we can compute the inverse of \(\mathbf{M}^T\) without too much work.

\[\mathbf{M}^{-T} = \mathbf{(TR)}^{-1} = \mathbf{R}^{-1} \mathbf{T}^{-1} =\mathbf{R}^{T} \mathbf{T}^{-1} = \begin{bmatrix} r_x & r_y & r_z & 0 \\ u_x & u_y & u_z & 0 \\ n_x & n_y & n_z & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & -e_x \\ 0 & 1 & 0 & -e_y \\ 0 & 0 & 1 & -e_z \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} = \begin{bmatrix} r_x & r_y & r_z & -\hat{r}\cdot\vec{e} \\ u_x & u_y & u_z & -\hat{u}\cdot\vec{e}\\ n_x & n_y & n_z & -\hat{n}\cdot\vec{e}\\ 0 & 0 & 0 & 1 \\ \end{bmatrix}\]

where \(\vec{e}\) is a vector from the world origin to the eye origin.
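
As a sanity check on the final matrix, here is a sketch that builds it directly from eye, at, and up and confirms that the eye lands at the origin of the eye frame. The names `viewMatrix` and `transformPoint` are invented for this example, and the matrix is stored as an array of four rows purely for readability; twgl's m4 type uses flat 16-element arrays, so with twgl you would still obtain the view matrix by calling m4.lookAt followed by m4.inverse.

```javascript
const subtract = (u, v) => u.map((c, i) => c - v[i]);
const dot = (u, v) => u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
const cross = (u, v) => [
  u[1] * v[2] - u[2] * v[1],
  u[2] * v[0] - u[0] * v[2],
  u[0] * v[1] - u[1] * v[0],
];
const normalize = (u) => u.map((c) => c / Math.sqrt(dot(u, u)));

// View matrix as an array of rows, following the last matrix in the derivation.
function viewMatrix(eye, at, up) {
  const n = normalize(subtract(eye, at));
  const r = normalize(cross(up, n));
  const u = cross(n, r);
  return [
    [r[0], r[1], r[2], -dot(r, eye)],
    [u[0], u[1], u[2], -dot(u, eye)],
    [n[0], n[1], n[2], -dot(n, eye)],
    [0, 0, 0, 1],
  ];
}

// Apply a row-array 4x4 matrix to a 3D point (homogeneous coordinate 1).
function transformPoint(m, p) {
  const h = [p[0], p[1], p[2], 1];
  return m.slice(0, 3).map((row) => row.reduce((s, c, i) => s + c * h[i], 0));
}

const eye = [0, 0, 20];
const at = [0, 0, 0];
const up = [0, 1, 0];
const view = viewMatrix(eye, at, up);
console.log(transformPoint(view, eye)); // the eye maps to the origin
console.log(transformPoint(view, at));  // the target sits at z = -20 in the eye frame
```

The target ending up on the negative z axis of the eye frame matches the choice of \(\hat{n}\) pointing from at towards eye.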

This powerful matrix \(\mathbf{M}^{-T}\), which can be computed from three relatively easy to interpret parameters, allows us to position a viewer or camera in the scene and transform the displayed image to the point of view of the viewer. As part of lab 5, you will be asked to extend your solar system to include camera controls and this view matrix in your shader pipeline.

Given this matrix and the ability to position a viewer anywhere in the scene, how could you implement the following motions?

Move the camera closer to the scene?
Turn to look at something to the right of the screen?
Tilt and look up?
Slide/Pan left or right?

You can describe how you would adjust your call to lookAt (in world coordinates), or how you would adjust the view matrix directly (in eye coordinates).