Projection: Direct3D Edition

Projection is one of the core transformations done in 3D graphics. It's generally represented by a \(4\times4\) matrix, and it is the thing that links view space to clip space. The basics are pretty standard, and you can easily find them elsewhere. This article is about interesting things you can do with the projection matrix once you've got it.

Now, projection matrices differ depending on the conventions underlying the target environment. OpenGL's clip space extends from \(-1\) to \(1\) in all three axes with the z-axis pointing out of the screen. DirectX's z-axis points into the screen, and the near plane is mapped to \(z=0\) rather than \(z=-1\). This article will focus on matrices using the DirectX convention, though similar derivations exist for OpenGL.

Note that I'm using only the DirectX clip-space convention. I'm going to stick with the standard points-are-column-vectors-on-the-right convention used in most graphics literature, so my projection matrix will appear transposed with respect to that seen in the DirectX documentation on MSDN. I'm also going to assume that view space is right-handed, so one minus sign is moved.

Moving right along, here's the matrix:

\[ P=\begin{bmatrix} P_{11} & 0 & 0 & 0 \\ 0 & P_{22} & 0 & 0 \\ 0 & 0 & P_{33} & P_{34} \\ 0 & 0 & -1 & 0 \end{bmatrix} \]
\[ P_{33}=\frac{z_f}{z_n-z_f},\;P_{34}=\frac{z_nz_f}{z_n-z_f} \]

\(z_n\) and \(z_f\) are, respectively, the distances to the near and far clip planes in view space units. \(P_{11}\) and \(P_{22}\) can be computed using one of two sets of formulas:

\[ P_{11}=\frac{2z_n}{v_w},\;P_{22}=\frac{2z_n}{v_h} \]
\[ P_{11}=\cot\left({\tiny\frac{1}{2}}f_h\right),\;P_{22}=\cot\left({\tiny\frac{1}{2}}f_v\right) \]

The first pair sizes the frustum in terms of the size of the viewport (\(v_w\), \(v_h\)) in view-space units on the near plane. The second pair (which I find more natural to use) sizes the frustum based on the horizontal and vertical field of view angles (\(f_h\) and \(f_v\)).

Further, if we project a point in view space \(\mathbf{v}\) through \(P\), we get the following coordinates:

\[ P\mathbf{v}=\mathbf{v'}=\begin{bmatrix}x' \\ y' \\ z' \\ w'\end{bmatrix}= \begin{bmatrix} P_{11}x \\ P_{22}y \\ z\frac{z_f}{z_n-z_f}+\frac{z_nz_f}{z_n-z_f} \\ -z \end{bmatrix} \]
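If you'd like to check the matrix numerically, here's a minimal sketch in Python. The function names (`make_projection`, `project`) and the sample frustum values are my own, picked purely for illustration. It builds \(P\) from the field-of-view formulas and confirms that, after the divide by \(w'\), a point on the near plane lands at depth \(0\) and a point on the far plane lands at depth \(1\).

```python
import math

def make_projection(fov_h, fov_v, z_near, z_far):
    """Build P as row-major nested lists: right-handed view space,
    DirectX clip space, points as column vectors on the right."""
    p11 = 1.0 / math.tan(0.5 * fov_h)          # cot(f_h / 2)
    p22 = 1.0 / math.tan(0.5 * fov_v)          # cot(f_v / 2)
    p33 = z_far / (z_near - z_far)
    p34 = z_near * z_far / (z_near - z_far)
    return [
        [p11, 0.0, 0.0, 0.0],
        [0.0, p22, 0.0, 0.0],
        [0.0, 0.0, p33, p34],
        [0.0, 0.0, -1.0, 0.0],
    ]

def project(P, v):
    """Multiply P by the column vector (x, y, z, 1); returns (x', y', z', w')."""
    col = (v[0], v[1], v[2], 1.0)
    return tuple(sum(P[r][c] * col[c] for c in range(4)) for r in range(4))

# Hypothetical frustum: 90 x 60 degree field of view, z_n = 0.5, z_f = 100.
P = make_projection(math.radians(90.0), math.radians(60.0), 0.5, 100.0)

# Right-handed view space looks down -z, so the planes sit at z = -z_n and z = -z_f.
near = project(P, (0.0, 0.0, -0.5))
far = project(P, (0.0, 0.0, -100.0))

print(near[2] / near[3], far[2] / far[3])  # depth after the divide by w'
```

Note that \(w'\) comes out as the positive distance in front of the camera, which is what the divide needs.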

Extracting the Frustum Parameters

Alright, so we've got a matrix, but some odd algorithm somewhere needs to know what the near-clip plane is, or what the field of view is, or how big the viewport is.

Field of View

The horizontal and vertical field of view are fairly straightforward. We can simply reverse the second set of formulas for \(P_{11}\) and \(P_{22}\).

\[ P_{11}=\cot\left({\tiny\frac{1}{2}}f_h\right)=\left[\tan\left({\tiny\frac{1}{2}}f_h\right)\right]^{-1} \]
\[ \arctan(P_{11}^{-1})={\tiny\frac{1}{2}}f_h \]
\[ f_h=2\arctan(P_{11}^{-1}) \]

Similarly, the vertical field of view is:

\[ f_v=2\arctan(P_{22}^{-1}) \]
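As a quick sanity check, here's a small Python sketch of the two formulas. The name `fov_from_projection` and the test angles are assumptions of mine, not part of any API. It builds \(P_{11}\) and \(P_{22}\) from known angles and then recovers them.

```python
import math

def fov_from_projection(p11, p22):
    """Return (horizontal, vertical) field of view in radians."""
    return 2.0 * math.atan(1.0 / p11), 2.0 * math.atan(1.0 / p22)

# Round trip: start from known angles, build the matrix entries, recover the angles.
f_h, f_v = math.radians(90.0), math.radians(60.0)
p11 = 1.0 / math.tan(0.5 * f_h)
p22 = 1.0 / math.tan(0.5 * f_v)

rec_h, rec_v = fov_from_projection(p11, p22)
print(math.degrees(rec_h), math.degrees(rec_v))
```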

Aspect Ratio

Another easy one is the viewport's aspect ratio. The aspect ratio is defined as the viewport's width over its height. We've got formulas for those values:

\[ P_{11}=\frac{2z_n}{v_w},\;P_{22}=\frac{2z_n}{v_h} \]
\[ v_w=\frac{2z_n}{P_{11}},\;v_h=\frac{2z_n}{P_{22}} \]

And while \(z_n\) is unknown, that doesn't matter as we only want the ratio between the two, and it cancels out:

\[ \mathrm{aspect}=\frac{v_w}{v_h}=\frac{\frac{2z_n}{P_{11}}}{\frac{2z_n}{P_{22}}}=\frac{P_{22}}{P_{11}} \]
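In code this is a one-liner. Here's a hedged Python sketch, where `aspect_from_projection` and the viewport numbers are invented for the example. It builds the two entries from a known viewport size and checks that the ratio comes back out.

```python
def aspect_from_projection(p11, p22):
    """Viewport width over height; z_n has already cancelled out."""
    return p22 / p11

# Hypothetical viewport: 1.6 x 0.9 view-space units on a near plane at z_n = 0.5.
z_near, v_w, v_h = 0.5, 1.6, 0.9
p11 = 2.0 * z_near / v_w
p22 = 2.0 * z_near / v_h

aspect = aspect_from_projection(p11, p22)
print(aspect, v_w / v_h)  # the two values should agree up to rounding
```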

The Near-Clip Distance

Getting the value of \(z_n\) is also simple. Notice that the only difference between \(P_{34}\) and \(P_{33}\) is that the former has an extra \(z_n\) multiplied into it.

\[ z_n=\frac{P_{34}}{P_{33}} \]

The Far-Clip Distance

This one's a little trickier, but again, it's just some algebra on \(P_{34}\) and \(P_{33}\). Let's start with the formula for \(P_{33}\) and substitute in our formula for \(z_n\):

\[ P_{33}=\frac{z_f}{z_n-z_f}=\frac{z_f}{\frac{P_{34}}{P_{33}}-z_f} \]
\[ \frac{1}{P_{33}}=\frac{\frac{P_{34}}{P_{33}}-z_f}{z_f} \]
\[ \frac{z_f}{P_{33}}=\frac{P_{34}}{P_{33}}-z_f \]
\[ z_f=P_{34}-P_{33}z_f \]
\[ -P_{34}=-z_f-P_{33}z_f \]
\[ P_{34}=z_f+P_{33}z_f \]
\[ P_{34}=z_f(P_{33}+1) \]
\[ z_f=\frac{P_{34}}{P_{33}+1} \]
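Here's a short Python sketch that recovers both clip distances from \(P_{33}\) and \(P_{34}\). The function name `clip_distances` and the sample distances are my own assumptions. It builds the two entries from known values and checks the round trip.

```python
def clip_distances(p33, p34):
    """Return (z_n, z_f) recovered from the two depth-related entries of P."""
    z_near = p34 / p33
    z_far = p34 / (p33 + 1.0)
    return z_near, z_far

# Hypothetical frustum depths.
z_n, z_f = 0.5, 100.0
p33 = z_f / (z_n - z_f)
p34 = z_n * z_f / (z_n - z_f)

rec_n, rec_f = clip_distances(p33, p34)
print(rec_n, rec_f)
```

One thing to watch out for: when \(z_f\) is much larger than \(z_n\), \(P_{33}\) sits very close to \(-1\), so \(P_{33}+1\) loses precision to cancellation. In double precision this is rarely a problem, but I'd expect the recovered far distance to be noticeably less accurate than the near distance if the matrix is stored in single precision.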

The Viewport Size

And now that we have the near-clip distance, we can easily compute our viewport's dimensions:

\[ v_w=\frac{2z_n}{P_{11}},\;v_h=\frac{2z_n}{P_{22}} \]
\[ v_w=\frac{2\frac{P_{34}}{P_{33}}}{P_{11}},\;v_h=\frac{2\frac{P_{34}}{P_{33}}}{P_{22}} \]
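The same thing as a Python sketch, with a made-up function name (`viewport_size`) and made-up sample values. It builds all four entries from a known viewport and near/far pair, then recovers the viewport dimensions from the matrix entries alone.

```python
def viewport_size(p11, p22, p33, p34):
    """Return (v_w, v_h): the viewport size on the near plane in view-space units."""
    z_near = p34 / p33
    return 2.0 * z_near / p11, 2.0 * z_near / p22

# Hypothetical frustum: a 1.6 x 0.9 viewport on a near plane at 0.5, far plane at 100.
v_w, v_h, z_n, z_f = 1.6, 0.9, 0.5, 100.0
p11 = 2.0 * z_n / v_w
p22 = 2.0 * z_n / v_h
p33 = z_f / (z_n - z_f)
p34 = z_n * z_f / (z_n - z_f)

rec_w, rec_h = viewport_size(p11, p22, p33, p34)
print(rec_w, rec_h)
```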

Inverting the Matrix

The projection matrix is pretty sparse, so it's quite straightforward to write an optimized matrix inverse function just for it. Of course, a general \(4\times4\) matrix inverse function will work, but the inverse of the projection matrix pops up enough that it doesn't hurt to have an optimized routine in your toolkit.

One note before we proceed: matrices intended for left- and right-handed view space differ in the sign of \(P_{33}\) and of the one nonzero entry in the bottom row (the \(-1\) in our \(P\)). However, the matrix-inverse code doesn't need to care about this, so I will substitute a variable \(P_{43}\) for that entry in this section:

\[ P=\begin{bmatrix} P_{11} & 0 & 0 & 0 \\ 0 & P_{22} & 0 & 0 \\ 0 & 0 & P_{33} & P_{34} \\ 0 & 0 & P_{43} & 0 \end{bmatrix} \]

Let's start with the determinant:

\[ \det(P)=\begin{vmatrix} P_{11} & 0 & 0 & 0 \\ 0 & P_{22} & 0 & 0 \\ 0 & 0 & P_{33} & P_{34} \\ 0 & 0 & P_{43} & 0 \end{vmatrix}= -P_{43}\begin{vmatrix} P_{11} & 0 & 0 \\ 0 & P_{22} & 0 \\ 0 & 0 & P_{34} \end{vmatrix}= -P_{11}P_{22}P_{34}P_{43} \]

Looking at the construction of the matrix, none of those four entries can be zero for a sensible frustum (a finite field of view and \(0<z_n<z_f\)), so the determinant is not going to equal zero and the matrix is definitely invertible. I'll spare you the tedious algebra and skip straight to the answer. Apply your favorite method of finding the inverse and waste a bunch of paper simplifying if you don't believe me (or multiply it by \(P\) and see if you get \(I\)).

\[ P^{-1}=\begin{bmatrix} P_{11}^{-1} & 0 & 0 & 0 \\ 0 & P_{22}^{-1} & 0 & 0 \\ 0 & 0 & 0 & P_{43}^{-1} \\ 0 & 0 & P_{34}^{-1} & -\frac{P_{33}}{P_{34}P_{43}} \end{bmatrix} \]
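If you'd rather let the computer do the multiplying, here's a minimal Python sketch of the specialized inverse. The names `invert_projection` and `mat_mul` are mine, and the sample matrix is just a convenient example. It only reads the five entries that can be nonzero, then multiplies the result by \(P\) to see whether \(I\) comes out.

```python
def invert_projection(P):
    """Closed-form inverse for a matrix with the sparsity pattern of P.
    Only P11, P22, P33, P34 and P43 are read; everything else is assumed zero."""
    p11, p22 = P[0][0], P[1][1]
    p33, p34, p43 = P[2][2], P[2][3], P[3][2]
    return [
        [1.0 / p11, 0.0, 0.0, 0.0],
        [0.0, 1.0 / p22, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0 / p43],
        [0.0, 0.0, 1.0 / p34, -p33 / (p34 * p43)],
    ]

def mat_mul(A, B):
    """Plain 4x4 matrix product, used here only to verify the inverse."""
    return [[sum(A[r][k] * B[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

# Example matrix: arbitrary P11 and P22; z_n = 1 and z_f = 3 give P33 = P34 = -1.5.
P = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 1.5, 0.0, 0.0],
    [0.0, 0.0, -1.5, -1.5],
    [0.0, 0.0, -1.0, 0.0],
]

P_inv = invert_projection(P)
product = mat_mul(P, P_inv)
for row in product:
    print(row)  # should be the identity, up to rounding
```

The routine costs four divisions and one multiply, against the dozens of multiplies a general cofactor-based inverse needs, which is the whole appeal.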


Running a point back through the inverse is a fairly useful operation. It's often used to implement picking, and it has a significant application in deferred shading. But let's not get ahead of ourselves.

Given a point in view space \(\mathbf{v}\), its counterpart in clip space (\(\mathbf{v'}\)) is

\[ \mathbf{v'}=P\mathbf{v} \]

Which is trivially reversible given that \(P\) is invertible:

\[ \mathbf{v}=P^{-1}\mathbf{v'} \]

\(\mathbf{v}\) is, of course, moved in and out of homogeneous space in the usual way: append a \(w\) component of \(1\) on the way in, and divide through by \(w\) on the way out. The same applies to \(\mathbf{v'}\).
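To make the round trip concrete, here's a small Python sketch. The helper names (`mat_vec`, `unproject`) and the sample matrix and point are assumptions of mine. It pushes a view-space point through \(P\), divides by \(w'\), and then brings it back with \(P^{-1}\) and a second divide.

```python
def mat_vec(M, v):
    """4x4 matrix times a 4-component column vector."""
    return [sum(M[r][c] * v[c] for c in range(4)) for r in range(4)]

def unproject(P_inv, p):
    """Map a post-divide point (x, y, depth) back to view space."""
    x, y, z, w = mat_vec(P_inv, [p[0], p[1], p[2], 1.0])
    return (x / w, y / w, z / w)

# Example matrix: P11 = P22 = 1 (90 degree fields of view), z_n = 1, z_f = 3.
P = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -1.5, -1.5],
    [0.0, 0.0, -1.0, 0.0],
]
# Its inverse, written out from the closed form for this sparsity pattern.
P_inv = [
    [1.0 / P[0][0], 0.0, 0.0, 0.0],
    [0.0, 1.0 / P[1][1], 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0 / P[3][2]],
    [0.0, 0.0, 1.0 / P[2][3], -P[2][2] / (P[2][3] * P[3][2])],
]

v = (0.5, -0.25, -2.0)                          # a point inside the frustum
clip = mat_vec(P, [v[0], v[1], v[2], 1.0])      # add w = 1, then project
ndc = (clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3])  # divide by w'
v_back = unproject(P_inv, ndc)

print(ndc, v_back)
```

Feeding the post-divide point in with \(w=1\) works because the lost scale factor \(w'\) multiplies all four components equally, so the final divide removes it again.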