One Lost Henchman

Some tools, some graphics, some aimless rambling

Home Posts

This content has been through a few migrations. Typos and other small errors may have crept in.

By request, an accessible explanation of the GJK (that's short for Gilbert-Johnson-Keerthi) intersection algorithm. I'll note in advance for the uninitiated - this algorithm only works for convex objects. Arbitrary convex objects, yes, but they must be convex. That's important!

Another note: the full GJK algorithm is more than just an intersection test. It returns data about the actual separation between the objects (if they are non-intersecting). I'm presenting a simple boolean intersection query, since doing so simplifies things greatly without rendering this whole page useless. The fundamentals are the same for the full algorithm, and an understanding of this version will go a long way towards an understanding of the real deal.

This is a high-level description of the algorithm. The implementation notes are here, for those that already understand how the algorithm works and just want to code it.

Intersection queries: a primer

The hardest part of explaining GJK is explaining exactly what it is that it's doing and why that actually works. Next to that, the implementation is fairly straight-forward.

So let's start by looking at one of the simplest intersection queries there is to try and work up a bit of intuition on exactly what it is that we want to accomplish. Don't pay attention to the particular geometry of this problem, which doesn't generalize well to arbitrary shapes at all. Pay attention to the way I'm going to transform the problem into a slightly different one, which I'll then generalize to arbitrary convex forms.

And so, behold! A circle-circle intersection:

The diagram is interactive: you can drag each circle by its center or change its radius by dragging a point on its edge. The circles will fill in if they intersect.

Anyway, what we have here are two circles in 2D. Each circle is defined by its center point (we'll call that \(C\)) and its radius \(R\). For our two circles, we have two pairs of these variables, which we'll call \(C_r\), \(C_b\), \(R_r\), and \(R_b\) (the subscripts stand for "red" and "blue", in case it wasn't painfully obvious). The circle-circle intersection test is absurdly simple: a pair of circles intersects if and only if the distance between their centers is less than the sum of their radii. That is, the circles intersect if the following is true:

\[ distance( C_r, C_b ) < R_r + R_b \]

We're going to rearrange that formula a little. Let's start with the left side.

\[ distance( C_r, C_b ) = length( C_r - C_b ) \]

Remember: a point minus a point is a vector. A point plus a vector is a point. The \(0\) in this next bit represents the point at the origin.

\[ length( C_r - C_b ) = length( 0 + (C_r - C_b) - 0 ) \]

We haven't really changed anything. We added the vector \(C_r - C_b\) to the origin to get some point and then subtracted the origin from that to get our original vector back. Why'd we do that? Because the above is one set of parentheses away from looking like another distance operation:

\[ length( 0 + (C_r - C_b) - 0 ) = length( (0 + (C_r - C_b)) - 0 ) = distance( 0 + (C_r - C_b), 0 ) \]

Plugging that back into our original formula gives us:

\[ distance( 0 + (C_r - C_b), 0 ) < R_r + R_g \]

And then we clean things up a little by substituting the following: \(C_g = 0 + (C_r - C_b)\), \(R_g = R_r + R_b\).

\[ distance( C_g, 0 ) < R_g \]

And that's it. What we've done above is change the problem into an equivalent one. Instead of asking whether two circles intersect, we now ask whether a third circle, constructed out of the two, contains the origin.

That's a fairly silly derivation for a pair of circles, but it's a very simple illustration of the general transform we're going to apply to intersection problems in order to handle arbitrary convex forms.

The Minkowski sum

So how exactly did we construct the green circle above? In essence, what we've done is subtract area off of one circle (the blue one) and add it to the other. That's all well and good with a pair of circles, but real objects don't always have handy things like centers and radii for us to work with. What we need is a more general approach.

That approach is based on something called the Minkowski sum (represented by the symbol \(\oplus\)). You'll get a lot more detail out of the Wikipedia article linked, but in short, the Minkowski sum operates by treating each point within the second shape (yes, all infinity of them) as a vector and adding each vector to each point in the first shape, producing a whole new set of (infinitely many) points which define a shape. (I'm going to become less and less careful about the distinction between points and vectors as this goes on. If you see a point where a vector should be, subtract the origin off of it. If you see the reverse, add it to the origin.)

However, we're not interested in the plain Minkowski sum here. We're interested in a slightly different operation, commonly referred to as the Minkowski difference, which works just like the sum, except that each of the vectors derived from the second shape is subtracted from the first shape's points. In geometric terms, the Minkowski difference of two shapes is the Minkowski sum of the first shape and the second shape's reflection through the origin (drawn below in the darker blue).

Our silly algebra up above with the pair of circles is such that the formulas for the green circle's center and radius are equivalent to taking the Minkowski difference of the red and blue circles and simplifying the formulas as far as they'll go. The nice thing about the Minkowski difference is that, because it isn't simplified, it will work with any pair of shapes, and the composite shape it produces contains the origin if and only if the two source shapes intersect.

Unfortunately, using the Minkowski sum also leaves us with one itty bitty infinitely large problem: the Minkowski sum works with arbitrary shapes because it reduces them to one of the most basic representations imaginable: an infinitely large set of points. And we can't exactly compute with infinity points. So we need to represent our shapes in a different manner.

Describing the green shape

Let's start with what we've got. We've got our two shapes, a red one and a blue one, which we'll represent as the point sets \(R\) and \(B\). Now we want to perform a Minkowski difference, so instead of \(B\) we'll need to operate on its mirror through the origin. Let's call that \(-B\), since negation is descriptive of how the set is built to begin with. And then we've got our green shape represented as the set \(G = R \oplus -B\).

Now, remember how I made a big deal about these shapes all being convex? This is where that starts to be important. We know (well, we demand) that \(R\) and \(B\) represent convex shapes. Reflection through the origin doesn't change whether a figure is convex or concave, so we also know that \(-B\) is convex. And the Minkowski sum of convex shapes is also convex (this is somewhat intuitive, though I'm sure there's a proper proof out there somewhere), which covers \(G\).

So what does that mean? Well, one of the defining properties of convex forms is that no straight line can possibly intersect the shape more than twice. And if the hypothetical line is directed, then it will (obviously) have a maximum of one entry and one exit point. As these intersection points are all on the surface of the shape, we can actually describe our shape by throwing enough (again, infinitely many in the general case) directed lines at it and taking just the exit points.

In fact, we don't even need the full lines. We can define the shape just as well using pure vectors (directions) by taking the exit points of all possible lines parallel to the vector and keeping only the one farthest along said vector. Wrap that last idea up into a function, and you have what's called a support mapping.

Support mappings

The formal definition of the support mapping of a convex shape \(A\) (where \(A\) denotes the set of all points within or on the surface of the shape) looks something like the following:

\[ S_A(\vec{v})\in A,\;\vec{v}\cdot S_A(\vec{v})=\max\left\{\vec{v}\cdot x : x \in A\right\} \]

So a support mapping is a function (mathematical or, as you'll see shortly, otherwise) which is associated with a given convex shape, and which takes a vector (\(\vec{v}\)) as input and returns a point on the shape's surface which is maximally extreme with respect to \(\vec{v}\). To make things somewhat simpler, if there are many such points (say \(\vec{v}\) is perpendicular to a flat face), the support mapping is allowed to return any one of those points.

Here are a few examples of support mappings:

vec3 sphere_support(const sphere &s, const vec3 &v)
{
    //remove the division if v is known to be normalized elsewhere
    return s.center + v * (s.radius / length(v));
}

//aabb = axis-aligned bounding box
vec3 aabb_support(const aabb &bb, const vec3 &v)
{
    vec3 ret;

    ret.x = v.x >= 0 ? bb.max.x : bb.min.x;
    ret.y = v.y >= 0 ? bb.max.y : bb.min.y;
    ret.z = v.z >= 0 ? bb.max.z : bb.min.z;

    return ret;
}

Here's another one. It operates on the convex hull encasing an arbitrary set of points (for instance, the corners on a frustum):

vec3 point_cloud_support(const vec3 points[], unsigned int n_points, const vec3 &v)
{
    unsigned int best = 0;
    float best_dot = dot(points[0], v);

    for (unsigned int i = 1; i < n_points; i++)
    {
        float d = dot(points[i], v);
        if (d > best_dot)
        {
            best = i;
            best_dot = d;
        }
    }

    return points[best];
}

Simple, and no infinities in sight. One thing remains.

A support mapping for the Minkowski difference

We have our red and blue shapes, so we know what \(R\) and \(B\) represent, and that allows us to write simple little support mappings like the above (call those \(S_R(\vec{v})\) and \(S_B(\vec{v})\)). But what about \(G\)'s support mapping?

Well, let's go back to the definition of the Minkowski sum. The farthest point on the surface of the green shape along \(\vec{v}\) is going to be the sum of the farthest points along \(\vec{v}\) in its source shapes, \(R\) and \(-B\):

\[ S_G(\vec{v})=S_R(\vec{v})+S_{-B}(\vec{v}) \]

The only wrinkle is that the above requires \(S_{-B}(\vec{v})\), but we only have \(S_B(\vec{v})\). Not to worry, \(-B\) is just a reflection of \(B\), so we can just mirror the results of its support mapping. There's one additional step, though: mirroring the whole mapping mirrors not just the result, but also the meaning of \(\vec{v}\), so we'll negate that as well, in order to compensate:

\[ S_{-B}(\vec{v})=-S_B(-\vec{v}) \]

Which we can plug into our formula for \(G\)'s support mapping:

\[ S_G(\vec{v})=S_R(\vec{v})-S_B(-\vec{v}) \]

And that's that.

With that formula and our hand-crafted support mappings for each type of shape, we're free to test for intersections between any pair of convex shapes. All we need now is the algorithm which is going to make use of \(S_G(\vec{v})\).

Searching for the origin

And now we're at the core of the GJK intersection algorithm. This is the part where we search for the origin in \(G\). Now, our search space is potentially huge, so we want an algorithm that's going to head straight for the origin, cutting the available search space down as quickly as possible. We also want something relatively simple, so that we don't go insane trying to write and debug the thing. So we're going to accomplish that by jumping around the figure in an orderly manner, trying to construct a simplex (in 3D, a tetrahedron) that contains the origin.

To do that, we're going going to pick points one at a time from the support mapping and add them to a set of points. We're then going to look at the figure described by that set (two points is a line, three is a triangle, four is a tetrahedron, so on if you're working in four or more dimensions) and figure out which of its aspects (vertices, edges, faces) the origin is closest to (that is, which aspect's Voronoi region contains the origin). We're then going to throw away all of the points that aren't part of that aspect and pick a search direction to find our next point, such that we converge on the origin. That direction has two defining properties - it's perpendicular to the part of the set we're keeping, and it points towards the origin.

We stop when we manage to build a simplex that contains the origin or when we find that it's impossible to do so.

So let's look at some examples. The cases differ based on the number of points in our current simplex, so I'll go over a few of them. The cases are labelled by the number of points in the current simplex after a new point has been selected. Each case is responsible for deciding which of the points to keep and then computing the search direction used to get the next point.

No points

This is the case when we start the algorithm. As we have no points, we really haven't got anything to restrict our selection of the first point. We pick an arbitrary starting search vector \(\vec{v}\), get its associated point from the support mapping, and go to the next iteration.

One point

This is where we land after the first step. We've got one point to work with, so all we can do is look for another. We search along a vector pointing from our point towards the origin.

Two points

This produces two points (\(A\) and \(B\)) which define a line segment. This divides our space into three regions (this is an edge-on view of 3D space - the middle region wraps all the way around the line segment): the space closest to \(A\), the space closest to \(B\), and the space closest to \(\overline{AB}\). Our dividing planes are the two planes perpendicular to the line segment and passing through the endpoints. Great, what do we do with it?

Well, the question is which aspect of the line is the origin closest to? There are the end points and then there's the line segment itself. The Voronoi diagram above is the answer. If the origin lies in the region nearest to \(A\), then we throw away point \(B\). If it lies in the region on the far side nearest \(B\) then we keep \(B\) and throw \(A\) out. Otherwise it must lie closest to the line segment itself (in that center region), and we keep both \(A\) and \(B\).

Figuring out which region we're in is remarkably simple, too, it's just a few dot products. (Actually, that's a lie. In this case the origin will always be in the region nearest the line segment, but that's an optimization I'll discuss in the implementation notes.)

Once we reduce the simplex to those parts nearest the origin, we need to come up with a search direction for the next point. That's easy enough if we've reduced the simplex down to a single point - we just pick the vector from that point towards the origin (as we did in the previous one-point case). If we have a line segment, then we need to do a few cross products to come up with a vector that's oriented towards the origin and is perpendicular to the line.

Three points

So again, we have some regions to consider:

the space closest to \(A\),
the space closest to \(B\),
the space closest to \(C\),
the space closest to \(\overline{AB}\),
the space closest to \(\overline{BC}\),
the space closest to \(\overline{CA}\),
the space within \(\triangle{ABC}\) (both above and below it).

If the origin is in the triangle, then we keep all three points and take a direction perpendicular to the plane the triangle lies in and in the direction of the origin. Otherwise, depending on which region it falls into we'll keep just that part of the figure and iterate again. (For reasons that I'll cover in the implementation notes, we don't actually need to consider most of these regions.)

The only subtlety here is that the central region is actually two regions. This is 3D space, remember? There's the volume "above" the triangle and the volume "below". That's a distinction which will help us make an awesome optimization when it's time to write the code.

Four points

A model tetrahedron made of Q-tips and scotch tape.

I'm not even going to try to make a diagram of a tetrahedron. I spent hours drawing them on paper when I was working my way through this the first time, and I've come to the conclusion that it's pointless to even try to get all the nuances of it down in 2D. In the end I built a little model, which I highly recommend to anyone that's trying to wrap their brain around this.

The important case with the tetrahedron is the volume inside of it. If the origin is found to be there, then we're done. We've built a shape that contains the origin, thus there's nothing more to do.

One note though - don't be daunted by the apparent number of cases that a tetrahedron creates! A great many of them are impossible, and the rest are largely duplicates of each other. Again, more on this in the implementation notes.

Knowing when to stop

But what if the objects don't intersect? How do we figure that out? That's rather easy. See all those spots where we get our next point from the support mapping? If that next point is ever closer to our existing figure than the origin itself is, then we know we can stop. The support mapping gives us the farthest point in a given direction, and our search direction is always pointed at the origin. So if we go as far as we can towards the origin and fail to reach it, then we know we'll never get there, and we can abort the search knowing that the two objects do not, in fact, intersect.

Finally, the details

Those little parenthetical notes I left above stating that certain cases weren't going to be a problem for us - the explanation is on the implementation page.