The GJK Algorithm

By request, an accessible explanation of the GJK (that's short for Gilbert-Johnson-Keerthi) intersection algorithm. I'll note in advance for the uninitiated - this algorithm only works for convex objects. Arbitrary convex objects, yes, but they must be convex. That's important!

Another note: the full GJK algorithm is more than just an intersection test. It returns data about the actual separation between the objects (if they are non-intersecting). I'm presenting an intersection query since doing so simplifies things without rendering the whole thing useless. The fundamentals are the same for the full algorithm, and an understanding of this version will go a long way towards an understanding of the real deal.

This is a high-level description of the algorithm. The implementation notes are here, for those that already understand how the algorithm works and just want to code it.

Intersection Queries: A Primer

The hardest part of explaining GJK is explaining exactly what it is that it's doing and why that actually works. Next to that, the implementation is fairly straight-forward.

So let's start by looking at one of the simplest intersection queries there is to try and work up a bit of intuition on exactly what it is that we want to accomplish. Don't pay attention to the particular geometry of this problem, which doesn't generalize well to arbitrary shapes at all. Pay attention to the way I'm going to transform the problem into a slightly different one, which I'll then generalize to arbitrary convex forms.

And so, behold! A circle-circle intersection:


So here we've got two circles in 2D space, each of which is defined by a center (C) and a radius (R). Let's call these values Cr, Cb, Rr, and Rb (the subscripts stand for red and blue, in case it wasn't painfully obvious). The circle-circle intersection test is painfully simple: a pair of circles intersects if and only if the distance between their centers is less than the sum of their radii. That is, the circles intersect if the following inequality is true:

distance( Cr, Cb ) < Rr + Rb

Let's play with the first half of the comparison for a bit:

distance( Cr, Cb )
= length( Cr - Cb )

And that's just math, which means we can play algebra, a bit. For one thing, we can add and subtract zeroes (in this case, the point at the origin) all we like:

= length( 0 + (Cr - Cb) - 0 )

Which is one set of parentheses away from looking like our definition of the distance function:

= length( (0 + (Cr - Cb)) - 0 )
= distance( 0 + (Cr - Cb), 0 )

So the first half of our test to see whether the two circles overlap can be rewritten from the distance between Cr and Cb to the distance between some other point, let's call it Cg, which equals 0 + (Cr - Cb), and the origin. For the sake of completeness, let's transform the right side too, with a simple substitution. Let Rg = Rr + Rb, and the inequality becomes:

distance( Cg, 0 ) < Rg

You've probably already spotted it. The core of our test isn't about circles any more, it's about a circle, which I'm going to refer to as the green one, and the origin. We're no longer asking whether the red and blue circles overlap, we're asking whether the green circle contains the origin.




That might not look terribly intuitive, but the math says it works, and indeed it does.

Now that's a lot of silly work to go through for a pair of circles, and since both expressions simplify to exactly the same thing it seems rather pointless, but the technique becomes extremely useful given a more complex pair of objects.

The Minkowski Sum

But before we can generalize the construction of the green circle to other objects, we need to be clear on exactly how it is that we created it, because there's a bit of an odd point here. See, subtracting makes sense if the idea is to just shift things around a bit, so there's nothing surprising about Cg = 0 + (Cr - Cb). We do that to the centers of both circles and we've just slid things around, but it's all by the same amount, so while the one circle's now on the origin, the other's just as far as it always was, and our test remains valid. Adding the radii also makes sense - we're not making this new circle up out of nothing, we're moving space from the circle which we now placed on the origin to the other one and seeing if it will grow enough to engulf the origin. And that's a touch unintuitive, but it makes enough sense that it's easy to spot why the math works as it does.

But arbitrary convex shapes don't necessarily have handy things like "centers" or "radii" which we can shuffle around with simple math. So how are we going to generalize the transformation we just did? How can we transfer volume from one arbitrary convex to another such that we reduce it to a point while the other becomes the equivalent of our green circle?

The answer is the Minkowski sum. A Minkowski sum is a shape formed by "adding" two other shapes together. This addition is carried out by treating all the points in the second shape as vectors from the origin and then adding each of those vectors to each of the points in the first shape. The effect is that of defining a volume by sweeping one shape over the surface of the other. The Wikipedia article has better diagrams than I'm likely to produce, so I'll get back to the GJK algorithm.

Now, the Minkowski sum alone isn't enough to form our green circle. We need to carry out one last operation, and that's the shift to the origin which we accomplished by subtracting Cb away from Cr. Google and people's random papers might tell you otherwise, but there's no such thing as a "Minkowski difference" (OK, I'm arguing semantics, and I do feel slightly silly doing so right before describing the operation commonly referred to by that name), so we're going to have to come up with our subtraction somewhere else. That somewhere else is a reflection about the origin. The green circle is the Minkowski sum of the red circle and the reflection about the origin of the blue circle (drawn below in the dotted blue).


The really nice property of the Minkowski sum is that it works on any pair of shapes. So given any pair of convex objects (call them "red" and "blue"), we can form the equivalent "green" object by computing the Minkowski sum of the red object and the blue object's reflection through the origin. And, like our green circle, that green object will contain the origin if and only if the red and blue objects intersect. I'm not really here to prove that, but the full GJK papers will refer you to a rigorous demonstration of the fact, if you care to Google for them.

So given an arbitrary red convex and an arbitrary blue convex, how do we form our green composite object? I mean, yeah, we have our definition, but that's just logic. We need something to compute with. So let's start with the definition, and see if we can work it into such a form.

The first thing of note in the definition is that it talks about "the points" in the input shapes. That tells us is that we need to forget about the usual representations for a moment (centers and radii, dividing planes, convex hulls, etc) and our shapes as mathematical sets of points instead. So let's define the sets R and B as the sets containing all points in the red and blue circles, respectively. Now, when I say all the points, I really do mean all of them, and yeah, I know, there are infinitely many in each shape. But that's OK, we're still just thinking here, infinities are allowed. Now we don't add the sets directly, instead we need to reflect one about the origin, so let's stick with reflecting the blue circle and call that the set -B. And then our set G is simply the set formed by adding all the points in R to all the points in -B (I know, I know, you can't add points, but proper notation's getting tedious - pretend I'm adding and subtracting the origin as appropriate to turn points into vectors and vice-versa).

Actually, I can give a partial proof of why GJK works - specifically the reason we're so worried about the origin. Think about how we've created all the points in G - we've added points from R and -B together. And the points in -B are the negated points in B (that's what reflection through the origin is). What's the only possible way that we could get zeroes (that is, the origin) in G? From the properties of addition, the only way to get a zero result is to add a number to its negative. So G contains the origin if and only if -B contains a point which is the mirror of a point in R - which implies that the original B contained that very same point, which can only happen if R and B intersect.

So that's really the core principle, here: a pair of convex objects intersect if and only if the Minkowski sum of the one mirrored through the origin and the other contains the origin. The trick is in figuring out whether the green shape really does contain the origin or not.

But before we can do that, we really need to get rid of these infinities.

Describing the Green Object

We obviously aren't going to go about things by computing two infinite sets of points, negating one, and then creating an even more infinite set of points from the two, and then searching through it for the origin. The idea is ridiculous on its face, so let's think about this a little longer. What do we know about the green object?

Well, for one thing, it's going to be convex. Remember what we said about the Minkowski sum operating by sort of sweeping one figure over the other? A convex object swept over another convex object forms, well, a convex object. And the reflection of a convex object is still convex. And our red and blue objects are convex, so the green one is necessarily convex as well. This will be important in a moment.

So let's get down to it - let's find some way to computationally define this shape. We have our original objects, and we have our idea of the green object, and we have the definition of the Minkowski sum. How can we combine those into a concrete definition of the green figure? It all comes down to representation, and the last paragraph suggests a very interesting representation indeed.

All of our objects here are convex. That means that it's impossible to intersect the object's surface more than twice with any given line. If that line has an associated direction, then we can even label the two intersections - one enters, the other exits. Given enough directed lines, the exit points alone are sufficient to describe the object's shape. In fact, we don't even need full lines - simple directions are enough, so long as we restrict ourselves to getting the farthest possible exit point for any line parallel to said direction.

Such a representation is called a support mapping.

Enter the Support Mapping

A support mapping is a function that takes a vector v and returns the point on the surface that's most extreme in v's direction. And to make thing even easier, if there are multiple such points (say v is perpendicular to a planar facet), it can return any of those points. Here are a few examples:

vec3 sphere_support( const sphere &s, const vec3 &v )
{
    //remove the division if v is known to be normalized elsewhere
    return s.center + v * s.radius / length( v );
}

//aabb = axis-aligned bounding box
vec3 aabb_support( const aabb &bb, const vec3 &v )
{
    vec3 ret;

    ret.x = v.x >= 0 ? bb.max.x : bb.min.x;
    ret.y = v.y >= 0 ? bb.max.y : bb.min.y;
    ret.z = v.z >= 0 ? bb.max.z : bb.min.z;

    return ret;
}

Here's another one, for the convex hull encasing an arbitrary set of points (for instance, the corners of a frustum):

vec3 point_cloud_support( const vec3 points[], unsigned int n_points, const vec3 &v )
{
    unsigned int best = 0;
    float best_dot = dot( points[0], v );

    for( unsigned int i = 1; i < n_points; i++ )
    {
        float d = dot( points[i], v );
        if( d > best_dot )
        {
            best = i;
            best_dot = d;
        }
    }

    return points[best];
}

See? Easy-sauce. No infinite point sets, after all.

Once More, In Green

And given a pair of support mappings SR and SB, we can fairly easily construct one for our green object SG.

SG(v) = SR(v) + S-B(v)

That's basically straight from the definition of the Minkowski sum. The furthest point in a direction is fairly obviously going to be the sum of the furthest points in that direction of the addends. However, we have a slight problem here. We have SB, but we need S-B. Well, -B is just the reflection of B, and that's just a negation, so we should be able to similarly reflect the result of B's support mapping. The problem is that mirroring the result of the support mapping mirrors not only the shape of the object, but also the direction in which we're getting the most-extreme points. So, to compensate, we'll need to mirror our search direction as well.

S-B(v) = -SB(-v)

And now we have everything we need to construct a support mapping which will fully describe our green object:

SG(v) = SR(v) - SB(-v)

And that's it.

It's because of this representation that GJK will work with any pair of convex objects. Once you've got it working, there's no need to design, implement, and test a separate routine for each possible combination of object types you might want to intersect. You just need a support mapping for each individual convex type you'll use and a function that applies the above formula to compose them into the needed SG.

(It is worth mentioning, however, that you can often speed things up a lot by combining common pairs of support mappings into individual functions, as there will sometimes be some repeated work between them. Still, that's just an optimization, and not actually necessary.)

Searching for the Origin

And now we're at the core of the GJK intersection algorithm. This is the part where we search for the origin in our set G. Now, our search space is potentially huge, so we want an algorithm that's going to head straight for the origin, cutting the available search space down as quickly as possible. We also want something relatively simple, so that we don't go insane trying to write and debug the thing. So we're going to accomplish that by jumping around the figure in an orderly manner, trying to construct a simplex that contains the origin.

The way we're going to do this is we're going to pick points one at a time from the support mapping and add them to a set of points. We're then going to look at the figure described by that set (two points is a line, three is a triangle, four is a tetrahedron, so on if you're working in four or more dimensions) and figure out which of its aspects (vertices, edges, faces) the origin is closest to. We're then going to throw away all of the points that aren't part of that aspect and pick a sane search direction to find our next point, such that we converge on the origin. That sane direction has two defining properties - it's perpendicular to the part of the set we're keeping and it points towards the origin.

We stop when we manage to build a simplex (a triangle in 2D, tetrahedron in 3D, so on) that encompasses the origin or when we find that it's impossible to do so.

Nearest Regions

So how do we define our nearest regions? Given the sort of figures we're working with, these are pretty basic, and they're easily defined by a series of plane tests. If the figure is a line, for instance, space breaks down as follows:


We've got our two endpoints, A and B, and three possible regions, labelled 1, 2, and 3 (this is an edge-on view of 3D space - region 2 wraps all the way around the line). Our dividing planes are therefore the two planes perpendicular to the line segment which pass through the endpoints. Note that we don't even need the explicit plane formulas, either. It's enough to know which side of the planes we're on, which (if you work it all out) simplifies to a couple of vector subtractions and a dot product.

Triangles are a little more complicated, but not much:


Again, the regions closest to edge are defined like for the line segment above - planes perpendicular to the lines running through the endpoints. The center region is also bounded by a set of three planes (3D, remember?) which are perpendicular to the triangle's plane and run through its edges. Note that the central region (1) is split into two - the half of it "above" the triangle and the half "below".

I'm not going to try to draw a tetrahedron. It was bad enough working my way through all the cases the first time that I ended building a little model. There's a bit more detail on this case (and a picture of said model) on the implementation page.

Searching in Depth

Let's look at how this works in 3D:

To start, we have an empty set. We can't really compute anything with it, but we need a search direction to start, so let's make one up at random. We feed that direction into our support mapping (that's Sg - Sr and Sb aren't directly used here) to get our first point, which we add to the set.

We've now got a single point. There's really no such thing as being perpendicular to a point, but there is a direction to the origin. Our search direction becomes the vector from the point towards origin. Feeding that into the support mapping gets us another point, and adding it to the set creates a line segment.

OK, so we have a line segment now. This gives us three possible features that the origin could be nearest to. There are the two endpoints, and there's the line segment between them. If the origin is closest to either end point (it won't be - more on that later), then we keep just that point and our search direction is, again, the vector from it to the origin. If it's the line segment, then we keep both points, and our search direction is computed with a few cross products to be both perpendicular to the segment and point right at the origin. We call the support mapping again and get another point, forming a triangle.

If we've got three points in our set, then we've got a triangle. More features this time. Three vertices, three edges, and two center regions - above and below the plane of the triangle. If it's a vertex (it isn't, again that's an optimization for later), we take just that vertex and our search direction becomes the vector from it to the origin (are you seeing a pattern yet?). If it's an edge, we keep just that edge's vertices and do some cross products to get a vector perpendicular to it and towards the origin (definitely a pattern here). If it's the face, we keep all three vertices and our search direction is the triangle's face normal (negated if the origin is 'below' the triangle). We search for another point.

If we're up to four points, we've got a tetrahedron. The important features are the four vertices, six edges, four faces, and the interior volume. Let's just skip over the vertices, edges, and the regions outside the faces, as I'm sure you've spotted the pattern by now. If the origin isn't found in any of those regions, then it must be inside the tetrahedron. And the tetrahedron is inside the green object (all of these figures are inside it - it's convex and their end points are on its surface). So the origin is inside the green object and we're therefore done. Success! We've found it. The objects intersect.

Knowing When to Stop

But what if the objects don't intersect? How do we figure that out? That's rather easy. See all those spots where we get our next point from the support mapping? If that next point is ever closer to our existing figure than the origin itself is, then we know we can stop. The support mapping gives us the farthest point in a given direction, and our search direction is always pointed at the origin, so if we go as far as we can straight towards the origin and don't reach it, then we never will, and we can stop knowing that the objects do not, in fact, intersect.

Implementing the Algorithm

An important detail is that, since we're testing which side of the origin our new point is on as we go, and since we know certain things about the old points (we tested them as we got them, that information remains valid as we go along), we can optimize away a lot of the tests we'd otherwise have to do. That's why I kept noting that the region closest to the origin isn't ever going to be that of any of the vertices. Nor, in fact, will it ever be a region not immediately adjacent to the new vertex (those regions are behind the old vertices, and we already looked there).

There are also the standard issues of precision to be dealt with.

A detailed description of this, including some actual code, can be found on the intersection query implementation page.