Motion Interpolation

Motion Interpolation

Slow-motion, bullet time and all that is cool, but high speed cameras are expensive, so let's look at a way to slow down footage in post-processing.

Let's start off by showing the end result:

1. Interpolate Between Frames

The simplest way to slow down a movie clip is to insert interpolated frames between the existing ones. The upside is that this is very easy to implement, the downside is that it doesn't look very good. The algorithm is easy to understand and implement: Between each pair of frames F0 and F1, the first being at time t0, and the other at time t1, we insert a frame Fi given as:


ti = (t - t0) / (t1 - t0)


Fi,x,y = (1 - ti)F0,x,y + tiF1,x,y

Or, in graphics-speak, linear interpolation between each pair of frames.

Simple and ugly as it may be, it forms the basis for other techniques. Just about every method has at its core the idea of taking successive pairs of frames and interpolate features between them. This is just a special case where the features we consider are simply the pixel values.

2. Motion Interpolation

But what if we also considered the motion of objects in the scene? The movement of objects in a scene is called the "optical flow" of the scene, and is a vector field that for each visible point in the scene in the first frame tells us where the corresponding point is in the second frame.

The idea is then to not just interpolate the pixel values, but to gradually deform the image according to the optical flow. Let Vx,y be the optical flow field. For each pixel x,y it defines a vector with two elements, [x,y]. Then, for each pixel x,y in the resulting image, we look up the flow vector, Vx,y. Without too much loss of precision, we can say that this pixel "came from" a point that lies back along the vector Vx,y, and will "go to" a point along the forward direction of the same vector. Since Vx,y is the vector from pixel x,y in the first frame to the corresponding pixel in the second frame, we can easily find the "back coordinates", [xb,yb] and "forward coordinates", [xf,yf]. These are then used for interpolation.


ti = (t - t0) / (t1 - t0)


[xb,yb] = [x,y] - tiVx,y

Explanation: The point [x,y] in the interpolated frame came from a point in the first frame that lies along the line f(u) = [x,y] - uVx,y. As value of u we choose the variable ti, which goes from 0 to 1 as we go from the first frame to the second. The coordinate pair [xb,yb] can be interpreted as the point we arrive at when we move against the optical flow from [x,y] a distance proportional to the time that has passed since the first frame.


[xf,yf] = [x,y] + (1 - ti)Vx,y

Respectively, the point [xf,yf] is the point we arrive at if we move along the optical flow a distance proportional to the time left to the second frame.


Fi,x,y = (1 - ti)F0,xb,yb + tiF1,xf,yf

This algorithm works surprisingly well, and is reasonably easy to understand and implement. There is only one big white spot on the map that we haven't filled in: How to determine the optical flow.

3. Determining the Optical Flow

There are many ways to determine optical flow[a], but for the purposes of this article we'll do it the most naive way possible: Divide the first frame into square blocks of size s, and brute-force try to match it against the second frame at every location within a distance d in the second frame. We'll even skimp on computing d.

So, for each point [x,y], cut out a small square [x,y] - [x+s,y+s], and for each point in the region [x-d,y-d] - [x+d,y+d] add the sum-of-squares of the difference between the small square and the second frame.

This is something that makes more sense as pseudocode:

block = crop (firstFrame, x, y, x+s, y+s);
bestSum = MAX_FLOAT;
bestPosition = [x,y];
for (int dx = -d; dx < d; ++dx) {
    for (int dy = -d; dy < d; ++dy) {
        correspondingBlock = crop (secondFrame, x+dx, y+dy, x+s+dx, y+s+dy);
        difference = subtract (block, correspondingBlock);
        sum = 0
        for each pixel p in difference:
            sum = sum + sqr (p)
        if (sum < bestSum) 
            bestSum = sum
            bestPosition = [x+dx,y+dy]
opticalFlow (x,y) = bestPosition - [x,y]

Some comments and notes:

  • This method is slow, but unlike a video encoder, we must find the flow vector if it exists. Therefore, reducing the search space is not advisable.

  • One can speed up processing by only sampling, say, a grid of pixels and then interpolate between them. The risk is that movement of objects smaller than the grid size risk being lost.

4. Other Examples