When recompressing videos it's a good idea to get the video as right as possible first. Since we're going to apply lossy compression to it, we want to make sure that we end up with as good an archive copy as possible.
Typically getting a video into something close to archivable form involves:
- Trimming it by cutting away worthless bits. This is where you can make some good data savings - if you only have ten seconds of stuff happening in a one-minute video, that's 83% of the data saved even before we get to the compression step.
- Adjusting the exposure and color balance. When shooting video it's important to get it right in camera: while those of us shooting stills have the luxury of 14-bit channels, a videographer has to make do with ten bits, or eight if you're shooting on a cheap camera. Before we let the lossy compression stomp all over it, we try to get it as right as possible.
- Applying some kind of log/gamma curve. For 8-bit footage this may not actually do anything, but if you want to preserve shadows that you couldn't lift in the exposure step, now is the time to get those values up so the lossy compressor doesn't eat them (a small worked example follows this list).
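As a purely illustrative example (the exact curve is up to you), a simple gamma curve out = in^(1/2.2) maps an input value of 0.01 to roughly 0.12, so the near-black shadow values get far more of the encoder's code values than they would untouched.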
There are also a couple of other things that you can do:
- Stabilize the video. Unless you think you'll have a need for the unstabilized video, you might as well stabilize it. Since this reduces the inter-frame differences, it also helps with compression.
- Apply any included stabilization metadata. As an example, 360-degree videos contain separate tracks for the orientation sensor. "Baking it in" to the output gives you a nearly stabilized archived movie, and since that metadata typically won't be copied to the output anyway, doing so also makes the archive copy easier to work with.
Both of these also have the advantage of working on the best copy of the video as input, meaning that the quality loss will be as small as possible.
1. Processing Pipeline
Ideally we do all of this in one pass. Correcting exposure and white balance, and applying a tone curve, can actually be done in one step by creating a 3D LUT incorporating all those color transformations.
The stabilization requires two passes (one for analysis and one to apply the required transforms), but ignoring the analysis pass we can just tack on the application pass after the 3D LUT.
In the end we have a four-stage pipeline:
- Color
  - Black- and white point
  - Exposure
  - Shadows and highlights
  - Saturation
  - Contrast
  - Color balance
  - Log / dynamic range compression
- Transforms
  - Metadata stabilization - use accelerometer data from the video file to stabilize
  - Image-based stabilization - analyze the video and stabilize it based on actual image data
- Encode - encode it as efficiently as possible
2. Implementation
I ended up with two implementations - one using ffmpeg and one using the MLT framework[a]. They are essentially the same, as both use the avfilter[b] set of filters, but one is straight command line and the other requires an XML file with the filter chain definition.
2.1. 3D LUT
Both implementations depend on creating a .cube LUT. This is quite simple to generate if you express your color transforms as an RGB-to-RGB function, because you then just sample the RGB space with a grid fine enough to keep the transform accurate while not producing ridiculous file sizes. I chose 64x64x64 for a total of 262144 samples, giving 6-bit grid resolution per channel. The format of a .cube file[c] is simple:
LUT_3D_SIZE 64
<red> <green> <blue>
...262143 more rows...
The rows are ordered so that for an input of (R_in, G_in, B_in), with each component ranging from zero to LUT_3D_SIZE - 1, the corresponding output values can be found on line N = B_in * LUT_3D_SIZE * LUT_3D_SIZE + G_in * LUT_3D_SIZE + R_in. Which is a roundabout way to say that you should generate the file like this, while ensuring that LUT_3D_SIZE - 1 corresponds to the maximum value for a channel (255 for 8-bit color):
for (int b = 0; b < LUT_3D_SIZE; ++b) {
    for (int g = 0; g < LUT_3D_SIZE; ++g) {
        for (int r = 0; r < LUT_3D_SIZE; ++r) {
            /* Scale (r, g, b) to 0.0 - 1.0, apply the color transform,
               and print one "<red> <green> <blue>" row */
            ...
        }
    }
}
The output values range from 0.0 to 1.0. Note that avfilter doesn't support all features of a .cube file. The above is the minimum workable implementation that I found.
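To make this concrete, here is a minimal sketch of such a generator in C. Everything in it is illustrative: transform stands in for whatever combination of exposure, contrast, and log adjustments you want to bake in.

#include <stdio.h>

#define LUT_3D_SIZE 64

/* Stand-in for the real RGB-to-RGB correction; identity here.
   Inputs and outputs are in the range 0.0 to 1.0. */
static void transform(double rgb[3]) {
    (void) rgb; /* leave the values unchanged */
}

int main(void) {
    printf("LUT_3D_SIZE %d\n", LUT_3D_SIZE);
    for (int b = 0; b < LUT_3D_SIZE; ++b) {
        for (int g = 0; g < LUT_3D_SIZE; ++g) {
            for (int r = 0; r < LUT_3D_SIZE; ++r) {
                /* Scale the grid point to 0.0 - 1.0 per channel */
                double rgb[3] = {
                    r / (double) (LUT_3D_SIZE - 1),
                    g / (double) (LUT_3D_SIZE - 1),
                    b / (double) (LUT_3D_SIZE - 1),
                };
                transform(rgb);
                printf("%f %f %f\n", rgb[0], rgb[1], rgb[2]);
            }
        }
    }
    return 0;
}

Redirect the output to a .cube file and point the lut3d filter at it.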
2.2. ffmpeg
In ffmpeg we just set up a simple filter chain, which is a comma-separated list of filters with parameters:
ffmpeg -i <input> \
-vf <comma-separated list of filters> \
...output and encoding parameters...
2.2.1. Color
The color corrections and transformations are done with the 3D LUT:
lut3d=file=<path to .cube file>
2.2.2. Image-based Stabilization
Image-based stabilization uses the VidStab[d] library and requires an analysis pass over the file:
ffmpeg -i <input> \
-vf vidstabdetect=shakiness=<shakiness>:result=<stabilization file> \
-f null -
Then we add the following filter to the filter chain:
vidstabtransform=smoothing=<smoothing>:input=<stabilization file>
The smoothing parameter controls how many frames the camera movement is smoothed over. As a special case, setting it to zero makes the filter try to keep the camera absolutely still, which is useful when you don't want the image to move at all.
2.2.3. Encoding
Encoding parameters are simple, with the crf quality parameter ranging from 17 (highest quality) to 28 (lowest quality)[e].
ffmpeg ...input and filter chain... \
-c:v libx264 \
-preset veryslow \
-crf <quality>
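Putting it all together, a full run is the analysis pass followed by color correction, stabilization, and encoding in a single command. The file names and the shakiness, smoothing, and crf values below are only placeholder examples:

ffmpeg -i input.mp4 \
    -vf vidstabdetect=shakiness=5:result=input.trf \
    -f null -

ffmpeg -i input.mp4 \
    -vf lut3d=file=correction.cube,vidstabtransform=smoothing=30:input=input.trf \
    -c:v libx264 \
    -preset veryslow \
    -crf 20 \
    output.mp4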
2.3. MLT Framework
The command-line tool for MLT uses an XML file to define the pipeline. The downside is that you have to create it; the upside is that you don't have to worry about how CLI arguments are parsed.
2.3.1. Creating the XML File
In order to get the in- and out frames set properly on the source, you need to first create a file with the filter chain.
<?xml version="1.0" standalone="no"?>
<mlt LC_NUMERIC="C" version="7.23.0" title="Shotcut version 24.02.29">
<chain id="chain0" title="Shotcut version 24.02.29">
<property name="resource">...input file...</property>
...filters...
</chain>
</mlt>
2.3.2. Color
<filter id="filter<id>">
<property name="mlt_service">avfilter.lut3d</property>
<property name="av.interp">trilinear</property>
<property name="av.file">...path to .cube file...</property>
</filter>
2.3.3. Metadata Stabilization
<filter id="filter<id>">
<property name="version">2.7</property>
<property name="mlt_service">frei0r.bigsh0t_zenith_correction</property>
<property name="shotcut:filter">bigsh0t_zenith_correction</property>
<property name="interpolation">1</property>
<property name="analysisFile">...input video file...</property>
<property name="enableSmoothYaw">...true or false...</property>
<property name="smoothYaw">...yaw smooth frames...</property>
<property name="timeBiasYaw">...smoothing time bias...</property>
<property name="clipOffset">0</property>
</filter>
2.3.4. Set Producer Parameters
Then, run the melt tool to get producer data:
melt -quiet -silent \
<input xml file> \
-consumer xml \
> <output xml file>
2.3.5. Consumer
Insert the consumer with encoding parameters, with crf again being between 17 (highest quality) and 28 (lowest quality)[f]:
<consumer
ab="384k"
acodec="aac"
ar="48000"
bf="3"
channels="2"
crf="<crf>"
deinterlacer="onefield"
f="<format>"
g="150"
mlt_service="avformat"
movflags="+faststart"
preset="slow"
real_time="0"
rescale="nearest"
target="<output>"
threads="0"
top_field_first="2"
vbr="off"
vcodec="libx264"
/>
2.3.6. Run the Pipeline
Finally, run the whole pipeline.
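Assuming melt picks up the consumer element embedded in the final XML file, this is just a matter of pointing it at that file (the name is a placeholder):

melt -quiet -silent <final xml file>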
3. Conclusion
This way I've set up a number of pipelines that can re-compress "normal" video with and without stabilization, at various quality levels, and the same for 360-degree video. I usually get about 80% compression (files are reduced to one fifth of the original size) with little to no perceptible drop in quality.
Links
[a] https://www.mltframework.org/
[b] https://www.ffmpeg.org/libavfilter.html
[c] https://resolve.cafe/developers/luts/
[d] https://github.com/georgmartius/vid.stab?tab=readme-ov-file#usage-instructions
[e] https://trac.ffmpeg.org/wiki/Encode/H.264#a1.ChooseaCRFvalue
[f] https://trac.ffmpeg.org/wiki/Encode/H.264#a1.ChooseaCRFvalue