Ricoh Theta Video Metadata
 


The Ricoh Theta SC and Ricoh Theta SC2 store orientation information in the video files they produce. Extracting this information is helpful when editing, since it lets you do a first-pass zenith correction and roughly stabilize the footage.

1. Overview

The interesting metadata lives in the user data (udta) atom of the moov atom. Of the atoms I found, I understood only two well enough to put the data to any use: the RDTH and RDT5 atoms. For completeness I've included the other atoms I ran across, but as you'll see, most of their data is simply marked as "unknown".

2. Navigating the MP4 File

An MP4 file consists of atoms, each atom being a section of the file with a four-byte type identifier. Atoms can contain other atoms, so when I write moov.udta I mean the atom named udta inside the top-level moov atom.

In the type specifications below I'll use the [u]intX convention: uintX is an unsigned and intX a signed integer value with X bits. Since Ricoh uses both byte orders, I'll also append be or le for big-endian and little-endian layouts. A uint32le is therefore an unsigned 32-bit integer stored in little-endian format.
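As a minimal sketch of what these type names mean in practice, the helpers below decode a few of them from a byte pointer. The function names are my own choice for illustration and are not taken from any Ricoh or MP4 library.

```cpp
#include <cstdint>

// uint16be: most significant byte first.
inline std::uint16_t readUInt16BE(const std::uint8_t* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// int16be: the same two bytes, interpreted as two's complement.
inline std::int16_t readInt16BE(const std::uint8_t* p) {
    const int v = readUInt16BE(p);
    return static_cast<std::int16_t>(v >= 0x8000 ? v - 0x10000 : v);
}

// uint32be: most significant byte first.
inline std::uint32_t readUInt32BE(const std::uint8_t* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24)
         | (static_cast<std::uint32_t>(p[1]) << 16)
         | (static_cast<std::uint32_t>(p[2]) << 8)
         |  static_cast<std::uint32_t>(p[3]);
}

// uint32le: least significant byte first.
inline std::uint32_t readUInt32LE(const std::uint8_t* p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```

Assembling the value byte by byte keeps the code independent of the byte order of the machine it runs on.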

The atom header is a simple structure. It starts with a 32-bit size field that gives the number of bytes in the atom, including the header, followed by the atom name as another 32-bit quantity. If the size field equals 1, the actual size of the atom follows the name field as a 64-bit unsigned integer (largeSize); for any other value the largeSize field is absent. A size of zero means that the atom extends to the end of the file. Finally, if the name field is equal to "uuid" (0x75756964), a 16-byte user type field follows. Otherwise this field is absent.

struct MP4Atom {
    uint32be size;
    uint32be name;
    
    if (size == 1) {
        uint64be largeSize;
    }
    
    if (name == "uuid") {
        uint8 usertype[16];
    }
}

To navigate the MP4 file, start by reading the header of the first atom. If it is the one you are looking for, treat the bytes that follow its header as another sequence of atoms and continue the search there. If not, skip ahead (size - header size) bytes to reach the next atom.
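The navigation described above can be sketched as follows, assuming the whole file (or at least the region being searched) has been loaded into memory. The names AtomPayload, readBE and findAtom are hypothetical helpers made up for this example. A size of zero is treated here as "extends to the end of the searched range", which is my assumption for atoms nested inside another atom.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Location of an atom's payload (the bytes after its header) in the buffer.
struct AtomPayload {
    std::size_t offset;
    std::size_t length;
};

// Read a big-endian unsigned integer that is `bytes` bytes long.
static std::uint64_t readBE(const std::uint8_t* p, int bytes) {
    std::uint64_t v = 0;
    for (int i = 0; i < bytes; ++i) v = (v << 8) | p[i];
    return v;
}

// Search the atoms in data[begin, end) for the first one called `name`
// (four characters). Returns false if it is missing or the data is malformed.
bool findAtom(const std::vector<std::uint8_t>& data, std::size_t begin,
              std::size_t end, const char* name, AtomPayload& out) {
    if (begin > end || end > data.size()) return false;
    std::size_t pos = begin;
    while (end - pos >= 8) {
        const std::uint8_t* p = data.data() + pos;
        std::uint64_t size = readBE(p, 4);
        std::uint64_t header = 8;
        if (size == 1) {                 // 64-bit largeSize follows the name
            if (end - pos < 16) return false;
            size = readBE(p + 8, 8);
            header = 16;
        } else if (size == 0) {          // atom runs to the end of the range
            size = end - pos;
        }
        if (std::memcmp(p + 4, "uuid", 4) == 0) header += 16;  // usertype
        if (size < header || size > end - pos) return false;   // malformed
        if (std::memcmp(p + 4, name, 4) == 0) {
            out.offset = pos + static_cast<std::size_t>(header);
            out.length = static_cast<std::size_t>(size - header);
            return true;
        }
        pos += static_cast<std::size_t>(size);  // skip to the next sibling
    }
    return false;
}
```

To reach moov.udta.RDTH you would call findAtom three times: once over the whole file for "moov", then over the returned payload range for "udta", and finally over that range for "RDTH".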

3. Metadata Atoms

All atoms described below are found in the user data (udta) atom of the moov atom.

3.1. Theta SC and Theta SC2

The SC and SC2 share some atoms, but not all. The Theta SC stores the orientation information in an RDT5 atom as a gravity vector, and the SC2 in an RDTH atom as a quaternion specifying the camera's orientation.

3.2. moov.udta.RDTH

The RDTH atom can be found in the moov.udta atom and stores the camera orientation as a quaternion.

struct RDTH {
    /**
     * Number of entries. Corresponds to
     * the number of frames in the video.
     */
    uint32le entries;
    
    uint16le unknown1; // was 30
    uint16le unknown2; // was 24
            
    Orientation orientations[entries];
}

struct Orientation {
    /**
     * A monotonically increasing
     * sequence. Some kind of timer?
     */
    uint32le unknown1;
    
    /**
     * Always zero.
     */
    uint32le unknown2;
               
    // The following fields make up a quaternion
    
    /**
     * real (scalar) part, rotation amount
     */
    float32le r; 
    
    /**
     * i, pitch axis
     */
    float32le i;
    
    /**
     * j, roll axis
     */
    float32le j;
    
    /**
     * k, yaw axis
     */
    float32le k;
}
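A minimal sketch of reading this layout, assuming the payload of the RDTH atom (the bytes after the atom header) is already in memory. The names Orientation, readUInt32LE, readFloat32LE and parseRDTH are chosen for this example only. The entry count is clamped to what the payload can actually hold, which is a defensive choice of mine and not something the file format is known to require.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Orientation {
    std::uint32_t unknown1;  // monotonically increasing, meaning unknown
    std::uint32_t unknown2;  // always zero in the files examined
    float r, i, j, k;        // quaternion components
};

static std::uint32_t readUInt32LE(const std::uint8_t* p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Assumes float is a 32-bit IEEE 754 single-precision type.
static float readFloat32LE(const std::uint8_t* p) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    const std::uint32_t bits = readUInt32LE(p);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Parse the payload of an RDTH atom into a list of orientations.
std::vector<Orientation> parseRDTH(const std::uint8_t* payload, std::size_t length) {
    const std::size_t kHeader = 8;   // entries + unknown1 + unknown2
    const std::size_t kEntry = 24;   // two uint32le + four float32le
    std::vector<Orientation> out;
    if (length < kHeader) return out;

    std::size_t entries = readUInt32LE(payload);
    const std::size_t available = (length - kHeader) / kEntry;
    if (entries > available) entries = available;  // do not read past the end

    out.reserve(entries);
    for (std::size_t n = 0; n < entries; ++n) {
        const std::uint8_t* e = payload + kHeader + n * kEntry;
        Orientation o;
        o.unknown1 = readUInt32LE(e);
        o.unknown2 = readUInt32LE(e + 4);
        o.r = readFloat32LE(e + 8);
        o.i = readFloat32LE(e + 12);
        o.j = readFloat32LE(e + 16);
        o.k = readFloat32LE(e + 20);
        out.push_back(o);
    }
    return out;
}
```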

3.2.1. The float32le Type

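This is a 32-bit IEEE 754 single-precision floating-point number stored in little-endian format. Once you have read the four bytes as a little-endian 32-bit integer, you can convert it to a float using Java's Float.intBitsToFloat(int i) or, in C++, by copying the bits into a float with std::memcpy. (Reading a union member other than the one last written is undefined behavior in C++, so a union is best avoided here.)

#include <cstdint>
#include <cstring>

float readFloat32LE() {
    // Assumes float is a 32-bit IEEE 754 single-precision type.
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");

    // Read a 32-bit unsigned int in little-endian (defined elsewhere).
    std::uint32_t bits = readUInt32LE();

    float f;
    std::memcpy(&f, &bits, sizeof f); // reinterpret the bits as a float
    return f;
}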

3.2.2. Example Data

Orientation data for a movie where the camera is rotated 360 degrees around an axis going from back to front. (A barrel roll.)

3.3. moov.udta.RDT5

The RDT5 atom can be found in the moov.udta atom and stores the camera orientation as a gravity vector (and some more information whose meaning is unknown to me).

For some reason the number of entries is exactly twice the number of video frames in the file. I don't know why.

struct RDT5 {
    /**
     * Number of gravity vector entries.
     */
    uint32be entries;
                
    uint8 unknown[40];
    
    GravityVector gravityVectors[entries];
}

struct GravityVector {
    /**
     * x- left, x+ right
     */
    int16be x;
    
    /**
     * y- down, y+ up
     */
    int16be y;
    
    /**
     * z- back, z+ front
     */
    int16be z;
    
    uint8 unknown[6];
}

The vector components are stored as signed 16-bit integers. To normalize the vector, divide each component by 16384.0 using floating-point division.
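A minimal sketch of reading the RDT5 layout and applying the division by 16384, assuming the payload of the atom (the bytes after the atom header) is already in memory. The names Gravity, readInt16BE, readUInt32BE and parseRDT5 are made up for this example, and clamping the entry count to the payload size is a defensive choice of mine.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Gravity {
    float x, y, z;  // components after dividing the raw values by 16384
};

static std::uint32_t readUInt32BE(const std::uint8_t* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24)
         | (static_cast<std::uint32_t>(p[1]) << 16)
         | (static_cast<std::uint32_t>(p[2]) << 8)
         |  static_cast<std::uint32_t>(p[3]);
}

static std::int16_t readInt16BE(const std::uint8_t* p) {
    const int v = (p[0] << 8) | p[1];
    return static_cast<std::int16_t>(v >= 0x8000 ? v - 0x10000 : v);
}

// Parse the payload of an RDT5 atom into scaled gravity vectors.
std::vector<Gravity> parseRDT5(const std::uint8_t* payload, std::size_t length) {
    const std::size_t kHeader = 4 + 40;  // entries + 40 unknown bytes
    const std::size_t kEntry = 12;       // x, y, z + 6 unknown bytes
    std::vector<Gravity> out;
    if (length < kHeader) return out;

    std::size_t entries = readUInt32BE(payload);
    const std::size_t available = (length - kHeader) / kEntry;
    if (entries > available) entries = available;  // do not read past the end

    out.reserve(entries);
    for (std::size_t n = 0; n < entries; ++n) {
        const std::uint8_t* e = payload + kHeader + n * kEntry;
        Gravity g;
        g.x = readInt16BE(e)     / 16384.0f;
        g.y = readInt16BE(e + 2) / 16384.0f;
        g.z = readInt16BE(e + 4) / 16384.0f;
        out.push_back(g);
    }
    return out;
}
```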

3.3.1. Example Data

Orientation data for a movie where the camera is rotated 360 degrees around an axis going from back to front. (A barrel roll.)

The data is quite noisy. Smoothing the raw data with a size 16 box blur worked well for me.
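A box blur replaces each sample with the average of the samples in a window around it. The sketch below is one possible way to do that for a single component series (for example all x values); the function name boxBlur is made up for this example. How the window is placed around each sample and how it is shortened near the ends of the data are assumptions of mine, not details taken from the original processing.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Box blur of one component series. Each output sample is the mean of up to
// `size` input samples, starting size / 2 samples before the current one.
// Near the ends the window is clipped to the available data.
std::vector<float> boxBlur(const std::vector<float>& in, std::size_t size) {
    if (size == 0) return in;  // nothing to average over
    std::vector<float> out(in.size());
    const std::size_t half = size / 2;
    for (std::size_t n = 0; n < in.size(); ++n) {
        const std::size_t lo = n >= half ? n - half : 0;
        const std::size_t hi = std::min(in.size(), n + (size - half));
        double sum = 0.0;
        for (std::size_t m = lo; m < hi; ++m) sum += in[m];
        out[n] = static_cast<float>(sum / static_cast<double>(hi - lo));
    }
    return out;
}
```

Calling boxBlur(xs, 16) once per component gives the size 16 blur mentioned above. The result is no longer a unit vector in general, so you may want to renormalize each smoothed vector afterwards.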

3.4. moov.udta.RDTD

Appears to be a sequence of 3-vectors, each encoded as three 16-bit values, that correspond to camera orientation. However, all vector components are always positive, so they may be magnitudes only. The layout is as follows:

struct RDTD {
    uint32le entries;
    uint16le unknown1;
    uint16le unknown2;
                 
    RDTDEntry rdtdEntries[entries];
}

struct RDTDEntry {
    /**
     * A monotonically increasing
     * sequence. Some kind of timer?
     */
    uint32le unknown1;
    
    /**
     * Always zero.
     */
    uint32le unknown2;
    
    /**
     * Magnitude of x-component of 
     * gravity vector?
     */
    uint16le x;
    
    /**
     * Magnitude of y-component of
     * gravity vector?
     */
    uint16le y;
    
    /**
     * Magnitude of z-component of
     * gravity vector?
     */
    uint16le z;
    
    /**
     * Zero unless at the end of data
     * when it is 65535. Indicates
     * final frame?
     */
    uint16le unknown3;
}

3.4.1. Example Data

Orientation data for a movie where the camera is rotated 360 degrees around an axis going from back to front. (A barrel roll.) This is from the same movie as the example data for RDTH.

3.5. moov.udta.RDTG

A monotonically increasing sequence.

struct RDTG {
    uint32le entries;
    uint16le unknown1;
    uint16le unknown2;
                 
    RDTGEntry rdtgEntries[entries];
}

struct RDTGEntry {
    /**
     * A monotonically increasing
     * sequence. Some kind of timer?
     */
    uint32le unknown1;
    
    /**
     * Always zero.
     */
    uint32le unknown2;
}

3.6. moov.udta.@mod

Camera model. Example data: RICOH THETA SC

3.7. moov.udta.@swr

Software revision. Example data: RICOH THETA SC Ver 1.01

3.8. moov.udta.@day

Time of capture. Example data: 2021-02-25T19:40:20+02:00

3.9. moov.udta.@mak

Camera make. Example data: RICOH

3.10. moov.udta.@xyz

GPS data. Example data: +59.384407+017.968946+60CRSWGS_84/
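The example string looks like an ISO 6709 style position: signed latitude, signed longitude, an optional signed altitude, then a coordinate reference system tag. Under that assumption, and assuming decimal degrees as in the example, a minimal parsing sketch could look like this. GeoPoint and parseXyz are names made up for this example, and std::strtod is assumed to run in the default "C" locale so that '.' is the decimal separator.

```cpp
#include <cstdlib>

struct GeoPoint {
    double latitude;    // decimal degrees, positive north
    double longitude;   // decimal degrees, positive east
    double altitude;    // only meaningful if hasAltitude is true
    bool hasAltitude;
};

// Parse a string such as "+59.384407+017.968946+60CRSWGS_84/".
// Everything after the numbers (the CRS tag) is ignored.
bool parseXyz(const char* s, GeoPoint& out) {
    char* end = nullptr;

    if (*s != '+' && *s != '-') return false;
    out.latitude = std::strtod(s, &end);
    if (end == s || (*end != '+' && *end != '-')) return false;

    s = end;
    out.longitude = std::strtod(s, &end);
    if (end == s) return false;

    s = end;
    out.hasAltitude = (*s == '+' || *s == '-');
    out.altitude = out.hasAltitude ? std::strtod(s, &end) : 0.0;
    return true;
}
```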