Bigshot is moving toward the next release, which might be 1.2 or 2.0, depending on whether I modify the external interface or not. Before I go back to pushing that one forward, however, I'd like to share some lessons learned when optimizing Bigshot for speed.
1. The Old Techniques Work
Even though JavaScript is the newest and hottest thing right now, I can report that the usual, tried-and-true techniques will take you very far:
- Pick good algorithms: Analyze the complexity of the algorithms you are using, and choose one that is appropriate for the problem domain. Going from O(n²) to O(n) means you go twice as fast even for n = 2.
- Get things working, then correct, then fast: There is no need to sweat the small stuff until you know that things really work.
- Avoid micro-optimizations: We're going to go over some things that may look like micro-optimizations, but in general I'd recommend against them. Beyond the use of object literals, I can't think of a single micro-optimization that makes sense from a performance / cost perspective.
- Use a profiler: All WebKit-based browsers (Chrome and Safari) have built-in CPU and heap profilers. They will let you quickly narrow down the scope to the parts that actually take time, as opposed to the parts that you think take time.
- The fastest code is the code that doesn't run: Clipping the rendering against the viewport, performing a calculation once and re-using the value, avoiding buffer allocations... All of these are just as useful as ever, as the sketch after this list illustrates.
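To make the last point concrete, here is a minimal sketch (not Bigshot's actual code) of clipping against the viewport before rendering; the { x, y, width, height } rectangle shape and the render function are assumptions for illustration:

function intersects (tile, viewport) {
    // Two axis-aligned rectangles overlap if and only if
    // they overlap on both the x and the y axis.
    return tile.x < viewport.x + viewport.width
        && tile.x + tile.width > viewport.x
        && tile.y < viewport.y + viewport.height
        && tile.y + tile.height > viewport.y;
}

// The fastest code is the code that doesn't run:
// off-screen tiles are never rendered at all.
if (intersects (tile, viewport)) {
    render (tile);
}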
1.1. Using The Profiler
The profiler in Chrome and Safari can be accessed via the console object. One can therefore write benchmarks like this:
/**
 * Runs a benchmark.
 *
 * @param {boolean} profile flag indicating
 *     whether the benchmark should be profiled
 */
function benchmark (profile) {
    if (profile && console.profile) {
        console.profile ();
    }
    // Benchmark code goes here
    if (profile && console.profileEnd) {
        console.profileEnd ();
    }
}
This makes it possible to write a one-click benchmark-and-profile test.
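For example, the function can be wired to a button so that a single click runs and profiles the benchmark; the element id here is a made-up example:

// One-click benchmark-and-profile: clicking the button runs
// the benchmark under the profiler.
document.getElementById ("run-benchmark").onclick = function () {
    benchmark (true);
};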
2. Object Literals are Fast
Bigshot does a whole lot of linear algebra, as do most 3d graphics programs. This means that there are a lot of 3d vectors being passed around and used in computations. The question, then: which representation is the most efficient?
Briefly, it boiled down to four choices: Create a constructor function, use an array, use an object literal, or create an object literal via a helper function. I wanted to test two aspects of these - creating an object, and accessing the fields in it. The benchmark I settled on was to create two 3d-vectors and compute the dot product of them.
2.1. The Test Cases
These test cases can be seen and run at jsperf.com[a]. I'm including my results below.
2.1.1. Constructor
Here we use standard JavaScript object-oriented programming. We create a constructor function that sets the field values:
function Point3D (x, y, z) {
    this.x = x;
    this.y = y;
    this.z = z;
}
The benchmark is also straightforward. Two vectors are created and the dot product is computed.
var p = new Point3D (0, 0.5, 0);
var q = new Point3D (0, 0, 0.5);
var n = p.x * q.x + p.y * q.y + p.z * q.z;
2.1.2. Literal
This benchmark is more or less self-explanatory.
var p = { x : 0, y : 0.5, z : 0 };
var q = { x : 0, y : 0, z : 0.5 };
var n = p.x * q.x + p.y * q.y + p.z * q.z;
2.1.3. Array
Instead of creating an object, we can use a JavaScript Array to represent the vector. The advantage is that we'll be able to scale up (or down) to any number of dimensions just by using different array sizes. For a general linear algebra package, this would be very useful, and indeed, this is the representation Sylvester[b] uses.
var p = [0, 0.5, 0];
var q = [0, 0, 0.5];
var n = p[0] * q[0] + p[1] * q[1] + p[2] * q[2];
2.1.4. Literal Factory
The literal factory uses an object literal, but wraps the creation of the literal in a function. This would give us the opportunity to add parameter validation code and ensure that the fields in the object literal are named correctly.
function MakePoint3D (x, y, z) {
    return { x : x, y : y, z : z };
}
The benchmark code looks very similar to the §2.1.1. Constructor test case.
var p = MakePoint3D (0, 0.5, 0);
var q = MakePoint3D (0, 0, 0.5);
var n = p.x * q.x + p.y * q.y + p.z * q.z;
2.2. The Results
The tests were run on Chrome 23.0.1262.0 32-bit, Safari 5.1.7 32-bit and Firefox 15.0.1 32-bit; all on Vista 64-bit. The results were easy to interpret.
All figures are iterations per second.

| Test case | Chrome | Safari | Firefox |
| --- | --- | --- | --- |
| Constructor | 6,197,533 | 4,507,434 | 3,132,604 |
| Literal | 42,899,804 | 6,467,007 | 15,445,684 |
| Array | 21,628,754 | 3,870,932 | 9,183,032 |
| Factory | 23,549,392 | 5,468,152 | 14,511,107 |
The thing that stands out here is just how incredibly fast Google's V8 JavaScript engine[c] is. While Safari reaches 6.5 million iterations per second on its fastest test case, V8 reaches 6 million iterations per second on its slowest test case and roughly seven times that on its fastest.
The conclusion we can draw - besides Chrome being fast - is that the object literal is the clear winner. Only on Firefox is there even a contest between two options.
3. Avoid Touching the DOM
DOM manipulations are very expensive and should be avoided if possible. In Bigshot's TileLayer
, image tiles can enter and leave the viewport, making it necessary to either toggle the visibility
property. Accessing this property is expensive, though, so the visiblity flag is cached in a bigshotData
property:
var tile = ... obtain HTMLImageElement ...

// JavaScript lets you add properties to
// any object. (With some exceptions.)
tile.bigshotData = {
    /**
     * Cached visibility state.
     * We start visible.
     */
    visible : true
};
The visibility of the HTML element is then updated only when necessary, as part of clipping against the viewport:
var visible = clip (tile, viewport);
if (visible) {
    // Only touch the visibility state
    // if we must.
    if (!tile.bigshotData.visible) {
        tile.bigshotData.visible = true;
        tile.style.visibility = "";
    } else {
        // Nothing to do here
    }
} else {
    // Only touch the visibility state
    // if we must.
    if (tile.bigshotData.visible) {
        tile.bigshotData.visible = false;
        tile.style.visibility = "hidden";
    } else {
        // Nothing to do here
    }
}
The CSS3DVRRenderer implementation does a similar thing every render pass. At the start of the render pass, a flag is set on each tile element. If the element is found to be visible, the flag is cleared. At the end of the render pass, all elements with the flag still set are removed. This minimizes the amount of DOM manipulation.
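Here is a minimal sketch of that mark-and-sweep scheme; the tiles array, the container element, and the isVisible, renderTile, and bigshotUnused names are illustrative assumptions, not the actual CSS3DVRRenderer code:

// Mark phase: assume every tile is unused.
for (var i = 0; i < tiles.length; ++i) {
    tiles[i].bigshotUnused = true;
}

// Render phase: every tile that is actually rendered
// gets its flag cleared.
for (var i = 0; i < tiles.length; ++i) {
    if (isVisible (tiles[i])) {
        renderTile (tiles[i]);
        tiles[i].bigshotUnused = false;
    }
}

// Sweep phase: remove the tiles that are still flagged,
// touching the DOM only for elements that must go.
for (var i = tiles.length - 1; i >= 0; --i) {
    if (tiles[i].bigshotUnused) {
        container.removeChild (tiles[i]);
        tiles.splice (i, 1);
    }
}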
4. WebGL
Three optimizations were instrumental in getting much higher WebGL performance.
4.1. Cache Buffers
Buffer objects are expensive to create. For Bigshot, where each quad has a unique texture, the same index, texture coordinate, and vertex buffers can be used for all rendering. The last optimization I did was to allocate these buffers on startup and then hang on to them until the VRPanorama is disposed. This optimization alone nearly doubled the performance: up to 2300 fps from 1300 fps.
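As an illustration of the idea (not Bigshot's actual code), the buffer is created once, bound on every render pass, and only deleted on disposal; the gl context and the quad geometry here are assumed:

// Created once, at startup.
var vertexBuffer = gl.createBuffer ();
gl.bindBuffer (gl.ARRAY_BUFFER, vertexBuffer);
gl.bufferData (gl.ARRAY_BUFFER, new Float32Array ([
    -1, -1, 0,
     1, -1, 0,
     1,  1, 0,
    -1,  1, 0
]), gl.STATIC_DRAW);

// On every render pass we only bind the cached buffer,
// never re-create it.
function renderQuad () {
    gl.bindBuffer (gl.ARRAY_BUFFER, vertexBuffer);
    // ... set attributes, bind the quad's texture, draw ...
}

// The buffer is released when the panorama is disposed.
function dispose () {
    gl.deleteBuffer (vertexBuffer);
}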
4.1.1. That's Wasteful
So you're running on a WebGL implementation that is starved for buffer space, and the space taken up by the four vertices that the WebGLVRRenderer grabs is needed elsewhere. The solution came in the form of the TimedWeakReference[d], which will dispose the buffers when not used and re-create them on demand.
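In sketch form, the dispose-and-recreate idea might look like this; the constructor arguments and method names here are assumptions for illustration, not the actual TimedWeakReference API:

function TimedWeakReference (create, dispose, timeoutMs) {
    this.create = create;
    this.dispose = dispose;
    this.timeoutMs = timeoutMs;
    this.value = null;
    this.timer = null;
}

/**
 * Returns the referenced object, re-creating it on demand
 * and (re)starting the disposal timer.
 */
TimedWeakReference.prototype.get = function () {
    if (this.value == null) {
        this.value = this.create ();
    }
    if (this.timer != null) {
        clearTimeout (this.timer);
    }
    var self = this;
    this.timer = setTimeout (function () {
        // Not used for a while: release the resource.
        self.dispose (self.value);
        self.value = null;
        self.timer = null;
    }, this.timeoutMs);
    return this.value;
};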
4.2. Cache Textures
Textures are slow to create. 'nuff said.
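The caching itself can be as simple as a map from tile URL to texture. A minimal sketch, assuming a gl context and an already-loaded Image:

var textureCache = {};

// Returns a cached texture for the image, creating and
// uploading it only on the first request.
function getTexture (url, image) {
    var texture = textureCache[url];
    if (!texture) {
        texture = gl.createTexture ();
        gl.bindTexture (gl.TEXTURE_2D, texture);
        gl.texImage2D (gl.TEXTURE_2D, 0, gl.RGBA,
            gl.RGBA, gl.UNSIGNED_BYTE, image);
        textureCache[url] = texture;
    }
    return texture;
}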
4.3. Setting Shader Program Values
Due to an oversight on my part, the parameters for the vertex shaders were set once per quad instead of once per scene. I was shocked at first to see the method setMatrixUniforms pop up in the profiler report. Moving it to the start of the scene's render method took care of that, and I got a couple hundred more frames per second from the code.
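In sketch form (the names and loop structure are illustrative, not the actual Bigshot code), the fix amounts to hoisting the uniform setup out of the per-quad loop:

function render (scene) {
    // Before: setMatrixUniforms () was called inside the
    // loop, once per quad. After: once per scene.
    setMatrixUniforms ();
    for (var i = 0; i < scene.quads.length; ++i) {
        renderQuad (scene.quads[i]);
    }
}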