1. Prepare a scene in Blender (see the Coffee Cup scene for an example).
2. Bake all the surfaces in the scene for every variable condition (such as light rotation or object movement). See this Blender script for an example; it is also included in the example Blender scene linked above.
3. Prepare and train your neural network. See this annotated Colab notebook: Nobigi colab notebook.
4. Copy the generated GLSL code into your web 3D application; an example can be seen here: Coffee cup example sandbox.
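The training step can be sketched as a tiny MLP that maps a UV coordinate plus one variable condition (here, a light angle) to an RGB color. This is a minimal NumPy stand-in, not the actual notebook's code; the synthetic data, layer sizes, and learning rate are illustrative assumptions:

```python
# Minimal sketch of the training step: a 3 -> 16 -> 3 MLP that maps
# (u, v, light_angle) to RGB. The architecture and hyperparameters here
# are assumptions for illustration, not Nobigi's actual setup.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the baked data: inputs (u, v, light_angle), targets RGB.
X = rng.uniform(0.0, 1.0, size=(2048, 3))
Y = np.stack([
    0.5 + 0.5 * np.sin(2 * np.pi * (X[:, 0] + X[:, 2])),
    0.5 + 0.5 * np.cos(2 * np.pi * X[:, 1]),
    X[:, 2],
], axis=1)

# Tiny two-layer MLP trained with full-batch gradient descent on MSE.
W1 = rng.normal(0, 0.5, (3, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, (16, 3)); b2 = np.zeros(3)
lr = 0.2

def forward(X):
    h = np.tanh(X @ W1 + b1)       # hidden activations
    return h, h @ W2 + b2          # linear RGB output

_, out = forward(X)
loss0 = np.mean((out - Y) ** 2)    # loss before training

for _ in range(1000):
    h, out = forward(X)
    err = (out - Y) / len(X)       # dLoss/dOut for mean-squared error
    gW2 = h.T @ err; gb2 = err.sum(0)
    dh = err @ W2.T * (1 - h ** 2) # backprop through tanh
    gW1 = X.T @ dh; gb1 = dh.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

_, out = forward(X)
loss1 = np.mean((out - Y) ** 2)    # loss after training
print(loss0, loss1)
```

The loss should drop substantially over the run; the real network is trained the same way in spirit, just against the baked renders instead of a synthetic function.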
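The trained weights then have to end up inside a shader. Here is a hedged sketch of what that export step might look like, emitting GLSL (ES 3.0 array syntax) with the weights baked in as constants. The `emit_glsl` helper and the fixed 3-in/3-out layout are hypothetical, not Nobigi's actual export code:

```python
# Hypothetical sketch of exporting trained weights as GLSL source.
# Emits constant arrays plus an infer() function for a 3 -> N -> 3 tanh MLP.
import numpy as np

def fmt(a):
    """Format an array as a GLSL float[] constructor (GLSL ES 3.0 syntax)."""
    flat = np.asarray(a).ravel()
    return "float[%d](%s)" % (len(flat), ", ".join("%.6f" % v for v in flat))

def emit_glsl(W1, b1, W2, b2):
    hidden = W1.shape[1]  # W1: (3, hidden), W2: (hidden, 3), row-major
    return "\n".join([
        "const float W1[%d] = %s;" % (W1.size, fmt(W1)),
        "const float B1[%d] = %s;" % (b1.size, fmt(b1)),
        "const float W2[%d] = %s;" % (W2.size, fmt(W2)),
        "const float B2[%d] = %s;" % (b2.size, fmt(b2)),
        "vec3 infer(vec3 x) {",
        "  float h[%d];" % hidden,
        "  for (int j = 0; j < %d; j++) {" % hidden,
        "    h[j] = tanh(x.x * W1[j] + x.y * W1[%d + j] + x.z * W1[%d + j] + B1[j]);"
        % (hidden, 2 * hidden),
        "  }",
        "  vec3 o = vec3(B2[0], B2[1], B2[2]);",
        "  for (int j = 0; j < %d; j++) {" % hidden,
        "    o += h[j] * vec3(W2[j * 3], W2[j * 3 + 1], W2[j * 3 + 2]);",
        "  }",
        "  return o;",
        "}",
    ])

# Usage: emit a shader snippet for small random weights.
rng = np.random.default_rng(1)
src = emit_glsl(rng.normal(size=(3, 8)), np.zeros(8),
                rng.normal(size=(8, 3)), np.zeros(3))
print(src.splitlines()[4])
```

The fragment shader then calls `infer(vec3(uv, lightAngle))` per pixel, which is what the "network in a shader" section below describes.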
Just a bit more background
Each surface in a scene becomes a trained neural network contained within a shader. Here's an example of the coffee cup's surface, unwrapped according to its UV map (you can hover over it with your mouse and explore its two interactive axes):
That means each pixel, roughly a million of them on an average screen, is running inference through a network with thousands of parameters, ideally 60 times per second. That's a lot of compute, and something to keep in mind for older devices (though the Coffee Cup demo, for example, runs fluidly on many devices).
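A back-of-envelope calculation using the figures above makes the cost concrete (the 4,000-parameter figure is an assumption standing in for "multi-thousand"):

```python
# Rough per-second cost of per-pixel inference, using the figures from
# the text. Two FLOPs per parameter approximates one multiply-add per weight.
pixels = 1_000_000
parameters = 4_000  # assumed; "multi-thousand", varies per surface
fps = 60

flops_per_frame = pixels * parameters * 2
flops_per_second = flops_per_frame * fps
print(f"{flops_per_second / 1e9:.0f} GFLOPs/s")  # → 480 GFLOPs/s
```

Hundreds of GFLOPs per second is well within reach of a modern desktop GPU, but it explains why older or integrated GPUs need attention.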
Future
You do need to be attentive with this approach to keep performance consistently in a good range. But given the expected (and current) improvements in GPU capabilities, you can also look ahead and imagine what else becomes possible when pixels are "smart" and their networks grow deeper and deeper. If you're in a position to fund research in neural graphics, do say hello.