Learning AI Poorly: Walking through a scene generated from a bunch of photographs (3D Gaussian Splatting)

(originally posted to LinkedIn)

I want to talk about something that isn’t really AI, but kind of is, and then maybe ask: what is AI? So, stick with me… A paper called 3D Gaussian Splatting for Real-Time Radiance Field Rendering came out on August 8, 2023, and it is fantastic. If you, IRL, walk around an area and take a ton of pictures, you can use the algorithm from the paper to build a 3D scene and then walk around inside it. In other words, you can build an entire 3D world from nothing but flat photographs. That’s been done before, but the neat part about this method is that it’s fast enough to let you walk around the scene in real time, and it looks really good.

Think about it: you could build a video game where you walk around the Grand Canyon just by taking some photographs… and it would look real(ish).

Seriously, you should check out the web page the authors put together. There are examples!

How Does It Work?

1. Go somewhere and take a bunch of pictures from lots of different angles.
2. Run Structure from Motion on your pictures. Structure from Motion is a technique for estimating three-dimensional structure from a set of two-dimensional images, and it is a well-understood problem with off-the-shelf tools. It works out where the camera was for each picture and gives you a sparse 3D field of points… The points won’t look good, but you need them for step 3.
3. Remember my “Guess a Number” article? Do that for each point from step 2… but do it for color and shape. Each point becomes a fuzzy 3D blob (a Gaussian) that is kind of a function, with its own position, shape, color, and opacity. Use the pictures as “ground truth”: render the blobs from the same viewpoints as your photos and slowly adjust each blob’s parameters until the renders better match the real scene. This is gradient descent, which is basically what you do when you train a neural network. When you are done, you have a 3D point cloud made up of a bunch of splats of paint floating in space. And this looks good, trust me.
4. Now you have to render it… meaning, turn it into a picture to display on your screen. There might be millions of splats, but each one is projected onto the screen, the splats are sorted from front to back, and their colors are blended together. Your graphics card is really good at crunching crazy numbers of simple shapes like that in parallel, so it isn’t really a problem. In fact, it is really fast!
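Structure from Motion is a whole pipeline (matching features across photos, estimating camera poses, refining everything with bundle adjustment), so here is just one small piece of it: triangulation. If the same feature shows up in two photos whose camera positions are already known, each sighting defines a ray, and the 3D point sits where those rays (nearly) cross. This is a minimal sketch of the midpoint method, assuming the camera centers and ray directions are already known; `triangulate_midpoint` is a hypothetical name, not a function from any real SfM tool.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _scale(a: Vec3, k: float) -> Vec3:
    return (a[0] * k, a[1] * k, a[2] * k)


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def triangulate_midpoint(o1: Vec3, d1: Vec3, o2: Vec3, d2: Vec3) -> Vec3:
    """Estimate a 3D point seen from two cameras (hypothetical helper).

    Ray i starts at camera center o_i and points along d_i, toward where the
    feature appeared in that photo. Real rays rarely meet exactly, so find the
    closest point on each ray and return the midpoint between them.
    """
    w0 = _sub(o1, o2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; cannot triangulate")
    t1 = (b * e - c * d) / denom  # how far along ray 1 the closest point is
    t2 = (a * e - b * d) / denom  # how far along ray 2 the closest point is
    p1 = _add(o1, _scale(d1, t1))
    p2 = _add(o2, _scale(d2, t2))
    return _scale(_add(p1, p2), 0.5)


# Two cameras 4 units apart, both looking at a feature at (1, 2, 5).
print(triangulate_midpoint((0, 0, 0), (1, 2, 5), (4, 0, 0), (-3, 2, 5)))  # → (1.0, 2.0, 5.0)
```

A real SfM tool does this for thousands of matched features at once and also has to figure out the camera poses themselves, which is the hard part this sketch skips.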
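For the curious, here is roughly what “each point is kind of a function” means in the paper. Each splat is a 3D Gaussian with a center μ (initialized from a Structure from Motion point) and a covariance Σ that sets its size and stretch. Σ is built from a rotation R and a diagonal scale matrix S, which keeps it a valid ellipsoid while gradient descent nudges R and S around. Each Gaussian also carries an opacity and a color (stored as spherical harmonic coefficients, so the color can change with viewing angle).

```latex
G(\mathbf{x}) = \exp\!\Big(-\tfrac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu})^{\top}\,\Sigma^{-1}\,(\mathbf{x}-\boldsymbol{\mu})\Big),
\qquad
\Sigma = R\,S\,S^{\top}R^{\top}
```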
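To make the gradient descent step concrete, here is a deliberately tiny 1D toy, not the paper’s optimizer. The “photo” is a row of pixels containing one Gaussian blob, and we nudge a single splat’s brightness, center, and width downhill on the mean squared error until its render matches. The real method does this for millions of 3D Gaussians at once, with a fancier loss, a fancier optimizer, and gradients computed on the GPU; every name and number here (learning rate, step count, starting guess) is an illustrative assumption.

```python
import math
from typing import List, Tuple


def render(xs: List[float], amp: float, mean: float, width: float) -> List[float]:
    """'Render' one 1D Gaussian splat onto a row of pixels."""
    return [amp * math.exp(-((x - mean) ** 2) / (2 * width ** 2)) for x in xs]


def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error between a render and the 'photo'."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def fit_splat(xs, photo, amp, mean, width, lr=0.5, steps=3000) -> Tuple[float, float, float]:
    """Plain gradient descent on (amp, mean, width) to minimize the MSE."""
    n = len(xs)
    for _ in range(steps):
        g_amp = g_mean = g_width = 0.0
        for x, t in zip(xs, photo):
            g = math.exp(-((x - mean) ** 2) / (2 * width ** 2))
            r = amp * g - t  # rendered pixel minus photo pixel
            # Chain rule: each gradient sums 2 * r * d(pixel)/d(parameter).
            g_amp += 2 * r * g
            g_mean += 2 * r * amp * g * (x - mean) / width ** 2
            g_width += 2 * r * amp * g * (x - mean) ** 2 / width ** 3
        # Step downhill; dividing by n turns the sums into mean-loss gradients.
        amp -= lr * g_amp / n
        mean -= lr * g_mean / n
        width -= lr * g_width / n
    return amp, mean, width


xs = [i * 0.1 for i in range(101)]  # 101 pixels spanning 0..10
photo = render(xs, 1.0, 5.0, 1.0)   # the "ground truth" picture
start = (0.5, 4.0, 1.5)             # a deliberately wrong first guess
print("loss before:", mse(render(xs, *start), photo))
amp, mean, width = fit_splat(xs, photo, *start)
print("loss after: ", mse(render(xs, amp, mean, width), photo))
print("fitted:", round(amp, 3), round(mean, 3), round(width, 3))
```

This is the same “Guess a Number” loop: render, compare to the photo, adjust, repeat. The fitted values should land close to the true brightness 1.0, center 5.0, and width 1.0.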
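The rendering step boils down to alpha blending. For a single pixel, sort the splats that cover it from nearest to farthest, and let each one add its color weighted by its own opacity times however much light still gets through the splats in front of it. This is a minimal CPU sketch for one pixel, assuming round (isotropic) 2D footprints; the actual renderer projects each 3D Gaussian into a 2D ellipse and does the sorting and blending on the GPU. `Splat2D` and `shade_pixel` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Color = Tuple[float, float, float]


@dataclass
class Splat2D:
    """A splat after projection onto the screen (hypothetical, simplified)."""
    x: float        # screen-space center, in pixels
    y: float
    sigma: float    # footprint radius in pixels (round, unlike the real ellipses)
    depth: float    # distance from the camera; smaller is closer
    opacity: float  # 0 = invisible, 1 = solid at the center
    color: Color    # RGB, each 0..1


def shade_pixel(px: float, py: float, splats: List[Splat2D]) -> Color:
    """Blend every splat at pixel (px, py), nearest first."""
    r = g = b = 0.0
    transmittance = 1.0  # fraction of light still getting through
    for s in sorted(splats, key=lambda sp: sp.depth):
        dist2 = (px - s.x) ** 2 + (py - s.y) ** 2
        # Opacity fades with a Gaussian falloff away from the splat's center.
        alpha = s.opacity * math.exp(-0.5 * dist2 / s.sigma ** 2)
        weight = alpha * transmittance
        r += weight * s.color[0]
        g += weight * s.color[1]
        b += weight * s.color[2]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # almost no light left; skip the rest
            break
    return (r, g, b)


red_front = Splat2D(x=10, y=10, sigma=2.0, depth=1.0, opacity=0.5, color=(1.0, 0.0, 0.0))
blue_back = Splat2D(x=10, y=10, sigma=2.0, depth=2.0, opacity=1.0, color=(0.0, 0.0, 1.0))
print(shade_pixel(10, 10, [blue_back, red_front]))  # → (0.5, 0.0, 0.5)
```

The half-transparent red splat in front keeps half the light, so the solid blue one behind it only gets the other half, which is why the pixel ends up purple. Doing this for every pixel of every frame is exactly the kind of massively parallel, simple work a graphics card is built for.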

There’s no neural network?

That’s right… There is no neural network. So, is this AI? Doesn’t it still feel like AI somehow? All you had was some photographs, and the computer was able to generate an entire world… and it looks real. I would argue that there is a training step that is exactly like AI. In typical AI, you feed real-world data to a neural network and adjust the network until what comes out seems correct. In some ways, you could think of the 3D scene as the neural network… The 3D scene is slowly trained until its output (rendered views of a real environment) is close to being “correct,” or “looking good,” or whatever.

In 3D Gaussian Splatting, the scene is the neural network.

maybe?

sure, why the hell not.