Software Can Recreate 3D Spaces From Random Internet Photos

A new machine learning approach from Google researchers can turn people's tourist photos on the internet into incredibly detailed 3D scenes.
August 10, 2020, 2:09pm
Screengrab: GitHub/NeRF-W

Google researchers have reconstructed incredibly detailed 3D scenes of famous landmarks around the world using machine learning and photographs taken from the internet.

On the project’s GitHub page, the researchers shared 3D scenes of the Brandenburg Gate in Berlin, Sacré-Cœur in Paris, and the Trevi Fountain in Rome, all created from photographs taken from online sites such as Flickr. The results are impressive 3D renderings in which the camera's viewpoint can be moved and the scene's appearance altered to show different lighting.

The researchers recently shared their methods in a paper titled “NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections” on the arXiv preprint server. The method, which they’ve named NeRF-W, builds upon Neural Radiance Fields (NeRF), which can also be used to reconstruct 3D scenes from a collection of photographs, but may struggle if the photographs are taken outside a controlled setting, according to the paper.

“The Neural Radiance Fields (NeRF) approach implicitly models the radiance field and density of a scene within the weights of a neural network. Direct volume rendering is then used to synthesize new views, demonstrating a heretofore unprecedented level of fidelity on a range of challenging scenes,” the researchers write in their paper. 
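The "direct volume rendering" step the researchers mention can be sketched in a few lines. This is a minimal, illustrative Python version of the standard NeRF compositing rule for a single camera ray: each sample point along the ray has a density and a color, and the pixel color is a weighted sum in which nearer, denser samples hide the ones behind them. In the real system those densities and colors come from the neural network queried at points along the ray; that network is omitted here, and the function name `render_ray` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Color = Tuple[float, float, float]


def render_ray(
    sigmas: Sequence[float],
    colors: Sequence[Color],
    deltas: Sequence[float],
) -> Color:
    """Alpha-composite samples along one camera ray into a single pixel color.

    sigmas: volume density at each sample (predicted by the network in NeRF)
    colors: RGB color at each sample (also predicted by the network)
    deltas: distance from each sample to the next one along the ray
    """
    if not (len(sigmas) == len(colors) == len(deltas)):
        raise ValueError("sigmas, colors and deltas must have the same length")
    transmittance = 1.0  # fraction of light not yet absorbed before this sample
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha  # this sample's share of the pixel
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha  # light that survives past this segment
    return (pixel[0], pixel[1], pixel[2])


# An opaque red sample completely hides the green sample behind it.
print(render_ray([1e9, 1e9], [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [1.0, 1.0]))
# prints (1.0, 0.0, 0.0)
```

Because the weights are computed from densities rather than hard surfaces, the same rule handles both solid objects and semi-transparent regions, which is part of why the approach can render new viewpoints smoothly.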

“However, NeRF has only been demonstrated to work well in controlled settings: the scene is captured within a short time frame during which lighting effects remain constant, and all content in the scene is static,” the paper reads.

The researchers go on to explain that NeRF can struggle with images that fall outside these conditions: for example, a collection of tourist photographs of the Trevi Fountain taken by different people and posted to Flickr. Such photos vary because of changing light and because of differences in exposure and post-processing. In addition, photographs sourced from the internet often contain moving objects like people or cars.

“Two photographers may stand in the same location and photograph the same landmark, but in the time between those two photographs the world can change significantly: cars and pedestrians may move, construction may begin or end, seasons and weather may change, and the sun may move through the sky,” the paper reads.

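As a result, the reconstructed model may contain ghosting, oversmoothing, and other artifacts. NeRF-W tackles this by adding extensions that relax NeRF’s assumption that the world stays static, the authors explain in the paper: it learns a separate appearance embedding for each photo to account for varying lighting and exposure, and it models transient content such as people and cars separately from the static scene, which leads to cleaner reconstructions.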
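The idea of separating a static scene from transient content can be sketched by extending standard NeRF volume rendering. In this simplified, hypothetical Python version, modeled loosely on the paper's description, every sample along a ray has a static density and color plus a transient density and color, and both components block light for the samples behind them. Switching the transient part off renders only the static landmark. The learned networks, the per-image appearance embedding, and the uncertainty term NeRF-W uses during training are all omitted, and `render_ray_static_transient` and its parameters are illustrative names, not the project's code.

```python
import math
from typing import Sequence, Tuple

Color = Tuple[float, float, float]


def render_ray_static_transient(
    static_sigmas: Sequence[float],
    static_colors: Sequence[Color],
    transient_sigmas: Sequence[float],
    transient_colors: Sequence[Color],
    deltas: Sequence[float],
    include_transient: bool = True,
) -> Color:
    """Composite a ray whose samples have a static and a transient component.

    With include_transient=False only the static scene is rendered, e.g. the
    landmark without passing pedestrians or cars.
    """
    n = len(deltas)
    per_sample = (static_sigmas, static_colors, transient_sigmas, transient_colors)
    if not all(len(values) == n for values in per_sample):
        raise ValueError("all per-sample inputs must have the same length")
    transmittance = 1.0  # light not yet absorbed by earlier samples
    pixel = [0.0, 0.0, 0.0]
    for s_sigma, s_color, t_sigma, t_color, delta in zip(
        static_sigmas, static_colors, transient_sigmas, transient_colors, deltas
    ):
        if not include_transient:
            t_sigma = 0.0  # drop the transient density entirely
        s_alpha = 1.0 - math.exp(-s_sigma * delta)  # static opacity
        t_alpha = 1.0 - math.exp(-t_sigma * delta)  # transient opacity
        for k in range(3):
            pixel[k] += transmittance * (s_alpha * s_color[k] + t_alpha * t_color[k])
        # Both components attenuate the light reaching later samples.
        transmittance *= math.exp(-(s_sigma + t_sigma) * delta)
    return (pixel[0], pixel[1], pixel[2])


# A red "pedestrian" (transient) stands in front of a gray facade (static).
args = (
    [0.0, 1e9], [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)],  # static part
    [1e9, 0.0], [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],  # transient part
    [1.0, 1.0],
)
print(render_ray_static_transient(*args))  # prints (1.0, 0.0, 0.0)
print(render_ray_static_transient(*args, include_transient=False))  # prints (0.5, 0.5, 0.5)
```

Keeping the two components in separate terms is what lets a model trained on cluttered tourist photos still produce a clean rendering of the landmark itself.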

NeRF-W produces the kind of realistic 3D reconstruction often needed for augmented reality and virtual reality applications, built from photos scattered around the web.