It's one thing to show a machine a picture of a dog and expect it to say it's a dog. And it's another thing entirely to show it a video and ask it what's going on.
In this video, artist Kyle McDonald, the same man who got into trouble with the Secret Service in 2012 for scanning the faces of Apple Store customers, walks the streets of Amsterdam running NeuralTalk2, an open-source neural-network captioning program developed by researchers at Stanford, on his laptop.
The program runs through the video frames and captions them based on patterns it has learned through training. Training works as it does with any artificial neural network: feed it a set of labeled images as input and let it learn to associate visual patterns with words as output. With NeuralTalk2, users can let training run for days on end to improve accuracy.
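NeuralTalk2 itself pairs a convolutional network with a recurrent language model, but the basic train-then-caption loop described above can be illustrated with a much simpler toy. The sketch below stands in a nearest-neighbor lookup for the neural network, and all the feature vectors and captions are invented for illustration:

```python
# Toy illustration of the train-then-caption idea behind systems like
# NeuralTalk2 (the real system uses a CNN encoder plus an RNN decoder).
# "Training" here just memorizes labeled feature vectors; "captioning"
# returns the caption of the nearest stored example. All data is made up.

import math

def train(examples):
    """'Train' by memorizing (feature_vector, caption) pairs."""
    return list(examples)

def caption(model, features):
    """Caption a new image by finding its nearest training example."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    best = min(model, key=lambda ex: dist(ex[0], features))
    return best[1]

# Hypothetical 3-D "image features" standing in for real CNN activations.
model = train([
    ((0.9, 0.1, 0.0), "a dog on a sidewalk"),
    ((0.1, 0.9, 0.0), "a bicycle leaning against a wall"),
    ((0.0, 0.1, 0.9), "a canal with boats"),
])

print(caption(model, (0.8, 0.2, 0.1)))  # closest to the "dog" example
```

The more labeled examples such a system sees, the better its guesses for new inputs, which is why longer training improves accuracy.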
While McDonald is walking, the program's captions end up being wildly inaccurate. But a few times, it pretty much nails it:
Over the past few months, we've seen photos captioned through machine learning and machines generating photos based on prompts. We've seen machines endlessly mesh images together into seemingly acid-induced dreamscapes. And we've seen that neural networks can and often will fail miserably when it comes to captioning things.
But if the not-entirely-inaccurate captioning in this video is any indication, it might not be long before we can adapt this sort of technology to make things like CCTV, which produces copious amounts of footage, easier to look over.