Check out this fucked-up cat. It was generated in a relatively small amount of time—that is, at web app timescales—from my own horrible trackpad doodle of a cat by a variation of a technique that's now at the forefront of machine learning generation. You can play around with the demo yourself here.
So, that's fun, yeah? Everyone loves a good machine learning mutant. But what's actually happening here? Let's start with some requisite machine learning hypechecking.
Generating original images with artificial intelligence is hard—real hard. As a problem, it's a great example of where we actually are in machine learning, which is nowhere near what the hype would have us believe. Machine learning is slow even at simple visual recognition tasks, but when it comes to actually generating new visuals, it's barely off the starting line.
A promising approach to the problem lies in what are known as generative adversarial networks. The very basic idea is to take a trained model—a mathematical representation of, say, a car or human face developed over the course of analyzing tens of thousands of images of those things—and make it face off against a generator that initially just spits out random noise.
Draw some randomized blobs enough times, and, eventually, you are going to hit on a similarity to a non-random image, such as a cat or human face. When this happens, the part of the neural network with the pretrained model will give the generator a small thumbs up.
If there is no similarity, the response will be a thumbs down. The relationship is "adversarial": The component that knows what a human face or cat looks like is always accepting or rejecting the guesses made by the other component. The whole process is guided by what's known as a loss function, the job of which is to classify generated images according to their "realness."
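The thumbs-up/thumbs-down loop above can be sketched in a few lines of Python. Everything here is a toy stand-in—the "discriminator" is a fixed scoring rule that already knows what "real" looks like (numbers near 5), playing the role of the pretrained model, and the "generator" is a single learnable offset—whereas a real GAN trains two deep networks against each other.

```python
import numpy as np

# Toy sketch of the adversarial loop (illustrative only: a real GAN
# trains two deep networks; here the discriminator is a fixed rule
# standing in for the pretrained model, and the generator is just a
# single learnable offset added to random noise).

rng = np.random.default_rng(0)
TARGET = 5.0  # what "real" data looks like in this toy world

def realness(sample):
    """Discriminator score: higher means the sample looks more 'real'."""
    return -abs(sample - TARGET)

offset = 0.0  # the generator's only parameter
lr = 0.05
for step in range(1000):
    noise = rng.normal()      # the generator starts from pure randomness
    sample = offset + noise   # ...and shapes it with what it has learned
    # A thumbs down nudges the generator toward samples the
    # discriminator scores higher (gradient of -realness w.r.t. offset).
    offset -= lr * np.sign(sample - TARGET)

print(round(offset, 1))  # the offset has drifted into 'real' territory near 5
```

The generator never sees the target directly—it only ever sees accept/reject pressure from the discriminator, which is the whole adversarial trick.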
It's possible to make a new, original image from just a model using generative adversarial networks. We actually created this month's VICE magazine cover using the technique (the first-ever AI-generated magazine cover?). But it takes a lot of training images to really work properly, and, even then, the results are pretty rough.
Here, we're not starting from a blank page. The method at work is known as image-to-image translation, defined in a paper published last fall as "the problem of translating one possible representation of a scene into another, given sufficient training data." The loss function in this case determines what's known as a "structured loss." It's the difference between using wholesale randomness to predict a visual representation of something versus using randomness within the boundaries of a known structure. The latter version is a whole lot easier.
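In code, that "structured loss" idea looks roughly like the conditional-GAN recipe described in the pix2pix paper: an adversarial term (does the output fool the discriminator?) plus an L1 term tying every output pixel back to the input's structure. The function names and toy arrays below are illustrative, not the paper's actual implementation.

```python
import numpy as np

# Sketch of a pix2pix-style combined objective (illustrative names;
# the real implementation computes these over batches with deep nets).

def combined_loss(generated, target, realness_score, lam=100.0):
    """Adversarial term plus L1 'structured' term.

    realness_score: discriminator's probability that `generated` is real.
    lam: weight on the structure term (the paper uses 100).
    """
    adversarial = -np.log(realness_score + 1e-8)      # small when D is fooled
    structured = np.mean(np.abs(generated - target))  # pixel-wise L1
    return adversarial + lam * structured

target = np.ones((4, 4))
good = np.ones((4, 4))    # matches the target's structure exactly
noisy = np.zeros((4, 4))  # ignores the structure entirely

# Matching the structure drives the loss down even at the same realness.
print(combined_loss(good, target, realness_score=0.9) <
      combined_loss(noisy, target, realness_score=0.9))  # True
```

The L1 term is what keeps the generator's randomness "within the boundaries of a known structure": it can't wander far from the input drawing without paying for it.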
When we draw a cat in the pix2pix demo, the algorithm can just assume we drew a cat, or what's meant to be a cat. It won't have much luck filling in other doodles because it just has a cat model (the demo also supports purses, shoes, and building facades, but not all at once). It detects the edges in your drawing and then uses that as a structured starting point for the randomized duels of adversarial image generation.
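That edge-extraction step can be approximated with a simple finite-difference filter: mark every pixel where brightness jumps relative to a neighbor. (The real pipeline uses a proper edge detector; this is just a minimal sketch of the idea of reducing an image to the outline the generator conditions on.)

```python
import numpy as np

# Minimal edge-extraction sketch (the real pipeline uses a proper edge
# detector; a finite-difference filter just shows the idea of reducing
# an image to the structured outline the generator starts from).

def edge_map(image, threshold=0.5):
    """Mark pixels where brightness jumps sharply from a neighbor."""
    dy = np.abs(np.diff(image, axis=0, prepend=image[:1, :]))
    dx = np.abs(np.diff(image, axis=1, prepend=image[:, :1]))
    return (np.maximum(dx, dy) > threshold).astype(np.uint8)

# A filled square becomes just its outline: the structured starting
# point that the generator then fills in with plausible cat texture.
img = np.zeros((6, 6))
img[2:4, 2:4] = 1.0
print(edge_map(img))  # 1s only along the square's border, 0s inside
```

Everything inside the outline is then fair game for the adversarial guessing described above—which is exactly why the demo's cats come out looking the way they do.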
Meows may vary.