This article originally appeared on VICE US.
So a newish AI called AttnGAN makes me a very happy human. It’s a machine learning algorithm that was trained to produce images based on text input. The algorithm, a Generative Adversarial Network (GAN), was published in January by researchers at Microsoft’s Deep Learning Technology Center. Their work was also detailed in a paper posted to arXiv.org.
AttnGAN is supposed to visualize text-based captions, but it’s not very good at it, and at times horrifyingly so. To be fair, when researchers trained the AI on a specific dataset, like images of birds, it was able to produce convincing renderings of birds. But when trained on a larger dataset of more diverse images, AttnGAN became artistically overwhelmed.
According to the company’s blog post:
Microsoft’s drawing bot was trained on datasets that contain paired images and captions, which allow the models to learn how to match words to the visual representation of those words. The GAN, for example, learns to generate an image of a bird when a caption says bird and, likewise, learns what a picture of a bird should look like.
The AI does okay with simple captions like “a cat.” But “the quality stagnates with more complex text descriptions such as a bird with a green crown, yellow wings and a red belly,” the researchers noted.
You can play around with AttnGAN thanks to a demo created by Cristóbal Valenzuela, a technologist and research resident at New York University. It’s part of a larger project, Runway, that enables AI to be used creatively. Valenzuela is also working on Marrow—an interactive web documentary that explores how AI might resemble our minds.
“The reason I'm building this is because I believe AI has a creative potential we aren't really exploring,” Valenzuela told me over Twitter DM.
The demo is pretty slammed right now, since everyone's creating their own compositions. If you want to see some truly Cubist works of artificial intelligence, I recommend this blog post from Janelle Shane, a research scientist in optics.
“Besides some images being weird (if you type anything that has to do with humans),” Valenzuela said, “some people have been typing poetry, lyrics, books, quotes and getting more inspiring/poetic results.”
Valenzuela believes that AI, in addition to being a fun distraction, can also be a practical tool. As an example, he pointed to this experimental project in creating synthetic characters for TV, movies, and animation.
As for why humans enjoy faffing around with AttnGAN and other AI: “[it] has a generative capacity that we as humans just enjoy watching,” Valenzuela told me.
“I guess this has to do with the fascination of having something not made of flesh that is able to understand the world and create meaningful content (at least for us).”