Image generated by u/chaindrop on Reddit
In recent months, advances in AI-generated media have been everywhere: generated “photos” of historical events that never happened, voices that mimic humans closely enough to break into a bank, and films made using AI instead of animators. But of all the creepy, overhyped, or doomsday-ushering examples of AI in the wild, this text-to-video monstrosity of Will Smith eating spaghetti is the most freakish by far. A distorted Smith, who looks more like his fish character in Shark Tale than himself, attempts to scoop piles of noodles into his mouth, or bite giant chunks of pasta off forks or out of his hands. It’s a nightmare stop-motion video, generated from just one innocuous line of text: “Will Smith eating spaghetti.”
Originally posted to Reddit by a user who goes by chaindrop, the video was made using the new ModelScope Text2Video generator, a machine learning model that turns text prompts into short video clips. In this case, chaindrop generated multiple versions of Smith’s imaginary pasta adventure by giving the model the prompt “Will Smith eating spaghetti,” then edited the resulting short clips together into an Italian dinner montage from hell.
There’s a demo version of the ModelScope Text2Video tool on Hugging Face that generates much shorter videos (one or two seconds long), but the full model is available in a GitHub repository and on ModelScope.

The ModelScope text-to-video tool was only made public in the last week, and people are already generating their own freaky little snippets, like dancing skeletons, cranes running in the void, and eerie silent films. Right now, the results are mostly terrible quality, with Shutterstock watermarks across the frames; the training datasets (which include LAION5B, ImageNet, WebVid, and “other public datasets,” according to the developers) are filled with images scraped from the web, including preview images from stock photo sites. But like every other kind of AI-generated media, text-to-video technology will likely get increasingly realistic very soon. It’s one of several new text-to-video models launched in just the last few weeks; machine learning tools company Runway launched a text-to-video generator earlier this month.
In the comments on the Reddit post, another user attempted to generate a new version with the word “meatballs” inserted, and the results are even more nightmarish than the original. But in a time when people are falling for AI-generated dripped-out Popes and obviously fake Trump arrest images, it’s almost nice to see the machine learning community return to its roots with a budding technology: glitched-out distortions that will haunt our waking dreams forever.