Here's a scenario that's becoming increasingly common: you see that a friend has shared a video of a celebrity doing or saying something on social media. You watch it, because you're only human, and something about it strikes you as deeply odd. Not only is Jon Snow from Game of Thrones apologizing for the writing on the show's last season, but the way his mouth is moving just looks off.
This is a deepfake, an AI-generated dupe designed to deceive or entertain. Now, researchers have trained AI to detect these fake videos by looking for the same kinds of visual inconsistencies that tip off human viewers.
It's become relatively easy for amateurs to create these videos with the spread of open source AI models and datasets online, and researchers are working on ways to automatically detect them. One way that humans detect deepfakes is by identifying the way that something moves—say, a person's mouth—as being odd and uncomfortably inhuman. We might call this entering the uncanny valley.
According to recent research from computer scientists at the University of Southern California's Information Sciences Institute, popular AI models for generating deepfakes (and other approaches, such as 2016's graphical Face2Face program) manipulate video on a frame-by-frame basis and don't enforce temporal coherence. This can make the movement in the resulting videos look pretty janky, which humans often pick up on.
To automate the process, the researchers first fed a neural network—the type of AI program at the root of deepfakes—tons of videos of people so it could "learn" important features of how a human face moves while speaking. Then, the researchers fed stacked frames from faked videos to a model that uses these learned features to spot inconsistencies over time. According to the paper, this approach identified deepfakes with more than 90 percent accuracy.
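The core intuition—that per-frame manipulation without temporal coherence leaves erratic frame-to-frame motion—can be illustrated with a toy numerical sketch. To be clear, this is not the USC researchers' model (which learns facial-motion features with a neural network); it is a simplified heuristic with made-up function names and an arbitrary threshold, shown only to make the "temporal jitter" idea concrete:

```python
import numpy as np

def temporal_jitter_score(frames):
    """Score how erratically a video changes from frame to frame.

    frames: array of shape (T, H, W), grayscale pixel values over time.
    Smooth, natural motion produces nearly uniform frame-to-frame
    change, so the variance of that change is low; per-frame
    manipulation that ignores temporal coherence tends to jitter.
    """
    frames = np.asarray(frames, dtype=float)
    diffs = np.abs(np.diff(frames, axis=0))   # change between frames
    per_step = diffs.mean(axis=(1, 2))        # amount of motion per step
    return float(per_step.var())              # how uneven that motion is

def flag_as_fake(frames, threshold=0.05):
    """Crude classifier (hypothetical threshold): flag high-jitter clips."""
    return temporal_jitter_score(frames) > threshold

# Toy demo: a smoothly drifting "video" vs. one with erratic
# per-frame changes layered on top of the same drift.
t = np.linspace(0, 1, 16)
smooth = np.stack([np.full((8, 8), 10 * x) for x in t])
rng = np.random.default_rng(0)
jittery = smooth + rng.normal(0, 5, size=smooth.shape)

print(flag_as_fake(smooth))    # steady motion: not flagged
print(flag_as_fake(jittery))   # erratic motion: flagged
```

A real detector replaces the hand-picked statistic with features learned from many videos of real speakers, which is what lets the approach generalize across people rather than being tuned to one face.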
Study co-author Wael Abd-Almageed says this model could be used by a social network to identify deepfakes at scale, since it doesn't depend on "learning" the key features of a specific individual but rather the qualities of motion in general.
"Our model is general for any person since we are not focusing on the identity of the person, but rather the consistency of facial motion," Abd-Almageed said in an email.
"Social networks do not have to train new models since we will release our own model. What social networks could do is just include the detection software in their platforms to examine videos being uploaded to the platforms."
While there are many approaches to detecting deepfakes in development—like generating noisy watermarks in photos when they're taken—our future may very well include AIs duking it out over our perception of reality.
With reporting by Samantha Cole.