Facebook just announced that it's launching a public race to develop technology for detecting deepfakes. The challenge, called the Deepfake Detection Challenge (DFDC), will have a leaderboard and prizes.
As part of the challenge, the company says it will release a dataset of faces and videos. The dataset will be built by commissioning paid, consenting actors, and the company promises not to use any Facebook user data. Both the challenge and the dataset are set to launch in December at the Conference on Neural Information Processing Systems in Vancouver, Canada.
In a blog post on Thursday, Facebook's Chief Technology Officer Mike Schroepfer wrote that the goal of the challenge "is to produce technology that everyone can use to better detect when AI has been used to alter a video in order to mislead the viewer."
In total, Facebook is dedicating $10 million to the program. Grants and awards will be given out "to spur the industry to create new ways of detecting and preventing media manipulated via AI from being used to mislead others," Schroepfer wrote.
According to the DFDC website, the challenge will run through 2020. A winner will be selected using "a test mechanism that enables teams to score the effectiveness of their models, against one or more black box test sets from our founding partners." Along with Facebook, those partners include the Partnership on AI, Microsoft, and academics from Cornell Tech, MIT, the University of Oxford, UC Berkeley, the University of Maryland, College Park, and the University at Albany-SUNY.
The DFDC isn't Facebook's first encounter with fake, algorithmically generated videos. In May, artist Bill Posters created a deepfake of Mark Zuckerberg in response to Facebook's policies on manipulated images and data usage. That deepfake, along with a slowed-down video of Nancy Pelosi edited to make her seem drunk, forced Facebook to take a stronger stance against manipulated imagery on its platform.
Facebook making its own dataset with paid, consenting adults is a statement in itself, considering the company's history of misusing its users' data. A database of faces obtained with explicit consent is also meaningful for the machine learning community as a whole.
Massive datasets built using faces from around the internet are used all the time in research and for training AI models. In June, Microsoft deleted its MS-Celeb-1M dataset, which consisted of around 10 million photos of 100,000 individuals collected from the internet, many of whom had not given explicit consent to be included. In February, Chinese facial recognition company SenseNets left millions of records of people's locations exposed, illustrating how having an image in these datasets can impact individuals' privacy.
It's worth noting that as part of the challenge announcement, Facebook talked to seven AI researchers and technologists about the benefits of detecting deepfakes. All seven were men, even though the real-world harms of deepfakes have been leveled against women.
There is still no technological solution to deepfakes.