Researchers at the Samsung AI Center in Moscow developed a way to create "living portraits" from a very small dataset—as few as one photograph, in some of their models.
The paper, "Few-Shot Adversarial Learning of Realistic Neural Talking Head Models," was published on the preprint server arXiv on Monday.
The researchers call this few- and one-shot learning, where the model can be trained using just one image to create a convincing, animated portrait. With a few more shots—as few as eight or 32 photographs—the realism improves even more.
Because they only need one source image, the researchers were able to animate paintings and famous portraits, with eerie results. Fyodor Dostoevsky—who died well before motion picture cameras became commercially available—moves and talks in black and white. The Mona Lisa silently moves her mouth and eyes, a slight smile on her face. Salvador Dali rants on, mustache twitching.
These "photorealistic talking head models" are created using convolutional neural networks. The researchers trained the algorithm on a large dataset of talking head videos showing a wide variety of appearances; in this case, the publicly available VoxCeleb datasets, which contain YouTube videos of more than 7,000 celebrities.
This training teaches the program to identify what the researchers call "landmark" features of faces: eyes, mouth shapes, the length and shape of a nose bridge.
This, in a way, is a leap beyond what even deepfakes and other algorithms using generative adversarial networks can accomplish. Instead of teaching the algorithm to paste one face onto another using a catalogue of expressions from one person, the researchers use facial features that are common across most humans to puppeteer a new face.
The researchers write in the paper that they recognize the applications for realistic face-avatars in video conferencing, gaming, and special effects—but the uncanny valley often holds us back from fully embracing widespread use of face-avatars of real people. They hope that this work changes that, with its low source requirements and "perfect" realism.