Researchers at the Samsung AI Center in Moscow developed a way to create "living portraits" from a very small dataset: as few as one photograph, in some of their models.

The paper, "[Few-Shot Adversarial Learning of Realistic Neural Talking Head Models](https://arxiv.org/abs/1905.08233)," was published on the preprint server arXiv on Monday.

The researchers call this few- and one-shot learning, where the model can be trained using just one image to create a convincing, animated portrait. With a few more shots, such as eight or 32 photographs, the realism improves even more.

Statement regarding the purpose and effect of the technology
(NB: this statement reflects personal opinions of the authors and not of their organizations)

We believe that telepresence technologies in AR, VR and other media are to transform the world in the not-so-distant future. Shifting a part of human life-like communication to the virtual and augmented worlds will have several positive effects. It will lead to a reduction in long-distance travel and short-distance commute. It will democratize education, and improve the quality of life for people with disabilities. It will distribute jobs more fairly and uniformly around the World. It will better connect relatives and friends separated by distance. To achieve all these effects, we need to make human communication in AR and VR as realistic and compelling as possible, and the creation of photorealistic avatars is one (small) step towards this future. In other words, in future telepresence systems, people will need to be represented by the realistic semblances of themselves, and creating such avatars should be easy for the users. This application and scientific curiosity is what drives the research in our group, including the project presented in this video.

We realize that our technology can have a negative use for the so-called “deepfake” videos. However, it is important to realize, that Hollywood has been making fake videos (aka “special effects”) for a century, and deep networks with similar capabilities have been available for the past several years (see links in the paper). Our work (and quite a few parallel works) will lead to the democratization of the certain special effects technologies. And the democratization of the technologies has always had negative effects. Democratizing sound editing tools lead to the rise of pranksters and fake audios, democratizing video recording lead to the appearance of footage taken without consent. In each of the past cases, the net effect of democratization on the World has been positive, and mechanisms for stemming the negative effects have been developed. We believe that the case of neural avatar technology will be no different. Our belief is supported by the ongoing development of tools for fake video detection and face spoof detection alongside with the ongoing shift for privacy and data security in major IT companies. 

Authors:
Egor Zakharov, Aliaksandra Shysheya, Egor Burkov, Victor Lempitsky

Paper:
https://arxiv.org/abs/1905.08233v1

Because they only need one source image, the researchers were able to animate paintings and famous portraits, with eerie results. Fyodor Dostoevsky, who died well before motion picture cameras became commercially available, moves and talks in black and white. The Mona Lisa silently moves her mouth and eyes, a slight smile on her face. Salvador Dali rants on, mustache twitching.

These "photorealistic talking head models" are created using convolutional neural networks. The researchers trained the algorithm on a large dataset of talking head videos with a wide variety of appearances. In this case, they used the publicly available [VoxCeleb](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/) databases, which contain videos of more than 7,000 celebrities taken from YouTube.

This trains the program to identify what they call "landmark" features of the faces: eyes, mouth shapes, the length and shape of a nose bridge.

This, in a way, is a leap beyond what even [deepfakes](https://www.vice.com/en_us/article/gydydm/gal-gadot-fake-ai-porn) and other algorithms using generative adversarial networks can accomplish. Instead of teaching the algorithm to paste one face onto another using a catalogue of expressions from one person, they use the facial features that are common across most humans to then puppeteer a new face.
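To make the puppeteering idea concrete, here is a toy sketch in Python. It is not the researchers' method: their system feeds landmarks into a trained neural network that renders the final image. This only illustrates the underlying intuition of moving one face's landmark points by the motion observed on another face. The function names and the three-point "face" are hypothetical, chosen for illustration.

```python
from math import hypot

def face_size(points):
    """Diagonal of the bounding box around a set of (x, y) landmarks."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return hypot(max(xs) - min(xs), max(ys) - min(ys))

def retarget_landmarks(target_neutral, driver_neutral, driver_frame):
    """Apply the driver's landmark motion to the target's landmarks.

    The motion is rescaled by the ratio of face sizes so that a small
    driver face can still move a large target face proportionally.
    """
    scale = face_size(target_neutral) / face_size(driver_neutral)
    posed = []
    for (tx, ty), (nx, ny), (fx, fy) in zip(
        target_neutral, driver_neutral, driver_frame
    ):
        # Displacement of this landmark on the driver, scaled to the target.
        posed.append((tx + scale * (fx - nx), ty + scale * (fy - ny)))
    return posed

# Hypothetical landmarks: left eye, right eye, mouth center.
driver_neutral = [(30.0, 40.0), (70.0, 40.0), (50.0, 80.0)]
# In this frame the driver's mouth point has moved down by 5 pixels.
driver_frame = [(30.0, 40.0), (70.0, 40.0), (50.0, 85.0)]
# The target face (say, a portrait) is twice as large as the driver's.
target_neutral = [(60.0, 80.0), (140.0, 80.0), (100.0, 160.0)]

posed = retarget_landmarks(target_neutral, driver_neutral, driver_frame)
print(posed)
```

In this example the portrait's eye points stay where they are and its mouth point moves down by roughly twice the driver's displacement, because the portrait is twice as large. A real system would still need a generator network to turn the moved landmarks into a photorealistic frame.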

**_Read more:_ [There Is No Tech Solution to Deepfakes](https://www.vice.com/en_us/article/594qx5/there-is-no-tech-solution-to-deepfakes)**

The researchers write in the paper that they recognize the applications for realistic face-avatars in video conferencing, gaming, and special effects, but the uncanny valley often holds us back from fully embracing widespread use of face-avatars of real people. They hope that this work changes that, with its low source requirements and "perfect" realism.

