Researchers at the Samsung AI Center in Moscow have developed a way to create ‘living portraits’ from a very limited dataset, sometimes with as little as a single image as source material.

The research paper [_Few-Shot Adversarial Learning of Realistic Neural Talking Head Models_](https://arxiv.org/abs/1905.08233) was published last Monday.

The researchers call the technique ‘few- and one-shot learning’. A single image is enough to create a convincing, moving portrait. With a few extra photos, 8 to 32 images, the realism only improves further.

Statement regarding the purpose and effect of the technology
(NB: this statement reflects personal opinions of the authors and not of their organizations)

We believe that telepresence technologies in AR, VR and other media are to transform the world in the not-so-distant future. Shifting a part of human life-like communication to the virtual and augmented worlds will have several positive effects. It will lead to a reduction in long-distance travel and short-distance commute. It will democratize education, and improve the quality of life for people with disabilities. It will distribute jobs more fairly and uniformly around the World. It will better connect relatives and friends separated by distance. To achieve all these effects, we need to make human communication in AR and VR as realistic and compelling as possible, and the creation of photorealistic avatars is one (small) step towards this future. In other words, in future telepresence systems, people will need to be represented by the realistic semblances of themselves, and creating such avatars should be easy for the users. This application and scientific curiosity is what drives the research in our group, including the project presented in this video.

We realize that our technology can have a negative use for the so-called “deepfake” videos. However, it is important to realize, that Hollywood has been making fake videos (aka “special effects”) for a century, and deep networks with similar capabilities have been available for the past several years (see links in the paper). Our work (and quite a few parallel works) will lead to the democratization of the certain special effects technologies. And the democratization of the technologies has always had negative effects. Democratizing sound editing tools lead to the rise of pranksters and fake audios, democratizing video recording lead to the appearance of footage taken without consent. In each of the past cases, the net effect of democratization on the World has been positive, and mechanisms for stemming the negative effects have been developed. We believe that the case of neural avatar technology will be no different. Our belief is supported by the ongoing development of tools for fake video detection and face spoof detection alongside with the ongoing shift for privacy and data security in major IT companies. 

Authors:
Egor Zakharov, Aliaksandra Shysheya, Egor Burkov, Victor Lempitsky

Paper:
https://arxiv.org/abs/1905.08233v1

Because they need only one source image, the researchers can also bring paintings and famous portraits to life, with rather eerie results. Fyodor Dostoevsky, who died long before video cameras became commercially available, moves and talks in black and white. The Mona Lisa silently moves her mouth and eyes, the faint smile still on her face. Salvador Dalí chatters away, complete with a moustache bobbing up and down.

These ‘photorealistic talking heads’ are produced by convolutional neural networks; the researchers trained the algorithm on a large dataset of videos of talking people with widely varying appearances. In this case they used the publicly available [VoxCeleb](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/) databases, which contain videos of more than 7,000 celebrities, taken from YouTube.

This trains the program to recognize key facial features: the eyes, the shape of the mouth, and the length and shape of the bridge of the nose.

In a sense, this goes a good deal further than [deepfakes](https://www.vice.com/en_us/article/gydydm/gal-gadot-fake-ai-porn) and other approaches based on generative adversarial networks (GANs). Instead of teaching the algorithm to paste one face onto another, using a range of photos of the subject's different expressions, this algorithm uses the facial features that nearly all of us share to generate a new face.

The researchers write in their paper that they see applications for this technology in realistic avatars for video calls, video games and special effects, areas where the ‘uncanny valley’ effect usually keeps us from fully embracing avatars of real people. They hope their work on this project can change that, thanks to its low requirements for source material and its ‘perfect’ realism.
