Researchers at the Samsung AI Center in Moscow have developed a way to create “living portraits” from a very small dataset: as little as a single photograph, in some of their models.

The paper, “[Few-Shot Adversarial Learning of Realistic Neural Talking Head Models](https://arxiv.org/abs/1905.08233),” was published on the preprint server arXiv on Monday.

The researchers call this few-shot and one-shot learning, in which a model can be trained on just a single image to produce a convincing animated portrait. With a few more images, eight or 32 photographs, the realism improves further.

Statement regarding the purpose and effect of the technology
(NB: this statement reflects personal opinions of the authors and not of their organizations)

We believe that telepresence technologies in AR, VR and other media are to transform the world in the not-so-distant future. Shifting a part of human life-like communication to the virtual and augmented worlds will have several positive effects. It will lead to a reduction in long-distance travel and short-distance commute. It will democratize education, and improve the quality of life for people with disabilities. It will distribute jobs more fairly and uniformly around the World. It will better connect relatives and friends separated by distance. To achieve all these effects, we need to make human communication in AR and VR as realistic and compelling as possible, and the creation of photorealistic avatars is one (small) step towards this future. In other words, in future telepresence systems, people will need to be represented by the realistic semblances of themselves, and creating such avatars should be easy for the users. This application and scientific curiosity is what drives the research in our group, including the project presented in this video.

We realize that our technology can have a negative use for the so-called “deepfake” videos. However, it is important to realize that Hollywood has been making fake videos (aka “special effects”) for a century, and deep networks with similar capabilities have been available for the past several years (see links in the paper). Our work (and quite a few parallel works) will lead to the democratization of certain special effects technologies. And the democratization of the technologies has always had negative effects. Democratizing sound editing tools led to the rise of pranksters and fake audios, democratizing video recording led to the appearance of footage taken without consent. In each of the past cases, the net effect of democratization on the World has been positive, and mechanisms for stemming the negative effects have been developed. We believe that the case of neural avatar technology will be no different. Our belief is supported by the ongoing development of tools for fake video detection and face spoof detection alongside the ongoing shift for privacy and data security in major IT companies.

Authors:
Egor Zakharov, Aliaksandra Shysheya, Egor Burkov, Victor Lempitsky

Paper:
https://arxiv.org/abs/1905.08233v1

Few-Shot Adversarial Learning of Realistic Neural Talking Head Models

Because they need only one source image, the researchers were able to animate famous portraits and paintings, with unsettling results. Fyodor Dostoevsky, who died well before motion picture cameras became commercially available, moves and talks in black and white. The Mona Lisa silently moves her mouth and eyes, a slight smile on her face. Even Salvador Dali's famous mustache moves.

These “photorealistic talking head models” are created using convolutional neural networks: the researchers train the algorithm on a large dataset of talking head videos covering a wide variety of appearances. In this case, they used the publicly available [VoxCeleb](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/) datasets, which contain videos of more than 7,000 celebrities taken from YouTube.

This trains the program to identify what they call “landmark” features of faces: the eyes, the shape of the mouth, and the length and shape of the nose bridge.

In a way, this is a leap beyond what even [deepfakes](https://www.vice.com/pt_br/article/7xn4wy/e-muito-constrangedor-achar-bonito-qualquer-um-desses-rostos-criados-por-ia) and other algorithms using generative adversarial networks can do. Instead of teaching the algorithm to paste one face onto another using a catalog of one person's expressions, they use facial features that are common to all humans to animate a new face.
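
To make the idea of driving a new face with shared landmarks a little more concrete, here is a toy sketch in Python. It is not the researchers' method, which feeds landmarks into trained neural networks; it only illustrates how movement measured on one face's landmarks could be carried over to another face of a different size. The landmark names, the coordinates, and the `face_scale` and `transfer_motion` helpers are all hypothetical, invented for this illustration.

```python
from math import hypot
from typing import Dict, Tuple

Point = Tuple[float, float]
Landmarks = Dict[str, Point]


def face_scale(landmarks: Landmarks) -> float:
    """Distance between the eyes, used here as a rough measure of face size."""
    (lx, ly), (rx, ry) = landmarks["left_eye"], landmarks["right_eye"]
    return hypot(rx - lx, ry - ly)


def transfer_motion(driver_neutral: Landmarks,
                    driver_frame: Landmarks,
                    target_neutral: Landmarks) -> Landmarks:
    """Apply the driver's landmark movement to the target face.

    Each landmark's displacement (frame minus neutral pose) is rescaled by
    the ratio of the two face sizes, then added to the target's landmark.
    """
    ratio = face_scale(target_neutral) / face_scale(driver_neutral)
    moved = {}
    for name, (tx, ty) in target_neutral.items():
        nx, ny = driver_neutral[name]
        fx, fy = driver_frame[name]
        moved[name] = (tx + (fx - nx) * ratio, ty + (fy - ny) * ratio)
    return moved


# Made-up coordinates: the driver opens their mouth by 10 pixels.
driver_neutral = {"left_eye": (30.0, 40.0), "right_eye": (70.0, 40.0),
                  "mouth_bottom": (50.0, 80.0)}
driver_frame = {"left_eye": (30.0, 40.0), "right_eye": (70.0, 40.0),
                "mouth_bottom": (50.0, 90.0)}
# The target face (say, a painting) is twice as large in the image.
target_neutral = {"left_eye": (100.0, 100.0), "right_eye": (180.0, 100.0),
                  "mouth_bottom": (140.0, 180.0)}

moved = transfer_motion(driver_neutral, driver_frame, target_neutral)
print(moved["mouth_bottom"])  # (140.0, 200.0): the mouth opens by 20 pixels
```

Because the target face in this example is twice the size of the driver's, the 10-pixel mouth movement becomes a 20-pixel one, while the eyes stay where they were. A real system would still need a generator network to turn the moved landmarks into a photorealistic image.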

The researchers write in the paper that realistic face avatars have applications in video conferencing, video games, and special effects, but the [uncanny valley](https://pt.wikipedia.org/wiki/Vale_da_estranheza) generally keeps us from fully embracing widespread use of face avatars for real people. They hope this work will change that, with its low demand for source material and its “perfect” realism.

It's getting too easy to create fake videos of human faces