Images of artists and artworks

_Article originally published by [VICE United States](https://www.vice.com/en_us)._

Researchers at Samsung's AI Center in Moscow have developed a way to create "living portraits" from a very small dataset: for some of their models, a single photograph is all that is required.

The paper, "[Few-Shot Adversarial Learning of Realistic Neural Talking Head Models](https://arxiv.org/abs/1905.08233)", was published on Monday on the arXiv preprint server.

The researchers call this "_few-shot_ learning" (learning from only a few images): a model that has already been trained on a large video dataset needs just one image of a new person to create a convincing animated portrait. With a few more images (eight or 32 photographs), the realism improves further.
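To make the "one image versus eight or 32" distinction concrete: as described in the paper, an embedder network maps each available frame (paired with its facial landmarks) to a vector, and the K vectors are averaged into a single identity embedding that the generator then uses. The sketch below replaces the real convolutional embedder with a toy one (a fixed random linear map, with the hypothetical name `make_toy_embedder`); only the averaging over K shots is meant to mirror the paper.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def make_toy_embedder(in_dim: int, out_dim: int, seed: int = 0) -> Callable[[Sequence[float], Sequence[float]], Vector]:
    """HYPOTHETICAL stand-in for the embedder network: a fixed random linear map."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def embed(frame: Sequence[float], landmarks: Sequence[float]) -> Vector:
        x = list(frame) + list(landmarks)  # concatenate image values and landmark values
        if len(x) != in_dim:
            raise ValueError(f"expected {in_dim} input values, got {len(x)}")
        return [sum(w * v for w, v in zip(row, x)) for row in weights]

    return embed


def few_shot_identity_embedding(embed, frames, landmarks) -> Vector:
    """Average the per-frame embeddings over the K available shots (K = 1, 8, 32, ...)."""
    if not frames or len(frames) != len(landmarks):
        raise ValueError("need K >= 1 frames, each paired with its landmarks")
    per_frame = [embed(f, lm) for f, lm in zip(frames, landmarks)]
    k = len(per_frame)
    # Element-wise mean across the K embedding vectors.
    return [sum(column) / k for column in zip(*per_frame)]


if __name__ == "__main__":
    embed = make_toy_embedder(in_dim=6, out_dim=3)
    frames = [[0.1, 0.2, 0.3, 0.4], [0.2, 0.1, 0.4, 0.3]]  # two tiny toy "images"
    landmarks = [[0.5, 0.5], [0.4, 0.6]]                   # their toy landmark vectors
    e_hat = few_shot_identity_embedding(embed, frames, landmarks)
    print(len(e_hat))  # 3
```

With K = 1 the identity vector is simply that one frame's embedding; extra frames give the generator an averaged description of the person rather than one snapshot.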

Statement regarding the purpose and effect of the technology
(NB: this statement reflects personal opinions of the authors and not of their organizations)

We believe that telepresence technologies in AR, VR and other media will transform the world in the not-so-distant future. Shifting part of human life-like communication to the virtual and augmented worlds will have several positive effects. It will reduce long-distance travel and short-distance commuting. It will democratize education and improve the quality of life for people with disabilities. It will distribute jobs more fairly and uniformly around the world. It will better connect relatives and friends separated by distance. To achieve all these effects, we need to make human communication in AR and VR as realistic and compelling as possible, and the creation of photorealistic avatars is one (small) step towards this future. In other words, in future telepresence systems, people will need to be represented by realistic semblances of themselves, and creating such avatars should be easy for users. This application, together with scientific curiosity, is what drives the research in our group, including the project presented in this video.

We realize that our technology can be misused to make so-called “deepfake” videos. However, it is important to realize that Hollywood has been making fake videos (aka “special effects”) for a century, and deep networks with similar capabilities have been available for the past several years (see links in the paper). Our work (and quite a few parallel works) will lead to the democratization of certain special effects technologies. And the democratization of technologies has always had negative effects. Democratizing sound editing tools led to the rise of pranksters and fake audio; democratizing video recording led to the appearance of footage taken without consent. In each of the past cases, the net effect of democratization on the world has been positive, and mechanisms for stemming the negative effects have been developed. We believe that the case of neural avatar technology will be no different. Our belief is supported by the ongoing development of tools for fake video detection and face spoof detection, alongside the ongoing shift toward privacy and data security in major IT companies.

Authors:
Egor Zakharov, Aliaksandra Shysheya, Egor Burkov, Victor Lempitsky

Paper:
https://arxiv.org/abs/1905.08233v1

Because they need only a single source image, the researchers were able to animate famous paintings and portraits, with eerie results. Fyodor Dostoevsky, who died long before movie cameras were commercially available, moves and speaks in black and white. The Mona Lisa silently moves her mouth and eyes, a faint smile on her face. Salvador Dalí holds forth, his mustache twitching.

These "photorealistic talking head models" are created using convolutional neural networks: the researchers trained the algorithm on a large dataset of videos of people talking, covering a wide variety of appearances. In this case, they used the publicly available [VoxCeleb](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/) datasets, which contain video of more than 7,000 celebrities taken from YouTube.
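The training stage can be pictured as a stream of small "episodes". As we read the paper, each episode draws one training video, one target frame from it, and K further frames of the same video; the network learns to render the target frame from the other K. What follows is a minimal sampler under that reading. The `videos` index (video id mapped to frame numbers) and the name `sample_episode` are hypothetical and do not come from the authors' code or from VoxCeleb's tooling.

```python
import random
from typing import Dict, List, Tuple


def sample_episode(videos: Dict[str, List[int]], k: int, rng: random.Random) -> Tuple[str, int, List[int]]:
    """Pick a random video, then one target frame and K distinct source frames from it."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Only videos with at least K + 1 frames can supply K sources plus a separate target.
    eligible = [vid for vid, frames in videos.items() if len(frames) >= k + 1]
    if not eligible:
        raise ValueError(f"no video has at least {k + 1} frames")
    vid = rng.choice(eligible)
    picked = rng.sample(videos[vid], k + 1)  # drawn without replacement
    target, sources = picked[0], picked[1:]
    return vid, target, sources


if __name__ == "__main__":
    # HYPOTHETICAL toy index, not real VoxCeleb identifiers.
    toy_videos = {"video_a": list(range(100)), "video_b": list(range(5))}
    rng = random.Random(42)
    for _ in range(3):
        vid, target, sources = sample_episode(toy_videos, k=8, rng=rng)
        print(vid, target, len(sources))
```

Because every episode pairs frames of the same person, the model is pushed to separate "who this is" (taken from the K sources) from "what the face is doing" (the target pose).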

This trains the program to identify what the researchers call facial "landmark" features: the eyes, the shapes of the mouth, and the length and shape of the bridge of the nose.
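Facial landmarks are usually stored as a list of (x, y) points. The snippet below assumes the widely used 68-point layout (nose bridge at indices 27 to 30, eyes at 36 to 47, mouth at 48 to 67, all 0-based); the article does not say which layout the researchers used, so the indices are an assumption. It shows how measurements like the ones the article lists (nose bridge length, eye spacing, mouth shape) follow directly from such points; `face_measurements` is a hypothetical helper name.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

# ASSUMPTION: 0-based, end-exclusive ranges from the common 68-point landmark scheme.
NOSE_BRIDGE = (27, 31)
EYE_A = (36, 42)
EYE_B = (42, 48)
MOUTH = (48, 68)


def _polyline_length(points: Sequence[Point]) -> float:
    # Sum of distances between consecutive points.
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def _centroid(points: Sequence[Point]) -> Point:
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def face_measurements(landmarks: List[Point]) -> Dict[str, float]:
    """Derive a few simple shape measurements from 68 (x, y) landmarks."""
    if len(landmarks) != 68:
        raise ValueError(f"expected 68 landmarks, got {len(landmarks)}")

    def region(span: Tuple[int, int]) -> List[Point]:
        return landmarks[span[0]:span[1]]

    mouth = region(MOUTH)
    xs = [x for x, _ in mouth]
    ys = [y for _, y in mouth]
    return {
        "nose_bridge_length": _polyline_length(region(NOSE_BRIDGE)),
        "eye_distance": math.dist(_centroid(region(EYE_A)), _centroid(region(EYE_B))),
        "mouth_width": max(xs) - min(xs),
        "mouth_height": max(ys) - min(ys),  # grows as the mouth opens
    }
```

Tracking `mouth_height` across the frames of a talking-head video, for instance, gives a rough signal of when the mouth opens and closes.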

In a way, this is a leap beyond what even [deepfakes](https://www.vice.com/es_latam/article/7xn4wy/motherboard-sitio-web-ia-para-crear-caras) and other algorithms based on generative networks can achieve. Instead of teaching the algorithm to paste one face onto another, which requires a large catalog of one person's expressions, the system relies on facial features common to most humans and uses them to take control of a new face.
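Driving a new face with someone else's motion requires, at minimum, putting the driver's landmarks where the target face is. The sketch below is a deliberately naive version of that step, chosen for illustration and not taken from the paper: it removes the driver's position and size (centroid and RMS radius) and re-applies the target's, keeping the driver's expression shape. `retarget_landmarks` is a hypothetical name, and the sketch ignores head rotation and differences in facial proportions between people.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _center_and_scale(points: Sequence[Point]) -> Tuple[float, float, float]:
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # RMS distance from the centroid, used as a single "face size" number.
    scale = math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)
    return cx, cy, scale


def retarget_landmarks(driver: Sequence[Point], target_reference: Sequence[Point]) -> List[Point]:
    """Move and uniformly rescale the driver's landmarks onto the target face's position and size."""
    if not driver or not target_reference:
        raise ValueError("both landmark sets must be non-empty")
    dcx, dcy, ds = _center_and_scale(driver)
    tcx, tcy, ts = _center_and_scale(target_reference)
    if ds == 0:
        raise ValueError("driver landmarks are degenerate (all points identical)")
    ratio = ts / ds
    return [(tcx + (x - dcx) * ratio, tcy + (y - dcy) * ratio) for x, y in driver]
```

Because only translation and uniform scale are transferred, a driver with a wider face still imposes that width on the target, which is one reason a purely geometric approach falls short of what a learned model can do.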

The researchers say in the paper that they see applications for realistic facial avatars in videoconferencing, gaming and special effects, but that the uncanny valley often keeps avatars of real people from being widely adopted. They hope their technology will change that, thanks to how little source material it needs and its "perfect" realism.
