Content warning: This article includes firsthand accounts of sexual abuse.
A collection of thousands of photographs of naked women that is being used to create machine learning-generated porn includes images from porn production companies that have been accused of lying to and coercing women to have sex on camera.
The dataset, which is circulating in deepfake porn creation communities online, includes images scraped from Czech Casting, a porn production company in the Czech Republic that police have accused of human trafficking and rape, as well as still images from videos produced by Girls Do Porn, which was ordered to pay almost $13 million to 22 women who appeared in its videos, and whose founder is currently a fugitive on the FBI's most wanted list.
Much like thispersondoesnotexist.com, which uses a machine learning algorithm and thousands of pictures of human faces to produce photorealistic images of people who don't exist, the dataset is being used to generate photorealistic images of nude women who aren’t real and don't look exactly like any one person. One person using the dataset is creating what he describes as "a Harem of millions of actresses" that can be inserted into deepfake porn, while another is using the dataset to create what he describes as "porn generated entirely by AI."
Motherboard has downloaded and viewed the dataset containing images from Czech Casting and Girls Do Porn, as well as several others being used to create machine learning-generated porn.
The people who anonymously use these datasets say that since the final algorithmically-generated images they create technically aren't of real people, they don't harm anyone. In fact, they argue that their creations are a step towards a future where porn will not require human porn performers at all. But legal experts, technologists, and women who are included in the datasets described these creations as uniquely dehumanizing.
Motherboard has written extensively about how deepfakes and internet platforms' inability to sufficiently curtail the spread of nonconsensual pornography upend the lives of women and continually traumatize them. This machine learning-generated porn and the datasets it relies on introduce a new form of abuse, in which the worst moments of some women's lives, captured on camera, are preserved, decontextualized, and spread online in service of creating porn whose makers claim it features people who don't actually exist.
Honza Červenka, a lawyer at McAllister Olivarius law firm who specializes in revenge porn and technology, is originally from the Czech Republic and has been following the case of Czech Casting, which is owned by Netlook, the country’s largest porn company. He told Motherboard that the idea that images are less harmful because they're run through an algorithm and "anonymized" is a red herring.
"It's mad science really, and completely and utterly re-victimizing to the victims of the Czech Casting perpetrators," he said.
"It feels unfair, it feels like my freedom is being taken away," Jane, a woman who said she was coerced into shooting a scene for Czech Casting, told Motherboard.
The casting couch trap
Jane, who asked to remain pseudonymous to speak about a traumatizing incident, remembers her hands shaking as she read over a contract for Czech Casting. She was there to support her friend, who needed money for rent. They'd answered an advertisement for a modeling gig, and decided to go together. They'd both just turned 18. They didn't know what kind of modeling it was; the ad was vague about details. Someone picked them up at a metro stop and took them to a house on the outskirts of Prague.
(In an interview with Czech bodybuilder Antonin Hodan posted to YouTube, a male performer in Czech Casting videos named Alekos Begaltsis admitted that the women who show up for shoots sometimes don't know what they're in for because of deceptive advertising.
"The girls get here through agencies as well with the help of private agents or through friends, anyone can recommend," Begaltsis said. "We can't control every piece of information in the advertising. It can happen that a girl gets here thinking she'll do an underwear photoshoot. Which sucks because we are powerless in these situations. We are trying to push them to write the truth [in the ads]. Unfortunately it's not always the case. But once she gets here, we inform her about everything.")
Once at the studio, a woman at the reception desk took Jane's ID.
"We sat in a waiting room and got up to leave two or three times, but someone would always come up and tell us to stay, to not be afraid," she said. "We were scared to leave so we stayed."
A woman called them one by one into a room with a white sofa where the filming would take place, and handed them a contract saying the videos wouldn't be accessible to anyone in the Czech Republic. This part of the arrangement is similar to the lie Girls Do Porn told women about how their videos were only going to be distributed to "collectors" in New Zealand. In reality, Girls Do Porn videos were published and sold in the U.S. and promoted on Pornhub.
Czech Casting does indeed block users trying to access it from the Czech Republic, Motherboard confirmed by trying to access the site using a virtual private network. But people within the country can easily circumvent the block using a VPN, many of which are free and easy to set up. Additionally, as women who have accused Czech Casting of wrongdoing have said, their families and friends quickly discovered their videos, which were reposted to popular free tube sites, sometimes alongside their real names.
"Weeks later I started getting messages…These were mostly from men saying how beautiful I was and if they could have sex with me," Jane said. "I got so many of these messages and keep getting them. I even changed my Facebook name because of this."
After she signed the contract, a man came in and asked her if she was a virgin. She said that she felt like she had no way out, and that she couldn't leave without her ID.
"After I said yes, he took the camera and told me to get naked," Jane said. "I was told they were going to film something soft. . .I was scared to speak out."
Jane said they put the money into her hands as she was leaving. She wasn't given a copy of the contract she signed, or any proof that she'd been there at all.
"My friend found the room we were in on a porn site," Jane said. "I realised this was a massive fuck-up. I kept thinking we should have left even if it means not having our IDs on us."
In another Czech Casting video, a woman, who Motherboard was able to confirm is included in the dataset, starts crying while having sex and asks the man to stop. The man stops, and the camera zooms in to show that she is bleeding. He hands her a towel and tells her to clean up the blood.
Jane's story about Czech Casting isn't unique. Multiple women have accused Czech Casting of coercing them into having sex on camera. Czech police have charged nine people involved with Netlook, the company behind Czech Casting, with human trafficking and rape. Daisy Lee, a woman who went on to a career in porn after her Czech Casting scene and who is now friendly with Begaltsis, said the site has ruined lives.
"I was 18 and didn't know what I was getting myself into. Most girls do not. The majority of them stay, but some leave. It ruins many lives," Lee told Motherboard.
In a statement published in July by the adult entertainment news site Xbiz, Netlook denied the accusations and said it is cooperating with the police. Netlook did not respond to Motherboard's request for comment.
In September, four years after Jane shot her scene for Czech Casting, a PhD student opened a new forum to show off his latest personal AI project: algorithmically-generated porn.
The person making these videos goes by the username "GeneratedPorn" and named the subreddit where he posted about the technology r/GeneratedPorn (we'll refer to this user as "GP" in this story). He said he started the project because he wanted to improve his machine learning skills. Like some of the earliest deepfakes posted online in 2017, what he shared were glitchy, spasming facsimiles of the images they're trained on: thousands of porn videos and images. Unlike much deepfake porn, the images GP is producing wouldn't fool anyone into thinking they are real porn. The final result barely looks human, let alone like a specific person.
But much like early deepfakes, they're rapidly improving in realism. GP has posted several experiments in the past few weeks featuring increasingly accurate naked human bodies, and even some slightly animated images, showing that convincing "porn generated entirely by AI" is not impossible.
"This all started as a quest for me to learn how all of this cool tech worked but then I ended up pivoting into the porn generation stuff as I thought it was a cool concept, especially after watching the movie Her," GP said in an email to Motherboard.
GP explained his process to Motherboard over email, as well as in detail in a post to the popular r/MachineLearning community on Reddit. He used a StyleGAN2 model that's available on GitHub as open-source code, but trained it on datasets of porn. The process is similar to how any other face-swapping deepfake is made, but instead of using a dataset consisting of many expressions of one person's face, he pulled from multiple datasets found online.
To create the videos, GP trained the algorithms using datasets from around the web, including one that primarily consists of images ripped from Czech Casting. The datasets, which are hosted on and free to download from popular file-sharing sites, are compiled by users experimenting with deepfakes and other forms of algorithmically generated images. GP found the Czech Casting dataset on one of these file-sharing websites, but said that if he hadn't, he would have written a web scraper to collect the images from Czech Casting.
This is because of the scope and uniformity of the porn that Czech Casting has created.
Creating algorithmically generated videos of a full, naked body requires many images and videos of real, nude people, and it's hard to imagine a more suitable resource for the task than Czech Casting.
Czech Casting, much like Girls Do Porn, specialized in casting couch-style porn, and has posted thousands of videos of women over the years. Its production style was almost algorithmic to begin with: each video of a woman also comes with a uniform set of photographs. Each set includes a photograph of the woman holding a yellow sign with a number indicating her episode number, like a mugshot board. Each set also includes photographs of the women posing in a series of dressed and undressed shots on a white background: right side, left side, front, back, as well as extreme close-ups of the face, individual nipples, and genitalia. In recent years, Czech Casting also started including 360-degree photographs of the women, where they pose for interactive VR-style content.
"The main reason people opt for a data source like this, is that the generative adversarial models (GAN) people use, are trying to learn a general structure of an image for the class of objects you're trying to generate," GP said. "If your images are structurally similar, the model can learn more about the finer/granular details of the item class, like dimples or freckles on a face. Which leads to a higher quality result."
GP sent Motherboard a sample of the dataset he's using, which also included images from Girls Do Porn videos. Other datasets that GP is using, which Motherboard has viewed, include images that appear to be scraped from across the internet, including other porn sites, social media, and subreddits where users post selfies, like r/roastme, a subreddit where people post images of themselves for other people to judge.
Gigabytes of questionably-sourced images
In a post to the r/MachineLearning subreddit explaining how his algorithmically generated porn works, GP pauses halfway through the explanation to address "a potential ethical issue."
"I wasn't sure what to do with it, other than it being this cool thing I'd created… I'd contemplated making an OnlyFans and offering personalised AI generated nudes that talk to people," he wrote. "But someone I knew frowned upon this idea and said it was exploitative of Males who might need companionship. So I decided not to go down that route in order to avoid the ethical can of worms."
He also noted in that post that training dataset ethics is something he's concerned about. "Are the images we are training on ethical or have the people in the images been exploited in some way[?]" he wrote. "I again can't verify the back story behind hundreds of thousands of images, but I can assume some of the images in the dataset might have an exploitative power dynamic behind them," noting that some of the images are from Girls Do Porn. "I'm not sure if it's even possible to blacklist exploitative data if it's been scraped from the web. I need to consider this a bit more."
These questions didn’t stop GP from building the project in public, on social media platforms, which means he’s perpetrating harm regardless of whatever ethical quandaries he says he may have. Much of the most harmful nonconsensual content is spread on the internet through surface-level platforms like Twitter, Facebook, Reddit, OnlyFans, and tube sites like XVideos and Pornhub.
"So many mainstream porn websites host child pornography and nonconsensual pornography, and does depict rape, and profit from those through ad sales," Červenka said.
When Motherboard contacted Reddit for comment, a spokesperson said Reddit's site-wide policies "prohibit involuntary pornography, which applies to all content, including deepfakes." Reddit banned deepfakes in 2017. Both r/GeneratedPorn and r/AIGeneratedPorn were shut down after Motherboard's request for comment.
GP's user profile on Pornhub was also taken down after Motherboard contacted Pornhub. A spokesperson for Pornhub declined to comment.
Porn tube site xHamster took down GP's user profile pending further review: "These new types of content are indeed grey areas and we will need to review with our own machine learning team and TOS team to determine how to evaluate and where necessary prevent," a spokesperson for xHamster said.
XVideos, another free tube site, directed Motherboard to a content removal form.
In an email to Motherboard, GP expressed another ethical concern: that the algorithm might produce something that is recognizable as a real human—a result that would negate the whole point of his project: anonymity.
"It's quite possible for the algorithm to reproduce fake people who resemble real people, but it wouldn't be a 1-to-1 replication of the data it has trained on," he said. "This presents an ethical problem I'm trying to navigate around, which is identifying the rare situations where it does replicate a person from the ~7,500 images it's learning from. It's something that plagues generative networks… It's possible and I'm not quite sure how to 100% avoid the possibility of this happening. But I really do want to avoid this. I'm not interested in deep-faking anyone, even by accident, it's a bit scummy imho!"
GP is far from alone in this type of project. The creator of the first deepfakes told Motherboard almost the same thing in 2017: that he wasn't a professional researcher, just a programmer with an interest in machine learning who “just found a clever way to do face-swap,” he said.
These Nudes Do Not Exist and a subsequent project from the same creator called "Harem" most likely draw their data from Czech Casting: the images come out looking unmistakably similar, but the creator of those projects hasn't responded to requests for comment on where the images in their dataset come from. Another abandoned project at r/AIGeneratedPorn did the same.
The real ethical issue plaguing this project is not the risk of parting lonely men from their money. It would take one online search for Czech Casting, and some basic awareness that pirated content harms its creators, to recognize that the datasets these nonexistent women are built from comprise gigabytes of questionably-sourced porn, some of it potentially depicting sexual assault.
On Monday, the night before this story was published and after his Patreon account was suspended, GP told Motherboard that he “decided to shut down the project.”
"It certainly should be illegal"
Jane told Motherboard that she was hoping her video would get lost among so many others online, and no one would find it. "But there is always someone who manages to fish it out from the depths of the internet," she said.
Červenka, the lawyer at McAllister Olivarius law firm who specializes in revenge porn and technology, told Motherboard that because some of the Czech Casting videos were allegedly edited to look consensual from the start, they have always been deceptive and harmful—and churning them through the meat grinder of machine learning algorithms doesn't make them less so.
"Now somebody walks up and uses those images to create a baseline for computers to use, potentially for decades to come, to use for computer generated images? It's awful, on a personal level, and it certainly should be illegal," Červenka said.
Even for professional porn performers, stolen content is an issue that plagues the industry. Adult performer Leah Gotti, whose images are part of the datasets GP is using without her consent, told me that the problem of stolen content isn't just disrespectful—it's dangerous. She's currently working to stop a stalker-fan from creating fake Instagram accounts of her and targeting her family by stealing her content and reposting it.
"It just goes back to, no one truly respects sex workers," Gotti told me. "All those things are pirated, and that's supposed to be against all the rules, but because we're having sex on camera they're like, well, she asked for it."
Earlier this year, a rumored OnlyFans leak of a database of stolen porn threatened to put sex workers on that platform in danger of being harassed or doxed.
Daisy Lee, the performer who started with Czech Casting when she was 18 but continued working in the adult industry after, told Motherboard that she blames herself for thinking that the videos wouldn't go viral worldwide.
"They don't put it on Czech servers but people download it and re-upload it everywhere," Lee said. "Every girl that goes in thinks it won't be visible to their friends and family… 14 days later [my] video was everywhere. It destroyed my reputation and spread around my home town within hours. But nobody forced me to do anything, no drugs, nothing like that."
Many of the women who were targeted by Girls Do Porn also blame themselves for believing the company's lies that the videos would stay in a certain region: in that case, private New Zealand collections on DVD. But the entire system of online porn, and of all online content for that matter, is set up so that videos and photos spread further the harder one tries to remove them. Algorithms are driven by what people feed them. One Czech Casting model lost her teaching job after students found her episode online, and when she spoke out about feeling victimized by the company, people sought her video out more.
"The researcher in me feels like 'if it's been published online it's open source and fair game' however the budding capitalist in me feels like that violates IP in some sense," GP said. "I'm a bit conflicted. I've personally accepted that any data I ever create as an individual will be used by others for profit or research."
GP also said that he thinks the type of abuse Czech Casting has been accused of is "horrible," but that it's difficult to screen for this kind of abuse when creating or using datasets.
"Now that the abuse is present I can opt to not use that data and source data from elsewhere," GP said. "Others in the area may not care and may decide to use it anyway. It's quite difficult to screen for this data completely. Doing a google image search for 'female standing nude' gives you a bunch of Czech Casting images. Throwing on the flag '-"czech"' catches a lot of them, but some still get through the cracks."
While GP said that he could choose not to use images produced by Girls Do Porn and Czech Casting, he didn't say that he would, nor is it clear whether his project and others like it could function without those images. GP also suggested that his project could somehow help these women.
"I feel bad for the victims of this abuse and I can't say anything that may make them feel better," he said. "My only hope is that technology such as the tech I'm working on, now and in the future, leads to a reduction in harm to others. By making it an economical and technologically inferior choice to commit abuse."
Červenka said that even after three years of deepfakes panic and decades more of nonconsensual porn online, the laws to stop them haven't caught up. Victims could make a legal claim that they've been portrayed in a false light or defamed, especially when content is edited deceptively to make it look consensual. But that's often not enough.
"These laws have been around for a long time, and we are just trying to use them in the current context, because we don't have anything else," Červenka said. "The legislature is unable to truly grapple with what people do online, and how to regulate harmful effects of what people do online."
It also becomes harder to go after anyone hosting the content if they're hosting it anonymously, all over the world, where every legal system is different. Even in the U.S., where some states have enacted deepfake-specific laws, the law differs from state to state.
When the content is buried inside a dataset, the problem is that much more difficult.
Is ethical AI porn possible?
The abuses the women in Czech Casting and Girls Do Porn endured happened in the real world, but the videos spread online made it worse. Some Girls Do Porn victims were forced to change their names, move states, drop out of school, and give up their careers or relationships with family and friends. Czech Casting victims have similar stories.
Revenge porn victims, as well as professional and amateur adult performers, spend hours sending takedown requests to websites that host their images. Often, those requests are ignored. And when it comes to datasets used to create more porn, it's hard to know where your images have ended up unless you can locate where a dataset is hosted, download a huge set of files, and sort through them to find yourself. Their worst moments are enshrined forever among gigabytes of others.
There have been efforts in recent years to create machine learning datasets that are fully consensual. After the privacy failures of MS-Celeb-1M, a dataset of 10 million photos of 100,000 individuals collected from the internet, ranging from journalists to musicians and activists, there's more awareness than ever of ethical uses of people's faces. In 2019, for its "Deepfake Detection Challenge," Facebook launched a dataset consisting of 100,000 videos of paid actors, for researchers to use. One of the sponsors of that challenge was the data science community site Kaggle. One of the datasets GP used is hosted on Kaggle, and appears to be largely stolen, scraped porn content.
If machine learning engineers interested in creating AI porn wanted to start a fully-ethical project, they would do something similar to what Facebook did with its challenge dataset.
"They would get consent from people who want to be nude models, and say this is what we're going to build it for, and everything's on the up and up," Rumman Chowdhury, data scientist and founder of ethical AI firm Parity, told Motherboard. "And maybe even [models] would get royalties, [engineers] would go build their AI, sell it as a porn, and they would actually do pretty well." But doing things the right way costs money, and when you're tinkering with porn as a side project, it's usually money you don't have. r/AIGeneratedPorn's project died because renting server time and running the training was too expensive, according to a post in that subreddit before it went down.
"There is no such thing as ethical use of an AI that uses database images without consent," Chowdhury said.
"How can a tech that at its core has rape videos be anything but a perpetuation of rape culture?" Červenka said. "I don’t think I would sleep well at night if I were [GP], because he's relying on images of abuse to create a Frankenstein's monster."