Holly Herndon's New, AI-Spawned Album Is Full of Humanity

For 'PROTO,' the artist and her collaborators built an AI that was trained on their voices. The music it produced is eerie at times, but it's full of life.

by Leah Mandel; illustrated by Dessie Jackson
13 May 2019, 6:38am

This article originally appeared on VICE US.

When we spoke at the beginning of May, Holly Herndon was fighting off a cold. Only a week had passed since she’d defended her dissertation at Stanford’s Center for Computer Research in Music and Acoustics, in the midst of the rollout for her third album, PROTO. All that remained was a “shitload of paperwork to file,” and then she could officially become a Doctor of Musical Arts. “It’s been a crazy, perfect storm,” she said. She had every reason to be wiped.

Herndon’s PhD is “very much intertangled” with the making of PROTO, a joyful, weird, metaphysical whirlwind of experimental electronic compositions, which comes out today, May 10, on 4AD. For her thesis, Herndon (alongside her partner and collaborator, the philosopher and digital artist Mat Dryhurst, and the artist and software developer Jules LaPlace) built and trained an AI named Spawn to make music. It took a while to find the right technique, but they eventually landed on a voice-modeling approach. First, Herndon and Dryhurst trained Spawn on their own voices; then they invited a “willing public” of about 300 people to perform and record vocals that became a data set to feed the AI. The artists Martine Syms, Jenna Sutela, Jlin, Colin Self, Evelyn Saylor, and Annie Garlid also contributed to the making of PROTO. Herndon calls this group of participants “the ensemble.” The resulting work is sometimes eerie, but more often rhapsodic and choral. Its roots are digital, but it's exuberantly primal.

For a while now, Herndon has been interested in the potential for technology to take a different path. In her work, there's a possibility for technology to be humane, progressive, fleshy, revolutionary. She’s spent a lot of time thinking about the ethics of artificial intelligence and about the quality and consequences of data sets; in a statement she wrote about PROTO, she noted that we needn’t raise technology to be a “monster.” PROTO is a love song to the capacity of machine-made music to be full of life. In sound and spirit, it’s the opposite of the dystopian fear of the singularity.

On standout “Extreme Love,” a churning spoken word piece that features multimedia artist Sutela, Herndon’s niece Lily Anna recites a poem about decentralized intelligence (a subfield of AI research in which collaborative solutions are reached through distributed intelligence—basically, socialist technology). “In the communion of open pores, existence is no longer enclosed in the body,” Lily Anna says, over a swirl of Spawn’s melodic pants and sighs. “We are not a collection of individuals, but a macro-organism, living as an ecosystem. We are completely outside ourselves, and the world is completely inside us. Is this how it feels to become the mother of the next species?”


NOISEY: So, in preparation for this interview, I was scrolling through your Twitter—aware of the irony. I started wondering, what is your relationship with social media, and with the internet?
Holly Herndon: This is a really complicated and fraught question. I have a different relationship to the internet than I do to social media. Those are two different things. The internet is still something with a lot of potential. It depends on the way the architecture of the platform is designed. Right now, most platforms we deal with on a daily basis on the ubiquitous internet are platform capitalist mega-companies like Facebook and Twitter and Google. They function according to advertising logic. They’re not necessarily designed with the best kind of human interaction in mind—there's a different incentive structure.

I feel like we give away too much of our digital selves to these corporations. Who knows what is even happening with our digital selves. Who knows what products our digital selves are shilling, what kind of artificial intelligence models are being trained on us that we're not even aware of. I think everyone has a love-hate relationship with social media. The studies show that it makes us all fucking miserable, and people get depressed when they spend too much time on it, and yet we're all still there somehow. It's this unprecedented ability to have like, everyone in the town square shouting at each other at once. That's also really exciting, and there's so much potential in that.

I'm also a somewhat private person. I'm not the kind of person who's ever going to be live-streaming my daily self. And that seems to be where things are going. I'm slower in that way. I like to think about something and then formulate a thought over a period of time, and then present that.

Have you seen these racist Twitter bots?
I don't know exactly how those bots were trained. But basically, whatever training data you give an intelligence is what it's going to learn from and respond to. So if you just train a bot on Twitter, you're going to see Twitter reflected back at you. And of course, Twitter contains so much racist garbage.

One thing we were trying to do [with PROTO] was have a very small data set. A small group of people who are aware that they're training, and are acknowledged as people who trained. You can hear their voices, instead of this invisible ocean of people and data where everything gets swept up in this scraping. We were able to control our data sets by having it be more limited.

My friend Kate Crawford works at AI Now in New York—she actually is a co-founder with her partner in the program, Meredith [Whittaker]—and they do a lot of research into the ethics in the AI that's used in corporate America. You see all these insane things, where these models are being created for various different reasons, and then you look at the data. The way it tags people, the way it's categorizing certain people and certain things is so fucked up and wrong that there's no way the AI coming out of it could have any kind of acceptable ethical framework, because the data is so dirty. This is a huge problem in AI ethics right now. Coming to a consensus on data standards.

The racist Twitter bot is just the beginning. We'll also see AI being used in the courtroom—a predictive system for whether or not an offender would re-offend. And then punishments are set according to this black box AI technology. Of course, if that's trained on past convictions, then the entire inequity of those convictions and of that system is going to be training that AI. That's just going to perpetuate those problems into the future. It's a huge quagmire.

Right. Our systems are already fucked up. So if that’s the model, then whatever AI technology comes in the future will just continue down the rabbit hole.
Our past is fucked up. There are mistakes in our past. And, not as life-critical, but in a metaphorical sense, that's what we're dealing with in music. We specifically didn't want to use a statistical score analysis. A lot of AI music right now is taking an automated composer approach, where you'll study the compositions of Mozart or someone, and then you can create infinite Mozart-like pieces. We wanted to approach Spawn as a performer rather than a composer, and use sound as material rather than MIDI data scores. Part of that was to avoid constantly recreating the styles and forms of the past, and to try to find something new. I think that is maybe the artistic version of some of these other ethical issues that we'll see come up.

What did you actually teach Spawn? Was it words, or just sounds?
It was both. We're really interested in trying to do voice modeling, because I come from this history of being obsessed with the processed voice. This is like the next phase of that, in a way. Spawn really responds to transients. When you look at an audio file and something has a lot of energy and activity at the beginning of the sample, and then it quickly peters off, that's a transient. Spawn responds well to that because it's a difference in energy, in spectrum. Sometimes we found ourselves doing things to please her. We would have the audience snapping their fingers, or tapping their beer bottles with their keys, or clapping. Things like that she really likes. But also people were singing, and reciting texts.

We used a thing called a TIMIT script. You can create these online with TIMIT script generators. You can model an individual's performance of a language. For whatever language you're modeling, it checks that you have every sound of that language, and recommends adding other things. So we have the group voice models that sound kind of crazy. That's what "Godmother" came out of. That's a Holly voice model, and the Holly voice model is performing Jlin tracks. Jlin stems.

I mean, talk about a great data set.
Jlin's amazing. She was in town, and she came over for dinner one night. We were just messing around, and she declared herself Spawn's godmother. I was like, "Okay, if you're the godmother that means you have to teach her your production skills!" She was down for it. We tried a bunch of different techniques with her that didn't really sound very good. So when we finally found the voice model approach it was like, "Yes, this sounds insane and weird and cool." And then I made the video—there was a pretty version and a monster version, and Jlin was like, "The monster version!"

You also worked with video and performance artist Martine Syms. She's awesome. She's featured on “Bridge”—is that her voice?
It is her voice. I've known Martine for many years now. We've always wanted to do something together. And she's been working with AI for the last couple years, too. She does a lot of stuff with the idea of digital versions of herself online. We leave these traces of our digital selves all over the internet. The idea is that some future [being], maybe a child, comes across these pieces of her digital self and configures them together into an artificial intelligence built out of her digital fragments. She wrote a text around that, which I then processed, and trained Spawn on her voice, and we made this weird little piece of music out of it.

I think about this constantly—the parts of the self, people you used to be. A time of your life you've moved on from, but you can look back and it's still you. But that's extra fascinating when you think about it in the online sense. There are all these crumbs you leave that you maybe don't want to be associated with anymore! Or that you don't even know are there.
That's the wild thing about this hyper-documentation period. Like you said, there are these old versions of yourself—I mean, I grew up in Tennessee, and then I moved to Berlin, then I moved to California... I feel like I've transformed so many times. Some of those previous versions of myself, they're super embarrassing but they're also, of course, a part of myself. And it's wild to think that all of those versions—especially from teenagers today—are hyper-documented. You have to come to terms with them in a different way, when you're constantly confronted with them.

Kids now must think in a totally different way.
I think they do, and that's also kind of exciting. That doesn't terrify me. Maybe the next generation will handle these tools better. Our generation was presented with them and it's almost like we're ODing on them, because we don't understand how to pace ourselves.

Right—and maybe that generation will be able to harness it, make some change. Maybe kids will know how to not raise technology to be a monster. Although, there are these capitalistic forces maybe beyond control.
But I hate this idea of it being beyond our control. I understand this impulse and I also feel helpless sometimes in the face of these kinds of monoliths. Maybe it's romantic, but I still like this idea of human agency, of us still being able to steer things in a particular way. That's how I approach technology, with this feeling of agency—or as if we could have a seat at the table in some of these decisions.

Maybe in some ways it's naive. But there's that great Mark Fisher quote: "It's easier to imagine the end of the world than the end of capitalism." It's that imagination that I feel is so desperately needed. Can we imagine something else? How in the world are we gonna build something else if we can't even imagine something else? It's so hard to, because these forces feel like immovable objects, but of course they're something that evolved and spawned out of human civilization and society. They haven't been here forever and probably won't be here forever.

You’re totally right. Speaking of societal evolution, of a sort, you also worked with mastering engineer Heba Kadry and mixer Marta Salogni, both incredible talents who worked on Björk’s Utopia. I don't think I had a question, just: women producers and engineers, fuck yes.
It wasn't a conscious decision. I wasn't like, "Girl Power" or anything. I was listening to a bunch of mixing engineers at the time and trying to find someone. It's really hard to find someone who can do dance music, pop music, and experimental music, this weird trifecta of things I'm interested in. My friend Lafawndah was playing me some of the mixes from her most recent album, and I really liked the treatment. She told me it was Marta, so I started looking into it and learned she had done the Björk thing. I really liked the way she mixed Björk's music, so it was kind of a no-brainer to see if she had time to work with us. And the same with Heba, it wasn't like I was looking for a woman; it was like, “I'm looking for the best engineer,” and everyone was like, "You have to work with Heba." It's really nice that we've gotten to that point. It doesn't even have to be an issue, really. They're really good, so I wanna work with them.

The video for "Eternal"—this thing on your head. Does it do anything? Is it just a headpiece?
It's actually an EEG net: basically a net with sensors attached to it, and cables coming out of each sensor, which can then measure brain wave activity. There's a researcher at CCRMA [Stanford's Center for Computer Research in Music and Acoustics], where I was doing my doctorate up until recently, named Takako [Fujioka]. She has this really cool lab where they do all kinds of experiments on the way the brain responds to music and musical impulses, and all kinds of things. You attach one of these crazy-looking caps to your subject's head, and then you play the music and record how the brain responds to it. So we searched for the coolest one we could find, and we borrowed one. But because we couldn't separate the cables out to make it look fantastic, we asked our friend Sarah Matthiason, who normally does really intricate braid work and hair pieces. She did the insane work of attaching the cables to the sensors so we could have it fan out and look dramatic. It's not actually plugged into anything, but it could be!

Did you use it in Takako's lab to see how your brain responds to music?
I was her teaching assistant at Stanford. Our students used it. It was mostly Takako. She's a genius. I was mostly learning from her, but also helping people design experiments to be as neutral as possible, and testing different assumptions about how the brain would react. This is actually a big topic around AI in music right now. A lot of companies, like everything else in our world, are like, "How do we optimize productivity?" This neoliberal hellscape that we're in. "How do we create music that makes your brain concentrate the most so you can work longer?" There are a lot of companies trying to do that right now. They'll use those caps to play various combinations of instruments, or BPMs, or whatever, and then study how well people's brains respond, etc. It's kinda crazy.

It makes sense that people would be doing that but it's kind of creepy. That, in a roundabout way, brings me to the fact that the album doesn't stop—there are no fadeouts. But you could potentially put one of the tracks on a playlist if you wanted to, because it will exist online in that way. But it's not made to be dissected, or at least that's the sense I get listening to it.
In some ways, the album format is kind of outmoded. It's also something that was determined by the distribution and media containers of the time. The reason the average length is something like 45 to 50 minutes is that that's what a piece of wax [a vinyl LP] could hold. So it became a format that we became really familiar and comfortable with, but it doesn't necessarily have to be the holy grail. I personally like working on larger-scale projects rather than singles, because I'm able to cover more territory and really create a world around something, which I don't feel I'm able to do with a single. Of course, there are singles on the album, but the album functions as an entire work. It's always strange when it gets chopped up and added to various playlists.

I think what's even more of an issue with [playlisting] is less that it's decontextualized, because that's something that's been going on for a long time: people have been making each other mixtapes, and radio does that as well. It's more the incentive for composers to create things that work specifically for the playlist. It changes the way that music is written, for the worse. There are certain things, like in the first 30 seconds something has to hit, otherwise it'll have a high skip rate on Spotify and then it won't perform well on the platform. That's dictating to a very detailed degree how a composition functions. Of course, that's also not new; radio did that as well, but it becomes more extreme in the streaming era.

I feel like the gatekeeping situation is worse than it was in the majors' heyday, because now the majors not only own the labels, they also own the distribution networks, and the radio, and soon, the journalism. It's becoming monopolistic in a way we haven't seen before, and that, I don't think, is good for music.

So what is there to do, then?
Leave. Don't try to play to that logic that doesn’t function for experimental, challenging, or independent music. Have an entirely separate logic. This happened. There was an entirely parallel music industry. Before, there was the major industry and then there was the indie industry. They built a parallel infrastructure. And then everything got combined into one.

I don't think that all music performs with the same logic. This is something Mat is always ranting about on Twitter. He's building some really interesting alternatives that I can't really talk about yet (but stay tuned, because it's pretty exciting what he has planned). You can't have the same kind of economic system for music that functions in dramatically different ways. I think things could go back to being more local and community-specific. Why have the same streaming logic for Taylor Swift as for Wolf Eyes? I mean, that's insane. There's something cool about everything being available, in a way. There are a lot of problems with genre categories. There's inherent bias and racism, and all kinds of fucked-up stuff in the way genre categories work. We saw that with "Old Town Road," right? [Lil Nas X] got kicked off the country charts, which is insane. So it's good that those things are changing and we don't have these hyper-specific silos where music belongs. The logic for one thing doesn’t have to work for the other.