A Neural Network Wrote the Next 'Game of Thrones' Book Because George R.R. Martin Hasn't
"The Winds of Winter" is already here ... sorta.
Image: HBO/Shutterstock. Composition: Jason Koebler
*The top of this article is spoiler-free. There is a spoiler warning later in this post.*
Minutes after the epic finale of the seventh season of Game of Thrones, fans of the show were already dismayed to hear that the final, six-episode season of the series isn't set to air until spring 2019.
For readers of the A Song of Ice and Fire novel series on which the TV show is based, disappointment over that estimated wait is laughable. The fifth novel in the planned seven-novel series, A Dance with Dragons, was published in 2011, and author George R.R. Martin has been laboring over The Winds of Winter since, with no release date in sight. With no new source material, producers of the TV series have been forced to move the story forward themselves since late season 6.
Tired of the wait and armed with technology far beyond the grand maesters of Oldtown, full-stack software engineer Zack Thoutt is training a recurrent neural network (RNN) to predict the events of the unfinished sixth novel. Read the first chapter of the book here.
"I'm a huge fan of Game of Thrones, the books and the show," said Thoutt, who had just completed a Udacity course on artificial intelligence and deep learning and used what he learned to do the project. "I had worked with RNNs a bit in that class and thought I'd give working with the books a shot."
Neural networks are a class of machine learning algorithms loosely modeled on the human brain, and recurrent neural networks are a subclass that works well with sequences of data, like text.
"With a vanilla neural network you take a set of input data, pass it through the network, and get a set of outputs," said Thoutt. "In order to train these models you need to know what the model should ideally output, which is often called your labels or target variables. The neural network compares the data it outputs with the targets and updates so that the network learns to better mimic the targets."
Thoutt is working with a "long short-term memory" (LSTM) RNN, a variant designed to hold on to information over longer stretches of a sequence, which is key to training a network to remember plot points from thousands of words ago. In theory, this type of memory should prevent the network from repeating events that have already happened, allowing the generated book to be a continuation of the plot rather than an alternative version of an already-published work.
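For readers curious what that memory looks like mechanically, here is a minimal sketch of a single long short-term memory step in plain Python. It is not Thoutt's code: it uses one scalar unit and hand-picked weights rather than anything learned from the books, and the `params` values and the `run` helper are made up for illustration. The point is only that a forget gate held near 1 lets a signal seen early in a sequence survive many later steps.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM cell; p holds scalar weights."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g   # cell state: keep old memory, admit new input
    h = o * math.tanh(c)     # hidden state passed on to the next step
    return h, c


# Hand-picked (not learned) weights: the forget gate stays near 1, and the
# input gate opens only when x is 1.
params = {
    "wf": 0.0, "uf": 0.0, "bf": 10.0,
    "wi": 40.0, "ui": 0.0, "bi": -30.0,
    "wo": 0.0, "uo": 0.0, "bo": 10.0,
    "wg": 0.0, "ug": 0.0, "bg": 10.0,
}


def run(sequence):
    """Feed a sequence through the cell and return the final cell state."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return c


remembered = run([1.0] + [0.0] * 50)  # one early signal, then 50 blank steps
baseline = run([0.0] * 51)            # no signal at all
```

Here `remembered` stays close to 1 after 50 blank steps, while `baseline` stays close to 0. A trained network has to learn gate settings like these on its own, which is part of why recalling plot points across whole novels remains hard.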
In this sense, the network is attempting to write true sequels, though it obviously stumbles from time to time. For instance, it has in some cases written about characters who have already died.
"It is trying to write a new book. A perfect model would take everything that has happened in the books into account and not write about characters being alive when they died two books ago," Thoutt said. "The reality, though, is that the model isn't good enough to do that. If the model were that good authors might be in trouble. The model is striving to be a new book and to take everything into account, but it makes a lot of mistakes because the technology to train a perfect text generator that can remember complex plots over millions of words doesn't exist yet."
After feeding the 5,376 pages of the first five books in the series to the network, Thoutt has produced five predicted chapters and published them on the GitHub page for the project.
"I start each chapter by giving it a prime word, which I always used as a character name, and tell it how many words after that to generate," Thoutt said. "I wanted to do chapters for specific characters like in the books, so I always used one of the character names as the prime word … there is no editing other than supplying the network that first prime word."
George R.R. Martin isn't going to be calling for writing tips anytime soon, but Thoutt's network is able to write mostly readable sentences, and its output is packed with some serious twists.
For instance (fan theories and spoilers for the artificial intelligence-produced saga start here), the network predicts that Sansa Stark is actually of House Baratheon and is a part of a completely new force:
"I feared Master Sansa, Ser," Ser Jaime reminded her. "She Baratheon is one of the crossing. The second sons of your onion concubine."
"That was the very first sentence it created. I thought that was really funny," said Thoutt. In the series, the Second Sons is a sellsword company pledged to Dragon Queen Daenerys Targaryen. As for the "onion concubine?" We'll have to wait for more chapters to learn more.
The network also created a new character called Greenbeard:
"Aye, Pate." the tall man raised a sword and beckoned him back and pushed the big steel throne to where the girl came forward. Greenbeard was waiting toward the gates, big blind bearded pimple with his fallen body scraped his finger from a ring of white apple. It was half-buried mad on honey of a dried brain, of two rangers, a heavy frey.
"It's obviously not perfect. It isn't building a long-term story and the grammar isn't perfect. But the network is able to learn the basics of the English language and structure of George R.R. Martin's style on its own," said Thoutt.
Not all of the predictions are completely off-base. The network predicted that Jaime Lannister ends up killing his sister-lover Cersei, that Jon Snow rides a dragon, and that advisor Varys poisons Daenerys, all theories that have been talked about by fans of the show.
Jaime killed Cersei and was cold and full of words, and Jon thought he was the wolf now, and white harbor...
"I guess that validates that anything can happen in Game of Thrones," said Thoutt. " I didn't feed it anything from fan theory websites, only the books."
Thoutt said that the novels have about 32,000 unique words, which made it more difficult to train the network.
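Counting unique words is simple to do. The sketch below shows one way, assuming a crude tokenizer that lowercases the text and keeps runs of letters and apostrophes. The sample sentence and the function name are invented for illustration, and the article does not say how Thoutt arrived at his count.

```python
import re


def unique_words(text):
    """Return the set of distinct lowercase word tokens in text."""
    return set(re.findall(r"[a-z']+", text.lower()))


# Hypothetical sample; the real input would be the text of the novels.
sample = "Winter is coming. The winter winds are coming for the North."
vocabulary = unique_words(sample)
vocab_size = len(vocabulary)
```

Assuming the model chooses each next word from the whole vocabulary, every additional unique word is one more option it has to learn to score, which is one way to read why a 32,000-word vocabulary complicates training.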
"Martin is obviously very descriptive in his writing, so those extra adjectives and the fictional locations and titles are just more complications for the network," said Thoutt.
The text of the five novels is also a relatively small data set for training an RNN. An ideal source would be a book 100 times the size of the series but with a children's-book vocabulary level, Thoutt said.
Thoutt has considered adding more texts to the data set, like the scripts for the TV series, but doesn't want to compromise the source material coming straight from the novels or complicate the network by including shooting script stylings.
Until 2019, computer generated tales of Westeros might be the new Game of Thrones material out there. And who knows, maybe Greenbeard shows up and takes the Iron Throne by storm at the start of season eight.