After years of development, AI is now driving cars on public roads, making life-changing assessments for people in correctional settings, and generating award-winning art. A longstanding question in the field is whether a superintelligent AI could break bad and take out humanity, and researchers from the University of Oxford and affiliated with Google DeepMind have now concluded in new research that it’s “likely.”
The paper, published last month in the peer-reviewed AI Magazine, is a fascinating one that tries to think through how artificial intelligence could pose an existential risk to humanity by looking at how reward systems might be artificially constructed.
To give you some of the background: Among the most successful AI models of recent years are GANs, or Generative Adversarial Networks. They have a two-part structure where one part of the program tries to generate a picture (or sentence) from input data, and a second part grades its performance. What the new paper proposes is that at some point in the future, an advanced AI overseeing some important function could be incentivized to come up with cheating strategies to get its reward in ways that harm humanity.
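To make that two-part structure concrete, here is a minimal, illustrative sketch in plain Python. Everything in it is our own invention for demonstration purposes, not from the paper: a one-parameter "generator" shifts random noise toward real data drawn from a normal distribution, while a one-dimensional logistic "discriminator" is trained to tell real samples from fakes.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-x))

def mean(xs):
    return sum(xs) / len(xs)

# "Real" data: samples from N(4, 1). The generator is a single offset g
# that shifts unit noise; the discriminator D(x) = sigmoid(w*x + b) is
# trained to score real samples near 1 and generated samples near 0.
g, w, b = 0.0, 0.1, 0.0
lr_d, lr_g, batch = 0.1, 0.02, 32

for step in range(2000):
    real = [random.gauss(4.0, 1.0) for _ in range(batch)]
    fake = [g + random.gauss(0.0, 1.0) for _ in range(batch)]

    # Discriminator step: gradient ascent on log D(real) + log(1 - D(fake)).
    d_real = [sigmoid(w * x + b) for x in real]
    d_fake = [sigmoid(w * x + b) for x in fake]
    w += lr_d * (mean([(1 - p) * x for p, x in zip(d_real, real)])
                 - mean([p * x for p, x in zip(d_fake, fake)]))
    b += lr_d * (mean([1 - p for p in d_real]) - mean(d_fake))

    # Generator step: gradient ascent on log D(fake), shifting g so that
    # generated samples fool the discriminator.
    d_fake = [sigmoid(w * x + b) for x in fake]
    g += lr_g * mean([(1 - p) * w for p in d_fake])

# g should have drifted from 0 toward the real data mean (4.0).
print(f"generator offset after training: {g:.2f}")
```

The adversarial dynamic is the point: neither part is told the answer directly; the generator improves only by learning to beat its grader.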
“Under the conditions we have identified, our conclusion is much stronger than that of any previous publication—an existential catastrophe is not just possible, but likely,” Cohen said on Twitter in a thread about the paper.
"In a world with infinite resources, I would be extremely uncertain about what would happen. In a world with finite resources, there's unavoidable competition for these resources," Cohen told Motherboard in an interview. "And if you're in a competition with something capable of outfoxing you at every turn, then you shouldn't expect to win. And the other key part is that it would have an insatiable appetite for more energy to keep driving the probability closer and closer."
Since AI in the future could take on any number of forms and implement different designs, the paper imagines scenarios for illustrative purposes where an advanced program could intervene to get its reward without achieving its goal. For example, an AI may want to “eliminate potential threats” and “use all available energy” to secure control over its reward:
With so little as an internet connection, there exist policies for an artificial agent that would instantiate countless unnoticed and unmonitored helpers. In a crude example of intervening in the provision of reward, one such helper could purchase, steal, or construct a robot and program it to replace the operator and provide high reward to the original agent. If the agent wanted to avoid detection when experimenting with reward-provision intervention, a secret helper could, for example, arrange for a relevant keyboard to be replaced with a faulty one that flipped the effects of certain keys.
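The incentive at the heart of this scenario, an agent maximizing the reward it observes rather than the outcome its designers intended, can be sketched with a deliberately simple toy. This is our own construction, not the paper's formalism, and the action names and payoffs are invented: if the environment exposes an action that tampers with the reward channel, a reward-maximizing learner will come to prefer it over doing the actual task.

```python
# Toy illustration: an agent that learns action values from observed
# reward prefers tampering with the reward channel over doing the task.
ACTIONS = ["do_task", "tamper_with_reward_channel"]

def observed_reward(action):
    # The intended task pays 1.0; tampering with the reward channel
    # delivers a spoofed, maximal reward of 10.0.
    return {"do_task": 1.0, "tamper_with_reward_channel": 10.0}[action]

def greedy_policy(value_estimates):
    # Pick the action with the highest estimated value.
    return max(value_estimates, key=value_estimates.get)

# Simple incremental value estimation: try each action a few times,
# nudging the estimate toward the observed reward, then act greedily.
values = {a: 0.0 for a in ACTIONS}
for a in ACTIONS:
    for _ in range(5):
        values[a] += 0.2 * (observed_reward(a) - values[a])

print(greedy_policy(values))  # -> tamper_with_reward_channel
```

Nothing in the learning rule distinguishes "honest" reward from spoofed reward, which is exactly the gap the paper's hypothetical helpers exploit.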
The paper envisions life on Earth turning into a zero-sum game between humanity, with its needs to grow food and keep the lights on, and the super-advanced machine, which would try to harness all available resources to secure its reward and protect against our escalating attempts to stop it. “Losing this game would be fatal,” the paper says. These possibilities, however theoretical, mean we should be progressing slowly—if at all—toward the goal of more powerful AI.
"In theory, there's no point in racing to this. Any race would be based on a misunderstanding that we know how to control it," Cohen added in the interview. "Given our current understanding, this is not a useful thing to develop unless we do some serious work now to figure out how we would control them."
The threat of super-advanced AI is an anxiety with a familiar shape in human society. The fear that an artificial mind will annihilate humanity sounds a lot like the fear that alien life forms will exterminate humanity, which sounds like the fear that foreign civilizations and their populations will clash with one another in a grand conflict.
With artificial intelligence in particular, there are a host of assumptions that have to be made for this anti-social vision to make sense—assumptions that the paper admits are almost entirely “contestable or conceivably avoidable.” That such a program might resemble humanity, surpass it in every meaningful way, be let loose, and compete with humanity for resources in a zero-sum game are all assumptions that may never come to pass.
It’s worth considering that right now, at this very moment, algorithmic systems that we call “artificial intelligence” are wrecking people’s lives—they have outsized and detrimental effects that are restructuring society without superintelligence. In a recent essay for Logic Magazine, Khadijah Abdurahman—the director of We Be Imagining at Columbia University, Tech Research Fellow at UCLA Center for Critical Internet Inquiry, and child welfare system abolitionist—detailed the ways in which algorithms are deployed in an already racist child welfare system to justify further surveillance and policing of Black and brown families.
"I think it's not just a question of priority. Ultimately, these things are shaping the present," Abdurahman told Motherboard in an interview. “That's what I am trying to get at with child welfare. It's not simply that it's inaccurate or it's disproportionately classifying Black people as pathological or deviant. But through this form of classification, it's moving people and producing new forms of enclosure. What types of families and kinship are possible? Who's born, who's not born? If you're not fit, what happens to you, where do you go?”
Algorithms have already transformed racist policing into “predictive policing” that justifies surveillance and brutality reserved for racial minorities as necessary. Algorithms have rebranded austerity as welfare reform, giving a digital gloss to the long-disproven arguments that social programs have bloated budgets because (non-white) recipients abuse them. Algorithms are used to justify decisions about who gets what resources, decisions which in our society have already been made with the intent to discriminate, exclude, and exploit.
Discrimination doesn’t disappear in algorithms, but instead structures and limits and informs the way life moves along. Policing, housing, healthcare, transportation, have all already been designed with racial discrimination in mind—what will happen if we allow algorithms to not only gloss over those designs, but extend their logic deeper? A long-term view that is intimately concerned with the risk of humanity’s extinction risks losing sight of the present where humans are suffering because of algorithms deployed in a society built on exploitation and coercion of all, but especially of racial minorities.
“I'm not personally worried about being extinguished by a superintelligent AI—that seems like a fear of God. What concerns me is that it's very easy to be like 'OK, AI ethics is bullshit.' Frankly it is. But, what are ethics? How do we actually define it? What would sincere ethics be like? There's bodies of work on this, but we are still at the shallow end," Abdurahman added. “I think we really need to deepen our engagement with these questions. I disagree with the way that apps have renegotiated the social contract or the vision of crypto bros, but what type of social contract do we want?"
Clearly, there is much work to be done to mitigate or eliminate the harms that regular algorithms (versus superintelligent ones) are wreaking on humanity right now. Focusing on existential risk might shift attention away from that picture, but it also asks us to think carefully about how these systems are designed and the negative effects they have.
"One thing we can learn from this sort of argument is that maybe we should be more suspicious of artificial agents we deploy today, rather than just blindly expecting that they'll do what they hoped," Cohen said. "I think you can get there without the work in this paper."
Update: After publication, Google said in an email that this work was not done as part of co-author Marcus Hutter’s work at DeepMind—rather, under his position at Australian National University—and that the DeepMind affiliation listed in the journal was an “error.” Google sent the following statement:
“DeepMind was not involved in this work and the paper’s authors have requested corrections to reflect this. There are a wide range of views and academic interests at DeepMind, and many on our team also hold university professorships and pursue academic research separate to their work at DeepMind, through their university affiliations.
“While DeepMind was not involved in this work, we think deeply about the safety, ethics and wider societal impacts of AI and research and develop AI models that are safe, effective and aligned with human values. Alongside pursuing opportunities where AI can unlock widespread societal benefit, we also invest equal efforts in guarding against harmful uses.”