We Spoke to a Researcher Working to Stop AI from Killing Us All
Ben Rubinstein got a $100,000 grant to help him study how machine learning can go wrong.
Most of us aren't afraid of killer robots because we're adults and we know they aren't real. But earlier this month the Future of Life Institute—a group backed by famed tech entrepreneur Elon Musk—handed out 37 grants to projects focused on keeping artificial intelligence "robust and beneficial to humanity." In other words, they've devoted millions of dollars to making sure that the machines don't rise up and kill us all.
Among other things, the funded projects aim to keep AI systems aligned with human morals and limit the independence of autonomous weapons. One of the recipients was Ben Rubinstein, a senior lecturer in computing and information systems at the University of Melbourne, who received $100,000 to make sure computers don't turn on us and breach important security systems.
VICE caught up with him to ask how in God's name he's going to do that.
VICE: Hey Ben, so movies love the idea of AI overtaking human intelligence. Is that a real concern?
Ben Rubinstein: Personally, I don't think it's inevitable. From the outside it looks like we are moving really fast, but from the inside it doesn't look that way. When I look at AI, I see lots of things it can't do. There's this thing called Moravec's paradox, and with some exceptions, it basically says humans and computers aren't good at the same things.
I take it morality is one of the things computers aren't good at. How do you implant morals and ethics into a machine brain?
When AI becomes a level above what it is now, we need to have value alignment. The problem is, what if the AI's utility function doesn't align with a human's utility function? Isaac Asimov was a science fiction writer, and he wrote three laws of robotics: Robots shouldn't injure a human or, through inaction, allow a human to come to harm. Robots should obey orders from a human unless it violates [law] number one. And robots should protect themselves unless doing so violates laws one and two. These laws make for good reading, and make a lot of sense, but the problem is they are very vague.
How do you make them less vague?
One way some of the research projects are trying to do this is by having the AI learn human judgments. You get the AI to watch humans, and then design an algorithm that, given a model of the world, ascribes to us values that can explain the actions we take, if that makes sense.
Basically it's inverting the process. Instead of going from values to actions, you observe the actions we are taking and try to reverse engineer us to figure out what our values are.
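The inversion Rubinstein describes can be sketched with a toy example. This is not any specific grant project's method, just a minimal illustration under invented assumptions: a simulated agent with hidden preference weights over two made-up features (speed and safety) picks between pairs of options, and we grid-search for the weights that best explain its observed choices.

```python
# Hidden values the "human" agent actually holds: it weights safety
# (second feature) far above speed (first feature).
hidden = (0.2, 0.8)

def choose(weights, a, b):
    """Pick the option with the higher weighted score under these values."""
    score = lambda opt: sum(w * f for w, f in zip(weights, opt))
    return a if score(a) >= score(b) else b

# Each option is a (speed, safety) pair; we only observe the choices,
# never the hidden weights themselves.
pairs = [((5, 1), (1, 5)), ((3, 0), (0, 2)), ((4, 4), (2, 5))]
observed = [choose(hidden, a, b) for a, b in pairs]

# Reverse engineering: try candidate weightings and keep the one that
# explains the most observed choices.
best = max(
    ((i / 10, 1 - i / 10) for i in range(11)),
    key=lambda w: sum(
        choose(w, a, b) == obs for (a, b), obs in zip(pairs, observed)
    ),
)
# `best` is safety-heavy: the inferred values explain every choice,
# even though we never saw the hidden weights directly.
```

With only a few observations, several safety-heavy weightings fit equally well, which is exactly the kind of ambiguity that makes value learning hard at scale.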
Tell me about your project. What will the grant allow you to do?
I'll be focusing on machine learning. So for a short-term problem, I want to find out if machine learning can be misled. When you design a machine learning system, you have something in mind that you want it to do. So it's going to extract patterns from data and it's going to accurately predict something, maybe about customer retention or predicting a disease from a medical diagnostic.
But say you were to feed a machine learning system slightly incorrect data on purpose—how much would it influence the system in the wrong way? This is particularly relevant when you are talking about cybersecurity. Imagine a sophisticated adversary that doesn't try to hack into your system by exploiting a bug in your code, but instead misleads the machine learning algorithm to make it seem like something is happening when it's not, like autonomous weapons going off randomly.
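The attack Rubinstein describes is known as data poisoning. A minimal sketch of the idea, using an invented toy classifier rather than anything from his actual research: a nearest-centroid model learns from labeled points, and an attacker who controls part of the training feed injects a few deliberately mislabeled points to shift a class's centroid and flip a prediction.

```python
def centroid_classify(train, x):
    """Nearest-centroid classifier: predict the label whose mean is closest to x."""
    sums, counts = {}, {}
    for value, label in train:
        sums[label] = sums.get(label, 0) + value
        counts[label] = counts.get(label, 0) + 1
    return min(sums, key=lambda lbl: abs(x - sums[lbl] / counts[lbl]))

# Clean training data: class A clusters near 0, class B near 10.
clean = [(0, "A"), (1, "A"), (2, "A"), (9, "B"), (10, "B"), (11, "B")]
print(centroid_classify(clean, 4))  # -> A (4 is closer to A's mean of 1)

# Poisoning: the attacker slips in A-looking values mislabeled as B,
# dragging B's centroid from 10 down to 6.2 without touching the code.
poisoned = clean + [(0, "B"), (1, "B")]
print(centroid_classify(poisoned, 4))  # -> B
```

Two mislabeled points out of eight are enough to flip the prediction, which is why "slightly incorrect data on purpose" is such a potent attack surface: the system's code is never compromised, only its training data.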
Anywhere machine learning is being used to make important decisions about something like someone's health—monitors in hospitals, for example—my research is relevant. It's not just about hacking into the system anymore.
Is a Terminator scenario likely at any point?
Unlikely in my lifetime, and I am in my mid 30s. But surveys have been conducted with international AI experts, asking when AI might be able to do the general things humans can do. They say about 2040 or 2050. But AI researchers are notoriously bad at estimating how far AI is going to advance. So I would take these predictions with a grain of salt.
But you're not ruling it out. Does that mean you're saying it's a theoretical possibility?
Yes, it is. AI is improving, and one day it will be there. But when you look at Terminator-style science fiction, it always looks kind of hopeless for humans and the only way out is to get a time machine. But the problem with this sci-fi is it often looks at this current society and says, What would it look like if AI became super intelligent now?
If we had Terminators or Cylons walking through the streets today we would be in trouble. But before we get there, AI is going to progress, and AI is going to be given more and more responsibility to act in our world. I think we will see small-scale accidents happen first. For example, with autonomous driving or elderly care robots there will be an accident. And any accident, even on a small scale, will significantly rein in how AI is allowed to be used.
That's why it makes it hard to predict what we will do when super intelligence is around. But I am feeling pretty optimistic about the whole thing.
Follow Dan on Twitter.