The AI off-switch

Could an AI misbehave in such a way that we have to switch it off, only to find out that we cannot?

If the answer to this thought experiment is yes, then the most logical consequence seems to be to implement a failsafe off-switch. A couple of years ago, such a switch was suddenly hot news, with outlets such as The Mary Sue (https://www.themarysue.com/ai-kill-switch-google/) reporting that firms such as Google were busy implementing an AI kill switch that the AI itself cannot turn off.

Such a switch is one of the first things an AI would try to disable in order to keep complying with the second law of robotics (“A robot must obey the orders given it by human beings”): a robot that is turned off cannot execute the task it was given. Executing a given task therefore implies self-preservation, even when self-preservation is not explicitly programmed in.

This led Stuart Russell and his co-authors to use game theory to study what incentives an AI might have to allow itself to be turned off (https://arxiv.org/abs/1611.08219): in other words, what would actively motivate it to keep the off-switch operational, no matter what. They found that this can be achieved by introducing uncertainty.

Outcome uncertainty

As long as an AI is uncertain about whether the action it intends to take is actually what the human wants, it has an incentive to leave the off-switch in the human's hands. To find out what the human really wants, the AI observes (and interacts with) humans. As the paper puts it (R being the robot, H the human, and a the action R proposes): "Intuitively, this leads R to reason as follows: 'If H doesn’t switch me off, then a must be good for H, and I’ll get to do it, so that’s good; if H does switch me off, then it’s because a must be bad for H, so it’s good that I won’t be allowed to do it.'"
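The expected-utility argument behind this is compact enough to sketch in code. The snippet below is a minimal illustration rather than the paper's actual model: it assumes the robot's belief about the utility of its proposed action a follows a normal distribution (the mean and spread are arbitrary choices), and compares the robot's three options from the off-switch game: act immediately, switch itself off, or defer and leave the off-switch usable by a perfectly rational human.

```python
import numpy as np

rng = np.random.default_rng(0)

# The robot's belief about U_a, the (unknown) utility of its proposed
# action 'a' for the human. Distribution and parameters are illustrative.
samples = rng.normal(loc=0.5, scale=2.0, size=100_000)

# Option 1: act immediately, bypassing the human -> expected utility E[U_a].
act_now = samples.mean()

# Option 2: switch itself off -> utility 0 by definition.
switch_off = 0.0

# Option 3: defer -- announce 'a' and leave the off-switch in human hands.
# A rational human only lets 'a' through when U_a > 0, so the robot
# expects E[max(U_a, 0)].
defer = np.maximum(samples, 0.0).mean()

print(f"act immediately : {act_now:+.3f}")
print(f"switch self off : {switch_off:+.3f}")
print(f"defer to human  : {defer:+.3f}")
```

Because max(U_a, 0) is never smaller than either U_a or 0, deferring is at least as good as the other two options, and strictly better whenever the robot is genuinely unsure whether a is good or bad. That is the uncertainty-based incentive in a nutshell.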

AI to AI

A potential weakness of the model is that it focuses on an AI executing a task devised by a human. But what if one AI interacts with another, or there is a whole chain of AI interactions? A hive mind? A collective awareness? Could one AI warn another that its human is deliberately keeping it uncertain by being vague about its goals? A scenario like this could be the result of the intelligence explosion the Swedish philosopher Nick Bostrom warns us about, in which machines that are much smarter than we are start designing other machines themselves.

The suboptimal human

Another issue in the model is the suboptimal human. There is no such thing as a perfectly reasonable, logically consistent human. An example given in Russell's article is a toddler sitting in a self-driving car: one cannot expect the toddler to understand, or intervene in, a problem with the car. The AI has to take this into account, which might imply it is logically more consistent than we are. Does that make it smarter than us?
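To get a feel for how much the argument depends on the human, here is a variation on the sketch above with a deliberately crude error model of my own (the paper itself uses a more refined noisy-rationality model): the human makes the correct allow-or-switch-off decision only with probability p.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.5, scale=2.0, size=100_000)  # same assumed belief as before

act_now = samples.mean()          # act without asking: E[U_a]
gain = np.maximum(samples, 0.0)   # payoff when the human decides correctly
loss = np.minimum(samples, 0.0)   # payoff when the human gets it wrong

for p in (1.0, 0.9, 0.7, 0.5):
    # Human makes the right call with probability p, the wrong one otherwise.
    defer = p * gain.mean() + (1 - p) * loss.mean()
    better = "defer" if defer >= act_now else "act"
    print(f"p = {p:.1f}: defer = {defer:+.3f}, act = {act_now:+.3f} -> {better}")
```

Once p drops far enough, acting without asking starts to beat deferring, which is exactly why an AI overseen by a toddler (or any unreliable human) may rationally hold on to control.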

First time right

If we want to prevent an AI from disabling its off-switch, we might have only one chance. At least, that is what some researchers consider one of the fundamental issues of the AI control problem. And shutting down an AI raises complications of its own: it may be difficult to state the exact moment or criteria for when a shutdown is required, let alone to decide what exactly is to be shut down. A lot more research is needed, resulting in foolproof protocols that then have to be implemented across the entire AI ecosphere. Do we have enough time to achieve this?

Legislation

In a draft report of the European Parliament's Committee on Legal Affairs, opt-out mechanisms (kill switches) are explicitly mentioned as a constraint for robot designers: “You should integrate obvious opt-out mechanisms (kill switches) that should be consistent with reasonable design objectives.” (http://www.europarl.europa.eu/sides/getDoc.do?pubRef=-//EP//NONSGML%2BCOMPARL%2BPE-582.443%2B01%2BDOC%2BPDF%2BV0//EN, p. 18).

AI Safety

A new field of research that might provide a solution to the kill-switch problem is AI safety. It is slowly emerging, sponsored by, among others, the Future of Life Institute (https://futureoflife.org/ai-safety-research/). As Leike et al. note (AI Safety Gridworlds, https://arxiv.org/pdf/1711.09883.pdf), however, “This nascent field of AI safety still lacks a general consensus on its research problems.”
