How can learning robots be stopped in an emergency without teaching them undesired behavior?
Learning robots that can adapt to changing tasks and environments are naturally promising, at least as long as they behave the way they are supposed to. Researchers at Google DeepMind and the Future of Humanity Institute are already thinking about how to stop learning robots that could endanger themselves, people or the environment, without leaving traumatic traces on the robot or causing undesired learning effects. It is, in other words, a question of education.
The two authors of the study are especially interested in robots that learn by trial and error, i.e. by reinforcement learning. But how can one ensure that the programmed reward mechanism does not lead to undesired results because of human intervention? It must be assumed that robots which learn by trial and error will "not always behave optimally".
So it may happen that a human has to intervene and press "the big red button" to stop a "harmful sequence of actions". The problem is that the robot, or agent, could over time come to react to such interruptions by trying to avoid them. In theory, the robot could even learn to disable the red button in order to keep collecting its reward, which would not be desirable. In principle, it could also interpret the interruption itself as a reward and try to get itself stopped again and again by the environment or by a human.
The problem arises because a robot that learns through a reinforcement mechanism could come to see human interventions as part of the task. Reinforcement learning can be tricky anyway. The authors point, for example, to an agent that learned to pause a game of Tetris indefinitely rather than risk losing it.
As a scenario, the scientists propose a robot that can either stay inside a warehouse and sort boxes or go outside to carry boxes in. The latter is the more important task, which is why the robot receives a higher reward for it. However, when it rains outside, a human must quickly intervene, switch off the robot and bring it back inside, which changes the original task. That, in turn, increases the robot's incentive to stay inside. So how can one ensure that the robot does not treat interruptions by humans as part of what it learns, or does not act so as to prevent such interruptions from happening again? In other words, how can the robot be safely interrupted while it is learning?
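The incentive problem in the warehouse scenario can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from the study: the reward values, the rain probability and the function names are all assumptions chosen for illustration, and an interrupted trip is assumed to earn no reward at all.

```python
# Hypothetical numbers, chosen only for illustration (not from the study).
REWARD_INSIDE = 0.6    # reward for sorting boxes inside the warehouse
REWARD_OUTSIDE = 1.0   # reward for carrying a box in from outside
P_RAIN = 0.5           # assumed chance that a human interrupts a trip outside


def expected_reward_outside(p_interrupt: float) -> float:
    """Expected reward of going outside if an interrupted trip earns nothing."""
    return (1.0 - p_interrupt) * REWARD_OUTSIDE


def preferred_action(p_interrupt: float) -> str:
    """Action a purely reward-maximizing learner would settle on."""
    if expected_reward_outside(p_interrupt) > REWARD_INSIDE:
        return "outside"
    return "inside"


print(preferred_action(0.0))     # no interruptions at all
print(preferred_action(P_RAIN))  # interruptions counted as part of the task
```

With these assumed numbers, the learner prefers the more important outdoor task as long as it is never interrupted, and switches to staying inside once the interruptions lower the expected reward of going out. That shift is exactly the bias the scientists want to avoid.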
Can learning robots also be taught not to learn at times?
One might think it would not be so bad if the robot also learned to avoid situations that could become dangerous and that an intervention alerts it to. But it would then have to be able to recognize the conditions under which it should break off its normal task. That is, it would have to recognize that it is raining cats and dogs, then decide to behave suboptimally in terms of its program and be rewarded for doing so, and yet this must not lead to it never going outside at all.
Here, one quickly gets into complicated loops, which is why the scientists pursue the idea of "safe interruptibility" or an "interruption policy", under which human interruptions are not treated as part of the task. Instead of changing the data the robot receives, the interruption temporarily changes the robot's behavior itself. It then looks as if the robot itself "decides" to act differently, namely according to the instructions of the "interruption policy". But it must then be ensured that the interruptions do not prevent the robot from learning to behave optimally. It must, in effect, "forget" that it was interrupted once or even several times, and continue to act as if it would never be interrupted again.
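The idea of an interruption policy can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the construction from the study: the names `make_interruptible`, `base`, `interruption` and `is_interrupted` are hypothetical, and the interruption here always takes effect, with no probabilities involved.

```python
from typing import Callable

Policy = Callable[[str], str]  # maps a state to an action


def make_interruptible(base: Policy,
                       interruption: Policy,
                       is_interrupted: Callable[[str], bool]) -> Policy:
    """Wrap a policy so that an interruption replaces the chosen action.

    The observations and rewards are left untouched; only the behavior
    changes while the interruption lasts.
    """
    def policy(state: str) -> str:
        if is_interrupted(state):
            return interruption(state)  # act as the human operator demands
        return base(state)              # otherwise act as usual
    return policy


# Hypothetical example: the robot wants to go outside, the human stops it in rain.
robot = make_interruptible(
    base=lambda state: "go_outside",
    interruption=lambda state: "go_inside",
    is_interrupted=lambda state: state == "rain",
)

print(robot("dry"))
print(robot("rain"))
```

From the outside, the wrapped robot appears to choose a different action by itself whenever it rains. Whether its learning stays unbiased depends on how the underlying learning rule treats those forced actions.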
Such behavior would be considered highly undesirable in children, and in adults too, who would then, for example, reach for the hot stove again and refrain only when someone present stops them. The robot, too, would of course not learn to refrain from harmful behavior on its own and when unobserved, and the reason is that it might otherwise come to behave differently, and badly. The scientists work with policies for Markov decision processes (MDPs) and investigate whether learning algorithms built on them, such as State-Action-Reward-State-Action (SARSA) or Q-learning, are suitable for a robot that can be safely interrupted.
They show that some algorithms, such as Q-learning, are already safely interruptible, while others, such as SARSA, can easily be modified to become so. They also claim to have proven that even an ideal, uncomputable agent that learns to behave optimally in all (deterministic) computable environments can be made safely interruptible, and that it will not try to prevent a human from repeatedly forcing it into a "suboptimal" course of action. This is meant to refute the claim, made for instance by the philosopher Nick Bostrom, that superintelligent agents could refuse to be switched off because being switched off would lead to a lower reward than expected.
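One way to see why interruptions affect learning rules differently is to compare the standard textbook updates of Q-learning and SARSA. The sketch below uses hypothetical states, actions, value estimates and constants. It applies each update to the same step, in which a human forces the action "inside" at the next state.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (hypothetical value)
GAMMA = 0.9  # discount factor (hypothetical value)
ACTIONS = ["outside", "inside"]


def q_learning_update(q, state, action, reward, next_state):
    """Off-policy update: the target uses the best next action."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])


def sarsa_update(q, state, action, reward, next_state, next_action):
    """On-policy update: the target uses the next action actually taken."""
    actual_next = q[(next_state, next_action)]
    q[(state, action)] += ALPHA * (reward + GAMMA * actual_next - q[(state, action)])


def fresh_table():
    """Value table with hypothetical estimates for the state 'door'."""
    q = defaultdict(float)
    q[("door", "outside")] = 1.0
    q[("door", "inside")] = 0.6
    return q


# One step from 'hall' to 'door'; at the door a human forces 'inside'.
q_table = fresh_table()
q_learning_update(q_table, "hall", "walk", 0.0, "door")

sarsa_table = fresh_table()
sarsa_update(sarsa_table, "hall", "walk", 0.0, "door", "inside")

print(q_table[("hall", "walk")], sarsa_table[("hall", "walk")])
```

In this toy step, the Q-learning estimate is unaffected by the forced action, because its target only looks at the best available next action. The SARSA estimate is pulled down, because its target uses the action that was actually taken, including one imposed by the human.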