Published in AI

British AI experts teach machines to team up and kill

on 03 June 2019

What could go wrong?

A British artificial intelligence company has designed AI agents that taught themselves Quake III Arena and became so good that they consistently beat human players.

DeepMind researchers describe the work in a paper published in Science and claim it is the first time any machine has managed it.

While humans have been toast in one-on-one turn-based games such as chess ever since IBM's Deep Blue beat Garry Kasparov in 1997, games like Quake have been much harder for AI.

Multiplayer games involving teamwork and interaction in complex environments seemed insurmountable.

The team led by Max Jaderberg worked on a modified version of Quake III Arena, a game that first appeared in 1999.

The "Capture the Flag" game mode, which involves working with teammates to grab the opposing team's flag while safeguarding your own, was set as the challenge.

It forces players to devise complex strategies mixing aggression and defense.

After the AI agents had been given time to train themselves up, they pitted their prowess against professional game testers.

It did not take long. After 12 hours of practice, the human game testers were only able to win a quarter of the games against the agent team.

The agents' win-loss ratio remained superior even when their reaction times were artificially slowed down to human levels and when their aiming ability was reduced.

Behind the success was a technique called "Reinforcement Learning" (RL), which was used to train the agents.

"Initially, they knew nothing about the world and instead were doing completely random stuff and bouncing about the place", Jaderberg told AFP.

"One of the contributions of the paper is each agent learns its own internal reward signal", said Jaderbeg, meaning that the AI players decided for themselves how much weight to assign the successful completion of tasks like capturing the flag or hitting an opponent.

Training a population of agents together, rather than one at a time, made the population learn much faster.
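The idea of each agent in a population weighing game events for itself can be sketched in a few lines of Python. This is only an illustration, not DeepMind's code: the event names, the agent names and the weight values are all made up here, and the weights are fixed by hand, whereas the agents described above learn theirs.

```python
# Hypothetical event names, chosen for illustration only.
EVENTS = ("captured_flag", "tagged_opponent", "was_tagged")


def internal_reward(event_counts, weights):
    """Turn raw game events into one number using the agent's own weights."""
    return sum(weights[event] * event_counts.get(event, 0) for event in EVENTS)


# Two agents in the same population value the same events differently.
# These weights are invented; in the article the agents learn them.
population = {
    "attacker": {"captured_flag": 1.0, "tagged_opponent": 0.5, "was_tagged": -0.1},
    "defender": {"captured_flag": 0.5, "tagged_opponent": 1.0, "was_tagged": -0.5},
}

# Both agents witness the same events in one stretch of play.
events_this_step = {"tagged_opponent": 2, "was_tagged": 1}

rewards = {
    name: internal_reward(events_this_step, weights)
    for name, weights in population.items()
}
print(rewards)
```

Under these made-up weights the "defender" scores the same stretch of play more highly than the "attacker", which is the sense in which each agent has its own internal reward signal.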

The researchers devised a new architecture of so-called "two timescale" learning, which Jaderberg likened to the thesis of the book "Thinking Fast and Slow," but for AI.

"You have one part of the agent which kicks very quickly, it updates its own beliefs very quickly, and you have another part of the agent, which updated belief at a slower rate, and these two beliefs influence each other and help shape the way the agent learns about the world", he said.

Randomising the map for each new match was key, as it meant that the AIs had to learn and could not just memorise a sequence of actions.
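The point of fresh maps can be shown with a small generator that hands out a different layout for every match. It is a sketch only: the grid size, the wall probability and the corner flag positions are invented for the example and have nothing to do with the real Quake III maps.

```python
import random


def random_map(rng, width=8, height=8, wall_prob=0.2):
    """Build one grid map: '#' is a wall, '.' is open floor,
    'A' and 'B' mark the two flags (placement is an assumption)."""
    grid = [
        ["#" if rng.random() < wall_prob else "." for _ in range(width)]
        for _ in range(height)
    ]
    grid[0][0] = "A"
    grid[height - 1][width - 1] = "B"
    return ["".join(row) for row in grid]


def training_maps(n_matches, seed=0):
    """Return one freshly generated map per match."""
    rng = random.Random(seed)
    return [random_map(rng) for _ in range(n_matches)]


maps = training_maps(5)
print("\n".join(maps[0]))
```

Because the walls move between matches, a fixed list of moves that works on one map is useless on the next.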

The downside is that they are training AI robots how to kill, and while that is OK in a computer game, it is less friendly if one of them gets its paws on nuclear triggers.

Last modified on 03 June 2019