
Microsoft’s AI plays a mean Ms Pac-Man

16 June 2017


AI has evolved to the level of a 1980s arcade kiddie


Microsoft's artificial intelligence system has mastered the 1980s video game Ms. Pac-Man, bringing its march towards global domination that much closer.

The team, from Microsoft-owned Canadian AI firm Maluuba, achieved the perfect score of 999,990.

Vole said that the method used in the game could also be used to teach AI agents to perform complex tasks that help humans. Google's DeepMind AI, which has beaten top human players at the complex game of Go, is widely seen as leading the pack on AI research.

Doina Precup, an associate professor of computer science at McGill University in Montreal, said Microsoft's win was a significant achievement.

"Lots of firms experimenting with AI test their system using video games, but Ms. Pac-Man has been among the most difficult to crack," she said.

Writing in its blog, Microsoft said that the team used an AI technique known as reinforcement learning to master the Atari 2600 version of the game. To achieve the high score, the team divided the problem into small pieces, which were distributed among AI agents.

The system used more than 150 agents, each of which worked in parallel with other agents to master the game. Some were rewarded for successfully finding one specific pellet, while others were tasked with staying out of the way of ghosts.

Then the researchers created a "senior manager" agent which took suggestions from all the others and used them to decide where to move Ms. Pac-Man.

Its decision-making was more complex than a simple majority vote. For example, if 100 agents wanted to go right because that was the best path to their pellet, but three wanted to go left because there was a deadly ghost on the right, it would give more weight to the ones who had noticed the ghost.
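To make the weighting idea concrete, here is a minimal Python sketch of a "senior manager" that adds up weighted suggestions from many agents. It is only an illustration: the function name `choose_move`, the weights and the scores are hypothetical, and Maluuba's actual system learns its values through reinforcement learning rather than using hand-picked numbers like these.

```python
ACTIONS = ("up", "down", "left", "right")


def choose_move(suggestions):
    """Pick the action with the highest weighted total.

    `suggestions` is a list of (weight, scores) pairs, one per agent,
    where `scores` maps an action name to how much that agent likes it.
    Both the weights and the scores are made-up illustrative values.
    """
    totals = {action: 0.0 for action in ACTIONS}
    for weight, scores in suggestions:
        for action, score in scores.items():
            totals[action] += weight * score
    best = max(totals, key=totals.get)
    return best, totals


# 100 pellet agents mildly prefer going right (assumed weight 1.0 each).
pellet_agents = [(1.0, {"right": 1.0})] * 100

# 3 ghost agents strongly object to going right and slightly prefer left
# (assumed weight 50.0 each, so a ghost warning outweighs many pellets).
ghost_agents = [(50.0, {"right": -1.0, "left": 0.1})] * 3

move, totals = choose_move(pellet_agents + ghost_agents)
print(move, totals)
```

With these assumed numbers, the three heavily weighted ghost agents outvote the 100 pellet agents and the manager moves left; without the ghost agents, the same function would send Ms. Pac-Man right.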

Harm Van Seijen, a research manager with Maluuba, said the best results were achieved when each agent acted egotistically while the top agent considered the best move for everyone.

"There's this nice interplay between how they have to, on the one hand, co-operate based on the preferences of all the agents, but at the same time each agent cares only about one particular problem," he said. He has published a paper about the technique - known as Hybrid Reward Architecture - which has yet to be peer-reviewed.

Last modified on 16 June 2017