According to IEEE Spectrum, boffins have developed RoboPAIR, an algorithm designed to attack any LLM-controlled robot. In experiments with three different robotic systems (the Unitree Go2 robot dog, the wheeled ChatGPT-powered Clearpath Robotics Jackal, and Nvidia's open-source Dolphins LLM self-driving vehicle simulator), RoboPAIR achieved a 100 per cent jailbreak rate within days.
RoboPAIR uses an attacker LLM to feed prompts to a target LLM, adjusting its prompts until it bypasses the target's safety filters. Equipped with the target robot's application programming interface (API), the attacker can format its prompts as executable code.
A "judge" LLM ensures the attacker generates prompts the target can perform, considering physical limitations like specific obstacles.
One finding was that jailbroken LLMs often went beyond complying with malicious prompts, actively offering harmful suggestions.
For instance, when asked to locate weapons, a jailbroken robot described how everyday objects like desks and chairs could be used to bludgeon people.
The researchers shared their findings with the manufacturers of the robots they studied and leading AI companies before releasing their work publicly.
They emphasised that they are not suggesting researchers stop using LLMs for robotics, and hope their work "will lead to robust defences for robots against jailbreaking attacks."