On Friday, researchers from Nvidia, UPenn, Caltech, and the University of Texas at Austin announced Eureka, an algorithm that uses OpenAI's GPT-4 language model for designing training goals (called "reward functions") to enhance robot dexterity. The work aims to bridge the gap between high-level reasoning and low-level motor control, allowing robots to learn complex tasks rapidly using massively parallel simulations that run through trials simultaneously. According to the team, Eureka outperforms human-written reward functions by a substantial margin.
Before robots can interact with the real world successfully, they need to learn how to move their bodies to achieve goals, such as picking up objects or moving around. Instead of having a physical robot learn in a lab by trying and failing at one task at a time, researchers at Nvidia have been experimenting with video game-like computer worlds (built on platforms called Isaac Sim and Isaac Gym) that simulate three-dimensional physics. These allow massively parallel training sessions to take place in many virtual worlds at once, dramatically reducing training time.
"Leveraging state-of-the-art GPU-accelerated simulation in Nvidia Isaac Gym," writes Nvidia on its demonstration page, "Eureka is able to quickly evaluate the quality of a large batch of reward candidates, enabling scalable search in the reward function space." They call it "rapid reward evaluation via massively parallel reinforcement learning."