Gemini Robotics On-Device: Advancing Dexterity and Task Adaptation for Local Robotics

Google DeepMind's Gemini Robotics On-Device enables real-time, general-purpose dexterous manipulation and task adaptation directly on bi-arm robots without cloud dependency.

Google DeepMind has launched its latest advancement, Gemini Robotics On-Device, a vision-language-action (VLA) model optimized to run directly on robotic devices.

This on-device solution is designed to provide general-purpose dexterity and fast adaptation to new tasks while running entirely on the robot, without relying on a data network. Because it works locally, the model remains usable in environments with intermittent or no connectivity, and it avoids network round-trips, which matters for latency-sensitive applications.

Key Features and Performance

Gemini Robotics On-Device is designed for bi-arm robots and aims to deliver strong task generalization and dexterous manipulation.

The model requires minimal computational resources, yet it handles dexterous tasks such as folding clothes and unzipping bags. Its low-latency inference lets it carry out tasks in real time directly on the robot.

The model also generalizes well across a range of test scenarios: it follows natural language instructions and completes complex, multi-step tasks. It outperforms other on-device models, particularly on more challenging out-of-distribution tasks and when fine-tuned for new tasks.

Task Adaptation and Flexibility

Gemini Robotics On-Device lets developers fine-tune the model for specific tasks using relatively little data: only 50 to 100 demonstrations are needed to adapt it to a new application. In tests across several dexterous manipulation tasks, the fine-tuned model performed strongly on tasks such as zipping a lunchbox and pouring salad dressing.

Although the model was trained only on ALOHA robots, it has also been adapted to other robotic systems, including a bi-arm Franka FR3 and Apptronik's Apollo humanoid. This suggests the model can work across different robot embodiments and carry out general-purpose tasks on each.


Safety and Responsible Development

In line with AI safety standards, Gemini Robotics On-Device incorporates both semantic and physical safety measures. The model interfaces with low-level, safety-critical controllers so that tasks are carried out safely. The Responsible Development & Innovation (ReDI) team assesses the technology's real-world impact, aiming to ensure it serves societal needs while limiting potential risks.
