Motivation
Instead of training robots to do one specific task by collecting tons of task-specific data (much like basic ML tasks from 10 years ago), we can follow the trend of foundation models: collect tons of diverse data across different skills, robots, etc., and try to get these robots to generalize!
Challenges in Robotics
- Poor Generalization: current robot learning algorithms generalize poorly to new robot hardware and environments; hyperparameters and algorithms are usually hand-crafted for a specific task or environment.
- Data Scarcity: data is essential for deep learning-based methods, but the huge range of tasks and environments makes it hard to create large-scale datasets. Simulation is one of the leading ways to address this: synthetic data and techniques like domain randomization help improve generalization and speed up training (see the sketch after this list).
- Task specification: it is hard for users to specify a task exactly; we want robots to understand instructions semantically.
- Uncertainty and safety: robots must handle inherent uncertainty in their environments and task specifications, and act safely while doing so.
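To make domain randomization concrete, here is a minimal sketch: every episode, the simulator's physics and visual parameters are resampled so the learned policy cannot overfit to one specific world. The `env` object and its attribute names (`friction`, `table_texture`, etc.) are hypothetical, not from any particular simulator.

```python
import random

def randomize_domain(env, rng=random):
    """Resample simulator parameters so the policy never sees
    exactly the same dynamics or visuals twice."""
    # Physics randomization (hypothetical attribute names)
    env.friction = rng.uniform(0.5, 1.5)
    env.object_mass = rng.uniform(0.2, 2.0)
    env.actuator_delay = rng.uniform(0.0, 0.05)   # seconds
    # Visual randomization
    env.light_intensity = rng.uniform(0.3, 1.0)
    env.table_texture = rng.choice(["wood", "metal", "cloth"])
    return env

# Training loop idea: a freshly randomized world every episode
# for episode in range(num_episodes):
#     env = randomize_domain(env)
#     rollout_and_update(policy, env)
```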
Foundation Models used in Robotics
The most apparent application is object recognition and scene understanding: extracting semantic information from a VLM alongside spatial information about the objects and scenes the robot interacts with. Another application is state estimation and localization, e.g., using CLIP and GPT to detect objects and generate labels, helping with SLAM.
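As a concrete example of pulling semantic labels out of a VLM, here is a zero-shot object-recognition sketch with CLIP via the Hugging Face `transformers` API; the candidate label list and the image path are placeholders.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Candidate labels the robot might care about in its workspace (placeholder list)
labels = ["a mug", "a drawer handle", "an apple", "a sponge"]
image = Image.open("camera_frame.png")  # placeholder: one RGB frame from the robot camera

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image: similarity of the frame to each candidate text label
probs = outputs.logits_per_image.softmax(dim=-1)
print({label: round(p.item(), 3) for label, p in zip(labels, probs[0])})
```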
Task planning and action generation is another obvious direction. For example, SayCan uses an LLM to rank candidate skills by how useful they are for the instruction and grounds them with learned affordances, so the robot only executes steps it can actually perform (a minimal sketch of this scoring appears below). Reward generation is a related direction, e.g., Language to Rewards for Robotic Skill Synthesis, where an LLM turns language instructions into reward functions that a low-level controller then optimizes.
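A hedged sketch of the SayCan idea: an LLM scores how useful each candidate skill is for the instruction, a learned value function scores how likely the skill is to succeed from the current state (its affordance), and the robot picks the skill whose product is highest. `llm_log_prob` and `affordance_value` are placeholders standing in for the real models.

```python
import math

def llm_log_prob(instruction: str, history: list[str], skill: str) -> float:
    """Placeholder: log-probability the LLM assigns to `skill` being the
    next useful step toward `instruction`, given the steps already executed."""
    raise NotImplementedError

def affordance_value(state, skill: str) -> float:
    """Placeholder: learned value function estimating the probability
    that `skill` can be successfully executed from the current state."""
    raise NotImplementedError

def saycan_select(instruction, state, history, skills):
    # Combine "is it useful?" (LLM) with "can I do it here?" (affordance)
    scores = {
        s: math.exp(llm_log_prob(instruction, history, s)) * affordance_value(state, s)
        for s in skills
    }
    return max(scores, key=scores.get)
```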
RT-2: A vision-language-action (VLA) model
π0: A generalist policy
- Takes internet-scale pre-training data + the Open X-Embodiment dataset + newly collected task data, and combines a pre-trained VLM (PaliGemma) with a flow-matching model (the action expert) to produce continuous actions (see the sampling sketch after this list).
- The model can then be fine-tuned on downstream tasks.
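To make the flow-matching action expert less abstract, here is a minimal PyTorch sketch of how continuous actions can be sampled at inference time: start from Gaussian noise and Euler-integrate a learned velocity field conditioned on the VLM's observation embedding. Module names and dimensions are made up for illustration; this is the general technique, not the actual implementation.

```python
import torch
import torch.nn as nn

class ActionExpert(nn.Module):
    """Toy velocity field v(a, t, context) over a chunk of future actions."""
    def __init__(self, action_dim=7, horizon=16, ctx_dim=512, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(action_dim * horizon + ctx_dim + 1, hidden),
            nn.ReLU(),
            nn.Linear(hidden, action_dim * horizon),
        )
        self.action_dim, self.horizon = action_dim, horizon

    def forward(self, actions, t, context):
        x = torch.cat([actions.flatten(1), context, t], dim=-1)
        return self.net(x)

@torch.no_grad()
def sample_actions(expert, context, steps=10):
    """Euler integration of the flow from noise (t=0) to an action chunk (t=1)."""
    B = context.shape[0]
    a = torch.randn(B, expert.action_dim * expert.horizon)
    for i in range(steps):
        t = torch.full((B, 1), i / steps)
        a = a + expert(a, t, context) / steps
    return a.view(B, expert.horizon, expert.action_dim)

# context = vlm_embedding(image, instruction)  # placeholder: features from the pre-trained VLM
```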
Sequential Reasoning for VLAs
- VLA with Embodied Chain-of-Thought: the model is trained to emit intermediate reasoning (the plan, the current sub-task, relevant object locations) before predicting the low-level action, as illustrated below.
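Roughly, embodied chain-of-thought means the policy's decoded output carries reasoning tokens that the action is conditioned on. A schematic of what one prediction might look like (the field names and values here are illustrative, not the exact format of any paper):

```python
# Illustrative structure of one ECoT-style prediction
ecot_output = {
    "task": "put the red block in the drawer",
    "plan": ["open the drawer", "pick up the red block", "place block in drawer"],
    "current_subtask": "pick up the red block",
    "visible_objects": {"red block": [212, 148], "drawer handle": [87, 301]},  # pixel coords
    "action": [0.02, -0.01, -0.05, 0.0, 0.0, 0.1, 1.0],  # delta end-effector pose + gripper
}
```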
Robotic Foundation Models
Offline RL: most methods train a visual policy in simulation and then transfer the model to the real world.
Other Interesting Directions
- Improving Simulations and Sim2Real
- LLM for Reward Design with Eureka (see the sketch after this list)
- Imitation learning
- No simulator: collect data from the real world → learn a model → design a policy → deploy
- Meta-learned dynamics model + online adaptive control
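Eureka-style reward design can be sketched as a generate-evaluate-reflect loop: an LLM writes candidate reward functions as code, each candidate trains a policy in simulation, and the resulting scores are fed back into the prompt so the LLM can refine its rewards. Everything below (`query_llm`, `train_policy`, `evaluate`) is a placeholder for the real components, and the loop structure is an assumption-laden simplification rather than the exact algorithm.

```python
def query_llm(prompt: str) -> str:
    """Placeholder: returns Python source defining a function reward(state, action)."""
    raise NotImplementedError

def train_policy(reward_fn):
    """Placeholder: trains an RL policy in simulation with the given reward."""
    raise NotImplementedError

def evaluate(policy) -> float:
    """Placeholder: task success rate of the policy on the true objective."""
    raise NotImplementedError

def eureka_loop(task_description: str, iterations: int = 5, candidates: int = 4):
    feedback = ""
    best_policy, best_score = None, float("-inf")
    for _ in range(iterations):
        results = []
        for _ in range(candidates):
            code = query_llm(f"Write a reward function for: {task_description}\n{feedback}")
            namespace = {}
            exec(code, namespace)            # the LLM-written reward becomes callable
            policy = train_policy(namespace["reward"])
            results.append((evaluate(policy), code, policy))
        results.sort(key=lambda r: r[0], reverse=True)
        score, code, policy = results[0]
        if score > best_score:
            best_policy, best_score = policy, score
        # Reflect: show the LLM its best reward so far and how well it did
        feedback = f"Best reward so far (score {score:.2f}):\n{code}\nImprove it."
    return best_policy
```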