Train in simulation, deploy in the real world (with real-time adaptation)
Why simulators for robot learning?
- Most RL-based algorithms are very sample-inefficient
- Simulators are cheap, fast, and scalable
Problems of Sim2Real
- Non-parametric mismatches: the simulator does not model some effects at all (e.g. complex aerodynamics, fluid dynamics, tire dynamics)
- Parametric mismatches: the simulator uses different parameter values than the real system (e.g. robot mass, friction)
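The two kinds of mismatch can be illustrated on a toy system. The sketch below is not from the lecture: it assumes a 1D point mass pushed by a constant force, and the names `step`, `rollout`, and the numeric values are hypothetical. A wrong mass is a parametric mismatch; a quadratic drag term that the simulator leaves out entirely is a non-parametric one.

```python
def step(v, force, mass, drag, dt=0.01):
    """One Euler step of a 1D point mass; drag=0.0 means the effect is not modeled."""
    accel = (force - drag * v * abs(v)) / mass
    return v + dt * accel


def rollout(mass, drag, steps=500, force=1.0):
    """Final velocity after pushing with a constant force for `steps` steps."""
    v = 0.0
    for _ in range(steps):
        v = step(v, force, mass, drag)
    return v


# Simulator: nominal mass, no drag term in the model at all.
v_sim = rollout(mass=1.0, drag=0.0)
# Parametric mismatch: same equations, but the real mass differs.
v_param = rollout(mass=1.3, drag=0.0)
# Non-parametric mismatch: the real system has drag the simulator never models.
v_nonparam = rollout(mass=1.0, drag=0.5)

print(v_sim, v_param, v_nonparam)
```

In this toy case the parametric gap could be closed by correcting one number, while the non-parametric gap cannot be closed by any choice of `mass`.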
Domain Randomization
- Randomize the simulator parameters (e.g. mass, friction) in simulation
- Train a single RL policy that works for many parameter settings
- An approximation of robust control
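A minimal sketch of the idea, under loud assumptions: the toy 1D system, the parameter ranges, and the names `sample_params`, `episode_cost`, and `train_randomized` are all hypothetical, and a grid search over one proportional gain stands in for RL training. What it keeps from the notes is the structure: sample new parameters for every episode, then choose one fixed policy that does well on average across them.

```python
import random

# Hypothetical randomization ranges; real ranges depend on the robot.
MASS_RANGE = (0.5, 2.0)
FRICTION_RANGE = (0.1, 1.0)


def sample_params(rng):
    """Draw one simulator configuration for an episode."""
    return {"mass": rng.uniform(*MASS_RANGE),
            "friction": rng.uniform(*FRICTION_RANGE)}


def episode_cost(gain, params, steps=200, dt=0.02, target=1.0):
    """Tracking error plus a small effort penalty for a proportional controller."""
    v, cost = 0.0, 0.0
    for _ in range(steps):
        force = gain * (target - v)
        accel = (force - params["friction"] * v) / params["mass"]
        v += dt * accel
        cost += ((target - v) ** 2 + 1e-3 * force ** 2) * dt
    return cost


def train_randomized(gains, episodes=100, seed=0):
    """Pick the single gain with the lowest average cost over randomized episodes."""
    rng = random.Random(seed)
    sampled = [sample_params(rng) for _ in range(episodes)]
    avg_cost = {g: sum(episode_cost(g, p) for p in sampled) / len(sampled)
                for g in gains}
    best = min(avg_cost, key=avg_cost.get)
    return best, avg_cost


best_gain, avg_cost = train_randomized(gains=[0.5, 1.0, 2.0, 5.0, 10.0, 20.0])
print(best_gain)
```

The chosen gain is not tuned to any one mass or friction value, which is why this resembles robust control: one policy, acceptable everywhere in the randomized range, optimal nowhere in particular.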
Learning to Adapt
- Randomize the simulator parameters in simulation
- Train an adaptive RL policy that works for many parameter settings
- An approximation of adaptive control
- Issue! The true parameters are often unknown in the real world
- Solution! Learning from a privileged teacher
- Sim: first train a teacher policy with privileged information
- Sim: the student policy learns from the teacher policy, without access to the privileged information
- Real: Deploy student policy
- Basically an Imitation Learning problem
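The teacher-student steps above can be sketched on a toy problem. Everything here is an assumption made for illustration: the 1D system, the constants, and the names `teacher_policy`, `collect_teacher_data`, and `solve_least_squares` are hypothetical; an unknown constant disturbance plays the role of the privileged information; and a linear least-squares fit stands in for a learned student network. The student sees only a short history (current error, last velocity change, last applied action) and is fit to reproduce the teacher's actions.

```python
import random

DT, GAIN, TARGET = 0.05, 2.0, 1.0  # hypothetical toy constants


def teacher_policy(v, disturbance):
    """Teacher: sees the privileged disturbance and cancels it."""
    return GAIN * (TARGET - v) - disturbance


def collect_teacher_data(episodes=200, steps=20, seed=0):
    """Sim only: roll out the teacher, log (history features, teacher action)."""
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(episodes):
        d = rng.uniform(-1.0, 1.0)  # privileged parameter, randomized per episode
        v, v_prev, u_prev = 0.0, 0.0, 0.0
        for t in range(steps):
            label = teacher_policy(v, d)
            if t > 0:  # a history exists from the second step on
                features.append([TARGET - v, v - v_prev, u_prev])
                labels.append(label)
            u = label + rng.gauss(0.0, 0.1)  # applied action with exploration noise
            v_prev, u_prev = v, u
            v = v + DT * (u + d)
    return features, labels


def solve_least_squares(rows, targets):
    """Solve the normal equations (X^T X) w = X^T y by Gauss-Jordan elimination."""
    n = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, targets))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(n):
            if k != col:
                factor = a[k][col] / a[col][col]
                a[k] = [x - factor * p for x, p in zip(a[k], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def mse(rows, targets, weights):
    """Mean squared imitation error of a linear student."""
    errs = [sum(w * x for w, x in zip(weights, r)) - y
            for r, y in zip(rows, targets)]
    return sum(e * e for e in errs) / len(errs)


X, y = collect_teacher_data()
w_student = solve_least_squares(X, y)       # student with history, no privileged input
X_obs = [[row[0]] for row in X]             # ablation: current error only
w_obs_only = solve_least_squares(X_obs, y)
mse_student = mse(X, y, w_student)
mse_obs_only = mse(X_obs, y, w_obs_only)
print(mse_student, mse_obs_only)
```

In this toy setup the history determines the disturbance exactly (it equals the last velocity change divided by `DT`, minus the last applied action), so the student can match the teacher without ever seeing the privileged value, while the student that sees only the current error cannot. Real systems are noisier and nonlinear, so the match is approximate there.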