Train in simulation, deploy in the real world (with real-time adaptation)
Why simulators for robot learning?
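- Most RL algorithms are very sample-inefficient, so collecting enough data on real hardware is rarely practical
- Simulators are cheap, fast, scalable, and safe, and they provide ground-truth labels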
Problems of Sim2Real
- Non-parametric mismatches (the simulator does not model some effects at all), e.g. complex aerodynamics, fluid dynamics, tire dynamics
- Parametric mismatches (the simulator models the effect but uses parameter values that differ from the real system), e.g. robot mass, friction
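
A parametric mismatch can be illustrated with a toy rollout. The sketch below is not from these notes: the point-mass model, the function name `rollout`, and all parameter values are assumptions chosen only to show that the same open-loop command gives different outcomes when mass and friction differ.

```python
def rollout(mass, friction, force=1.0, dt=0.01, steps=200):
    """Integrate a 1-D point mass pushed by a constant force (explicit Euler)."""
    pos, vel = 0.0, 0.0
    for _ in range(steps):
        # Viscous friction opposes motion; both parameters are hypothetical.
        acc = (force - friction * vel) / mass
        vel += acc * dt
        pos += vel * dt
    return pos


# Same command, two parameter sets: what the simulator assumes vs. a
# made-up "real" system that is heavier and has more friction.
sim_pos = rollout(mass=1.0, friction=0.1)
real_pos = rollout(mass=1.3, friction=0.3)
gap = sim_pos - real_pos  # sim-to-real gap in final position
print(f"sim: {sim_pos:.3f} m, real: {real_pos:.3f} m, gap: {gap:.3f} m")
```

A policy tuned only on the first parameter set would implicitly rely on the faster response, which is one way the sim-to-real gap shows up after deployment.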
Domain Randomization
- Randomize the environment parameters in simulation
- Train a single RL policy that works for the whole distribution of randomized environments
- An approximation of robust control
- What is randomized?
- Physics parameters (mass, gravity, friction, etc.)
- Sensor noise (camera blur, pixel noise, quantization, etc.)
- Rendering (lighting, textures, backgrounds)
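
The randomization step above could look roughly like the following per-episode sampler. This is a minimal sketch under assumptions: `EnvParams`, `sample_env_params`, `noisy_observation`, and the sampling ranges are all hypothetical, and the policy rollout and update are left as a comment.

```python
import random
from dataclasses import dataclass


@dataclass
class EnvParams:
    mass: float              # physics parameter
    friction: float          # physics parameter
    sensor_noise_std: float  # sensor-noise parameter


def sample_env_params(rng):
    """Draw one randomized environment; the ranges are illustrative only."""
    return EnvParams(
        mass=rng.uniform(0.8, 1.2),
        friction=rng.uniform(0.05, 0.5),
        sensor_noise_std=rng.uniform(0.0, 0.02),
    )


def noisy_observation(true_state, params, rng):
    """Corrupt the true state with Gaussian sensor noise."""
    return [x + rng.gauss(0.0, params.sensor_noise_std) for x in true_state]


rng = random.Random(0)
for episode in range(3):
    params = sample_env_params(rng)  # new environment every episode
    obs = noisy_observation([0.0, 0.0], params, rng)
    # A real training loop would roll out the policy in this environment
    # and update it here, so a single policy sees the whole distribution.
    print(episode, params, obs)
```

Resampling at every episode is what makes the single policy optimize for average performance over the distribution rather than for one fixed simulator setting.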
Learning to Adapt (via Privileged Information)
- Randomize the environment parameters in simulation
- Train an adaptive RL policy that works for many environments by adapting to each one
- An approximation of adaptive control
- Issue: the environment parameters are often unknown in the real world
- Solution: learn from a privileged teacher
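- Sim: First, train a teacher policy with access to privileged information (e.g. the true environment parameters)
- Sim: Train a student policy to learn from the teacher, using only inputs available on the real robot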
- Real: Deploy student policy
- Training the student essentially becomes an imitation learning problem
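
The teacher-student recipe can be sketched on a toy problem. Everything here is an assumption made for illustration, not the method from these notes: the task (hold a payload of unknown mass), the spring-sag measurement standing in for what the real robot can observe, and the 1-D least-squares fit standing in for imitation learning with a neural network.

```python
import random

G = 9.81           # gravity, m/s^2
STIFFNESS = 500.0  # hypothetical spring stiffness behind the proxy signal


def teacher_action(mass):
    """Privileged teacher: sees the true mass, outputs the holding force."""
    return mass * G


def student_observation(mass, rng):
    """What the robot can measure: noisy sag of a spring under the load."""
    return mass * G / STIFFNESS + rng.gauss(0.0, 1e-4)


def fit_student(observations, teacher_actions):
    """Behavior cloning as 1-D least squares: action ~ w * obs + b."""
    n = len(observations)
    mean_o = sum(observations) / n
    mean_a = sum(teacher_actions) / n
    cov = sum((o - mean_o) * (a - mean_a)
              for o, a in zip(observations, teacher_actions))
    var = sum((o - mean_o) ** 2 for o in observations)
    w = cov / var
    return w, mean_a - w * mean_o


rng = random.Random(0)
masses = [rng.uniform(0.5, 2.0) for _ in range(500)]  # randomized in sim
obs = [student_observation(m, rng) for m in masses]
labels = [teacher_action(m) for m in masses]          # teacher gives targets
w, b = fit_student(obs, labels)


def student_action(observation):
    """Deployable policy: never sees the mass, only the measurement."""
    return w * observation + b


test_mass = 1.3
error = abs(student_action(student_observation(test_mass, rng))
            - teacher_action(test_mass))
print(f"w={w:.1f}, b={b:.3f}, action error={error:.3f} N")
```

The student never receives the mass directly. It recovers the teacher's behavior from a signal that correlates with the hidden parameter, which is the same role an observation history plays in a real system.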