Train in simulation, deploy in the real world (with real-time adaptation)

Why simulators for robot learning?

  • Most RL algorithms are very sample-inefficient
  • Simulators are cheap, fast, scalable, and safe, and they provide ground-truth labels

Problems of Sim2Real

  • Non-parametric mismatches (the simulator does not model some effects at all)
    • complex aerodynamics, fluid dynamics, tire dynamics, etc.
  • Parametric mismatches (the simulator uses different parameter values than the real system)
    • robot mass, friction, etc.
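
The two kinds of mismatch can be illustrated with a toy example. This is a minimal sketch, not taken from the notes: the 1-D body, the quadratic drag term, and all numeric values are assumptions chosen for illustration. Correcting the mass closes the parametric gap, but the gap caused by the unmodeled drag remains.

```python
def simulate(force, mass, drag=0.0, steps=100, dt=0.01):
    """Final velocity of a 1-D body pushed by a constant force (explicit Euler)."""
    v = 0.0
    for _ in range(steps):
        # Quadratic drag is the effect the simulator may not model at all.
        v += (force - drag * v * abs(v)) / mass * dt
    return v

# Hypothetical "real world": true mass 1.2 kg, with drag.
real = simulate(force=10.0, mass=1.2, drag=0.3)

# Simulator with a wrong mass and no drag: both mismatches are present.
sim_wrong_mass = simulate(force=10.0, mass=1.0)

# Simulator with the correct mass but still no drag: only the
# non-parametric mismatch is left.
sim_right_mass = simulate(force=10.0, mass=1.2)

parametric_gap = abs(sim_wrong_mass - sim_right_mass)
residual_gap = abs(sim_right_mass - real)
print(f"gap closed by fixing the mass: {parametric_gap:.2f}")
print(f"gap left by unmodeled drag:    {residual_gap:.2f}")
```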

Domain Randomization

  • Randomize the environment parameters in simulation
  • Train a single RL policy that works for the whole distribution of randomized environments
    • Approximation of robust control
  • What is randomized?
    • Physics parameters (mass, gravity, friction, etc.)
    • Sensor noise (camera blur, pixel noise, quantization, etc.)
    • Rendering (lighting, textures, backgrounds)
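
The recipe above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method from the notes: the point-mass task, the parameter ranges, the PD "policy", and the random-search optimizer (standing in for an RL algorithm) are all hypothetical. The key idea it shows is that one fixed set of policy parameters is scored on freshly sampled environments at every iteration.

```python
import numpy as np

# Hypothetical randomization ranges; real ranges depend on the robot and simulator.
PARAM_RANGES = {
    "mass": (0.5, 2.0),      # kg
    "friction": (0.1, 1.0),  # viscous friction coefficient
}

def sample_env_params(rng):
    """Draw one set of physics parameters uniformly from the ranges."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}

def rollout_cost(gains, params, steps=200, dt=0.02):
    """Drive a 1-D point mass from x = 1 toward x = 0 with a PD policy.

    Returns the mean squared position over the rollout (lower is better).
    """
    kp, kd = gains
    x, v, cost = 1.0, 0.0, 0.0
    for _ in range(steps):
        u = -kp * x - kd * v
        a = (u - params["friction"] * v) / params["mass"]
        v += a * dt
        x += v * dt
        cost += x * x
    return cost / steps

def average_cost(gains, envs):
    """Average cost of one policy over a batch of sampled environments."""
    return np.mean([rollout_cost(gains, p) for p in envs])

def train_randomized(n_iters=50, n_envs=8, seed=0):
    """Random search for a single policy that does well across the distribution."""
    rng = np.random.default_rng(seed)
    gains = np.array([1.0, 0.1])
    for _ in range(n_iters):
        candidate = np.clip(gains + rng.normal(scale=0.5, size=2), 0.0, 20.0)
        # Resample the environments every iteration (the randomization step).
        envs = [sample_env_params(rng) for _ in range(n_envs)]
        # Compare both policies on the same batch so the comparison is fair.
        if average_cost(candidate, envs) < average_cost(gains, envs):
            gains = candidate
    return gains
```

Calling `train_randomized()` returns one pair of gains. The policy never observes the mass or friction, so it has to be conservative enough to work for all of them, which is the sense in which domain randomization approximates robust control.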

Learning to Adapt (via Privileged Information)

  • Randomize the environment parameters in simulation
  • Train an adaptive RL policy that works for many environments
    • Approximation of adaptive control
  • Issue! The environment parameters are often unknown in the real world
    • Solution! Learning from a privileged teacher
      • Sim: First train a teacher policy with privileged information
      • Sim: Student policy learns from the teacher policy (without privileged information)
      • Real: Deploy student policy
    • Basically becomes an Imitation Learning problem
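
The teacher-student pipeline can be sketched as behavior cloning on a toy problem. Everything specific here is an assumption made for illustration and does not come from the notes: the unit point mass, the unknown constant disturbance `d` as the privileged parameter, the hand-written teacher (standing in for a trained RL teacher), the linear student, and the exploration noise added during data collection. The student sees only a one-step history and has to infer what the teacher is told directly.

```python
import numpy as np

KP, KD, DT = 4.0, 2.0, 0.05  # hypothetical controller gains and time step

def step(x, v, u, d):
    """Unit point mass: total force = control u + unknown constant disturbance d."""
    v_next = v + (u + d) * DT
    x_next = x + v_next * DT
    return x_next, v_next

def teacher_action(x, v, d):
    """Teacher uses the privileged parameter d and cancels it exactly."""
    return -KP * x - KD * v - d

def collect_teacher_data(n_episodes=50, steps=40, seed=0):
    """Sim: record (student observation, teacher action) pairs."""
    rng = np.random.default_rng(seed)
    obs, labels = [], []
    for _ in range(n_episodes):
        d = rng.uniform(-2.0, 2.0)              # randomized per episode
        x, v = rng.uniform(-1.0, 1.0, size=2)
        v_prev, u_prev = v, rng.normal()        # random first action builds a history
        x, v = step(x, v, u_prev, d)
        for _ in range(steps):
            # The student observation contains no privileged information.
            obs.append([x, v, v_prev, u_prev])
            labels.append(teacher_action(x, v, d))
            # Execute the teacher action plus noise so the dataset is diverse.
            u = teacher_action(x, v, d) + rng.normal(scale=0.5)
            v_prev, u_prev = v, u
            x, v = step(x, v, u, d)
    return np.array(obs), np.array(labels)

def fit_student(obs, labels):
    """Sim: imitation learning as least-squares regression onto teacher actions."""
    weights, *_ = np.linalg.lstsq(obs, labels, rcond=None)
    return weights

def deploy_student(weights, d, steps=200):
    """'Real': run the student alone under a disturbance it is never told."""
    x, v = 1.0, 0.0
    v_prev, u_prev = v, 0.0
    x, v = step(x, v, u_prev, d)                # warm-up step to build a history
    for _ in range(steps):
        u = float(weights @ np.array([x, v, v_prev, u_prev]))
        v_prev, u_prev = v, u
        x, v = step(x, v, u, d)
    return x

obs, labels = collect_teacher_data()
student = fit_student(obs, labels)
print("final position under unseen disturbance:", deploy_student(student, d=1.5))
```

In this toy case the teacher's action happens to be exactly linear in the student's history, because `d` can be recovered from the last velocity change and the last action, so a linear student suffices. On a real robot the student is typically a neural network over a longer observation history.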