CSC2621 Winter 2020: Topics in Robotics
Reinforcement Learning in Robotics


Instructor: Animesh Garg

Teaching assistants: Dylan Turpin and Tingwu Wang
Lecture hours: Tuesdays 1–3, BA1200
Office hours: AG: Tuesdays 3:15–4:15 (after class), PT283E; TAs: TBD
Course Breadth: M3/RA16
Discussion: Quercus, Piazza

Course Staff Email: TBD
Urgent contact email: with subject “CSC2621: ”


  • Jan 7: Welcome.
  • Jan 10: Piazza class created; link added. Also, here is the Rosetta Stone paper mentioned in class.
  • Jan 21: Project guidelines posted
  • Feb 22: Project guidelines updated with proposal info. Midterm progress report deadline extended to February 28 at 11:59 PM.
  • Mar 3: Professor Garg will hold extra office hours on Monday, March 9 from 11:30 to 1:00 for project discussions.

Course Overview:

Robots of the future will need to operate autonomously in unstructured and unseen environments. It is imperative that these systems be built on intelligent and adaptive algorithms. Learning by interaction through reinforcement offers a natural mechanism for formulating these problems.

This graduate-level seminar course will cover topics and new research frontiers in reinforcement learning (RL). Planned topics include model-based and model-free RL, policy search, Monte Carlo tree search, off-policy evaluation, temporal abstraction/hierarchical approaches, inverse reinforcement learning, and imitation learning.

Learning objectives

At the end of this course, you will:

  1. Acquire familiarity with the state of the art in RL.
  2. Articulate limitations of current work, identify open frontiers, and scope research projects.
  3. Constructively critique research papers and deliver a tutorial-style presentation.
  4. Work on a research-based project, implement and evaluate experimental results, and discuss future work in a project paper.


You need to be comfortable with:

  • introductory machine learning concepts (such as from CSC411/ECE521 or equivalent),
  • linear algebra,
  • basic multivariable calculus,
  • introductory probability.

You also need to have strong programming skills in Python.

Note: if you don’t meet all the prerequisites above, please contact the instructor by email.

Optional but recommended: experience with neural networks (such as from CSC321) and introductory-level familiarity with reinforcement learning and control.

Recommended Textbooks

Grading & Evaluation

This course will consist of lectures along with paper presentations and discussions. In addition, there will be a take-home midterm and a group project.

In-Class Paper Presentation: 25%
Take-Home Midterm: 15%
Pop-quizzes & Class Participation: 10%
Project: 50%


This is a draft schedule and is subject to change.

Schedule Broad Area Reading List Slides
Week 1 Jan 7 Course Overview & Intro to RL Human Learning in Atari Lecture 1 Slides (Animesh Garg)
      Presentation Template Slides
Week 2 Jan 14 Imitation Learning: Supervised Core readings
    An Invitation To Imitation,
DAgger: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
Lecture 2 Slides (Animesh Garg)
    End to End Learning for Self-Driving Cars Slides (Dylan Turpin)
    Behavioral Cloning from Observation Slides (Tingwu Wang)
    ALVINN: An autonomous land vehicle in a neural network  
    ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst  
    Apprenticeship Learning via Inverse Reinforcement Learning  
Week 3 Jan 21 Policy Gradients Core readings  
    Policy Gradient Methods for Reinforcement Learning with Function Approximation Slides (Silviu Pitis)
    Trust region policy optimization: deep RL with natural policy gradient and adaptive step size (TRPO) Slides (Jingkang Wang)
    Continuous control with deep reinforcement learning (DDPG) Slides (Anqi (Joyce) Yang & Jonah Philion)
    Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines Slides (Animesh Garg)
    SB Ch: 13  
    Reinforcement learning of motor skills with policy gradients  
Week 4 Jan 28 Actor-Critic Methods + Value-Based Methods Core readings
    Asynchronous Methods for Deep Reinforcement Learning Slides (Adelin Travers)
    Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor,
Soft Actor-Critic Algorithms and Applications
Slides (Zikun Chen, Minghan Li)
    IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures  
    High-confidence error estimates for learned value functions  
    SB Ch: 13  
Jan 28 (before class) Project Proposal Due  
Week 5 Feb 4 Q-Value based RL Core papers  
    Playing Atari with Deep Reinforcement Learning (DQN)  
    Deep Reinforcement Learning with Double Q-learning (Double DQN),
Dueling Network Architectures for Deep Reinforcement Learning (Dueling DQN)
Slides (Haoping Xu)
    QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation (QT-Opt),
Quantile QT-Opt for Risk-Aware Vision-Based Robotic Grasping (Q2-Opt)
Slides (Vismay Modi)
    Bridging the Gap Between Value and Policy Based Reinforcement Learning (PCL),
Trust-PCL: An Off-Policy Trust Region Method for Continuous Control
Slides (Michael Pham-Hung)
    Rainbow: Combining Improvements in Deep Reinforcement Learning,
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
Slides (Mohan Zhang)
    Rainbow is all you need (step-by-step Rainbow tutorial)  
    Addressing Function Approximation Error in Actor-Critic Methods (TD3)  
    Prioritized Experience Replay  
Week 6 Feb 11 Distributional RL   Intro Slides (Animesh Garg)
    Core papers  
    A Comparative Analysis of Expected and Distributional Reinforcement Learning Slides (Jerrod Parker and Shakti Kumar)
    Implicit Quantile Networks for Distributional Reinforcement Learning  
    Statistics and Samples in Distributional Reinforcement Learning Slides (Isaac Waller)
    Value Function in Frequency Domain and the Characteristic Value Iteration Algorithm  
    An analysis of categorical distributional reinforcement learning  
    Reinforcement learning with Gaussian processes  
    Nonparametric return distribution approximation for reinforcement learning  
    A Distributional Perspective on Reinforcement Learning (C51)  
    Distributed Distributional Deterministic Policy Gradients (D4PG)  
Week 7 Feb 18 Model-Based RL Core papers  
    PILCO: Probabilistic Inference for Learning COntrol Slides (Parth Jaggi)
    Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees (SLBO)  
    Model-Based Reinforcement Learning via Meta-Policy Optimization (MB-MPO) Slides (Elliot Creager)
    Dream to Control: Learning Behaviors by Latent Imagination Slides (Haotian Cui)
    Iterative Value-Aware Model Learning (refer to supplement as well) Slides (Dami Choi and Chris Zhang)
    Intro to iLQR
    Benchmarking Model-Based Reinforcement Learning  
    Learning Latent Dynamics for Planning from Pixels (PlaNet)  
    Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images (E2C)  
    Robust locally-linear controllable embedding (RCE)  
    World Models  
    Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models (PETS)  
Week 8 Feb 25 Imitation: Inverse RL Core papers  
    Generative Adversarial Imitation Learning Slides (Albert Hsueh)
    Maximum Entropy Inverse Reinforcement Learning Slides (Naireen Hussain)
    Provably Efficient Imitation Learning from Observation Alone Slides (Zichu Liu)
    Off-Policy Evaluation via Off-Policy Classification Slides (Ning Ye)
    Inverse KKT: Learning cost functions of manipulation tasks from demonstrations Slides (Yu-Siang Wang)
    Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation  
    Neural Task Graphs: Generalizing to unseen tasks from a single video demonstration  
Feb 28 (released Friday 9:00 AM) Take-Home Midterm Released (24 hours to submit)
Feb 28 (due Friday 11:59PM) Mid-Term Project Report Due  
Week 9 Mar 3 Exploration in RL Core papers  
    Exploration by Random Network Distillation  
    Planning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments  
    Unifying Count-Based Exploration and Intrinsic Motivation  
    Model-Based Active Exploration Slides (Danijar Hafner)
    VIME: Variational Information Maximizing Exploration Slides (Daniel Flam-Shepherd)
    Go-Explore: a New Approach for Hard-Exploration Problems  
    Skew-Fit: State-Covering Self-Supervised Reinforcement Learning  
Week 10 Mar 10 Bayesian RL Core papers  
    Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search (also the extended version) Slides (Kevin Xie)
    Bayesian Reinforcement Learning: A Survey Slides (Jacob Nogas)
    VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning Slides (Homanga Bharadhwaj)
    Meta Reinforcement Learning As Task Inference Slides (Ram Ananth)
Week 11 Mar 17 Hierarchical RL Core papers  
    Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning Slides (Panteha Naderian)
    Variational Option Discovery Algorithms Slides (Harris Chan)
    Near-Optimal Representation Learning for Hierarchical Reinforcement Learning (also the precursor: Data-Efficient Hierarchical Reinforcement Learning)  
    FeUdal Networks for Hierarchical Reinforcement Learning Slides (Theophile Gaudin)
    Double Actor-Critic (also the precursor: Option-Critic) Slides (Ehsan Mehralian)
    Theoretical results on reinforcement learning with temporally abstract options  
    Learning Abstract Options (precursor: Option-Critic)  
    Neural Task Graphs (precursor: Neural Task Programming)
Week 12 Mar 24 Project Presentations
Week 13 Mar 31 Project Presentations
Week 14 Apr 7 [Buffer]
Apr 7 (Tuesday 11:59 PM) Final Project Report Due


Type Name Description
RL Codebase OpenAI Baselines Implementations of common reinforcement learning algorithms.
  Google Dopamine Research framework for fast prototyping of reinforcement learning algorithms.
  Evolution-strategies-starter Evolution Strategies as a Scalable Alternative to Reinforcement Learning.
  Pytorch-a2c-ppo-acktr PyTorch implementation of A2C, PPO and ACKTR.
  Model-Agnostic Meta-Learning Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks.
  Reptile Reptile is a meta-learning algorithm that finds a good initialization.
General Framework TensorFlow An open source machine learning framework.
  PyTorch An open source deep learning platform that provides a seamless path from research prototyping to production deployment.
Environments OpenAI Gym Gym is a toolkit for developing and comparing reinforcement learning algorithms.
  DeepMind Control Suite A set of Python reinforcement learning environments powered by the MuJoCo physics engine.
Suggested (Free) online computation platforms AWS-EC2 Amazon Elastic Compute Cloud (EC2) forms a central part of Amazon's cloud-computing platform, Amazon Web Services (AWS), by allowing users to rent virtual computers on which to run their own computer applications.
  GCE Google Compute Engine delivers virtual machines running in Google’s innovative data centers and worldwide fiber network.
  Colab Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.
Related Courses   Topics in Machine Learning, Fall 2018 by Jimmy Ba.