CSC2621 Topics in Robotics - Reinforcement Learning (Winter 2020)

Course:

Instructors: Animesh Garg
Webpage: https://pairlab.github.io/csc2621-w20/#

Teaching assistants: Dylan Turpin and Tingwu Wang
Lecture hours: Tuesday 1 – 3, BA1200
Office hours: AG: Tues 3:15 – 4:15 (after class), Location PT283E // TAs: TBD
Course Breadth: M3/RA16
Discussion: Quercus, Piazza

Course Staff Email: TBD
Urgent contact Email: garg@cs.toronto.edu with subject “CSC2621: ”

Announcements:

Jan 7: Welcome.
Jan 10: Piazza class created; link added. Also, here is the Rosetta stone paper mentioned in class.
Jan 21: Project guidelines posted
Feb 22: Project guidelines updated with proposal info. Midterm progress report extended to February 28 at 11:59PM
Mar 3: Professor Garg will hold extra office hours on Monday March 9 from 11:30-1 for project discussions

Course Overview:

Description
Robots of the future will need to operate autonomously in unstructured and unseen environments. It is imperative that these systems are built on intelligent and adaptive algorithms. Learning by interaction through reinforcement offers a natural mechanism for formulating these problems.

This graduate-level seminar course will cover topics and new research frontiers in reinforcement learning (RL). Planned topics include: Model-Based and Model-Free RL, Policy Search, Monte Carlo Tree Search, off-policy evaluation, temporal abstraction/hierarchical approaches, inverse reinforcement learning and imitation learning.

Learning objectives

At the end of this course, you will:

Acquire familiarity with the state of the art in RL.
Articulate limitations of current work, identify open frontiers, and scope research projects.
Constructively critique research papers, and deliver a tutorial-style presentation.
Work on a research-based project, implement and evaluate experimental results, and discuss future work in a project paper.

Prerequisites

You need to be comfortable with:

introductory machine learning concepts (such as from CSC411/ECE521 or equivalent),
linear algebra,
basic multivariable calculus,
intro to probability.

You also need to have strong programming skills in Python.

Note: if you don’t meet all the prerequisites above, please contact the instructor by email.

Optional, but recommended: experience with neural networks (such as from CSC321) and introductory-level familiarity with reinforcement learning and control.

Recommended Textbooks

Marco Wiering and Martijn van Otterlo, Eds., Reinforcement Learning: State-of-the-Art, Springer, 2012. Available for free under UofT library subscription. Install proxy bookmark, then visit book page, login with UofT credentials.
Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, 2nd edition, MIT Press, 2018.

Grading & Evaluation:

This course will consist of lectures, along with paper presentations and discussions. There will also be a take-home midterm and a group project.

In-Class Paper Presentation: 25%
Take-Home Midterm: 15%
Pop-quizzes & Class Participation: 10%
Project: 50%

Calendar:

This is a draft schedule and is subject to change.

Schedule	Broad Area	Reading List	Slides
Week 1 Jan 7	Course Overview & Intro to RL	Human Learning in Atari	Lecture 1 Slides (Animesh Garg)
			Presentation Template Slides
Week 2 Jan 14	Imitation Learning: supervised	Core readings
		An Invitation To Imitation, Dagger: A reduction of imitation learning and structured prediction to no-regret online learning	Lecture 2 Slides (Animesh Garg)
		End to End Learning for Self-Driving Cars	Slides (Dylan Turpin)
		Behavioral Cloning from Observation	Slides (Tingwu Wang)
		Optional
		ALVINN: An autonomous land vehicle in a neural network
		ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst
		Apprenticeship Learning via Inverse Reinforcement Learning
Week 3 Jan 21	Policy Gradients	Core readings
		Policy Gradient Methods for Reinforcement Learning with Function Approximation	Slides (Silviu Pitis)
		Trust region policy optimization: deep RL with natural policy gradient and adaptive step size (TRPO)	Slides (Jingkang Wang)
		Continuous control with deep reinforcement learning (DDPG)	Slides (Anqi (Joyce) Yang & Jonah Philion)
		Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines	Slides (Animesh Garg)
		Optional
		SB Ch: 13
		Reinforcement learning of motor skills with policy gradients
Week 4 Jan 28	Actor-Critic Methods + Value-Based Methods	Core readings
		Asynchronous Methods for Deep Reinforcement Learning	Slides (Adelin Travers)
		Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Soft Actor-Critic Algorithms and Applications	Slides (Zikun Chen, Minghan Li)
		IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
		High-confidence error estimates for learned value functions
		Optional
		SB Ch: 13
Jan 28	(before class)	Project Proposal Due
Week 5 Feb 4	Q-Value based RL	Core papers
		Playing Atari with Deep Reinforcement Learning (DQN)
		Deep Reinforcement Learning with Double Q-learning (Double DQN), Dueling Network Architectures for Deep Reinforcement Learning (Dueling DQN)	Slides (Haoping Xu)
		QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation (Qt-Opt), Quantile QT-Opt for Risk-Aware Vision-Based Robotic Grasping (Q2-Opt)	Slides (Vismay Modi)
		Bridging the Gap Between Value and Policy Based Reinforcement Learning (PCL), Trust-PCL: An Off-Policy Trust Region Method for Continuous Control	Slides (Michael Pham-Hung)
		Rainbow - Combining Improvements in Deep Reinforcement Learning, IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures	Slides (Mohan Zhang)
		Optional
		Rainbow is all you need (step-by-step Rainbow tutorial)
		Addressing Function Approximation Error in Actor-Critic Methods (TD3)
		Prioritized Experience Replay
Week 6 Feb 11	Distributional RL		Intro Slides (Animesh Garg)
		Core papers
		A Comparative Analysis of Expected and Distributional Reinforcement Learning	Slides (Jerrod Parker and Shakti Kumar)
		Implicit Quantile Networks for Distributional Reinforcement Learning
		Statistics and Samples in Distributional Reinforcement Learning	Slides (Isaac Waller)
		Value Function in Frequency Domain and the Characteristic Value Iteration Algorithm
		An analysis of categorical distributional reinforcement learning
		Optional
		Reinforcement learning with Gaussian processes
		Nonparametric return distribution approximation for reinforcement learning
		A Distributional Perspective on Reinforcement Learning (C51)
		Distributed Distributional Deterministic Policy Gradients (D4PG)
Week 7 Feb 18	Model-Based RL	Core papers
		PILCO: Probabilistic Inference for Learning COntrol	Slides (Parth Jaggi)
		Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees (SLBO)
		Model-Based Reinforcement Learning via Meta-Policy Optimization (MB-MPO)	Slides (Elliot Creager)
		Dream to Control: Learning Behaviors by Latent Imagination	Slides (Haotian Cui)
		Iterative Value-Aware Model Learning (refer to supplement as well)	Slides (Dami Choi and Chris Zhang)
		Intro to ILQR
		Optional
		Benchmarking Model-Based Reinforcement Learning
		Learning Latent Dynamics for Planning from Pixels (PlaNet)
		Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images (E2C)
		Robust locally-linear controllable embedding (RCE)
		World Models
		Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models (PETS)
Week 8 Feb 25	Imitation: Inverse RL	Core papers
		Generative Adversarial Imitation Learning	Slides (Albert Hsueh)
		Maximum Entropy Inverse Reinforcement Learning	Slides (Naireen Hussain)
		Provably Efficient Imitation Learning from Observation Alone	Slides (Zichu Liu)
		Off-Policy Evaluation via Off-Policy Classification	Slides (Ning Ye)
		Inverse KKT: Learning cost functions of manipulation tasks from demonstrations	Slides (Yu-Siang Wang)
		Optional
		Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation
		Neural Task Graphs: Generalizing to unseen tasks from a single video demonstration
Feb 28	(released Friday 9:00AM)	Take-Home Midterm Released (24 hours to turn in)
Feb 28	(due Friday 11:59PM)	Mid-Term Project Report Due
Week 9 Mar 3	Exploration in RL	Core papers
		Exploration by Random Network Distillation
		Planning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments
		Unifying Count-Based Exploration and Intrinsic Motivation
		Model-Based Active Exploration	Slides (Danijar Hafner)
		VIME: Variational Information Maximizing Exploration	Slides (Daniel Flam-Shepherd)
		Optional
		Go-Explore: a New Approach for Hard-Exploration Problems
		Skew-Fit: State-Covering Self-Supervised Reinforcement Learning
Week 10 Mar 10	Bayesian RL	Core papers
		Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search (also the extended version)	Slides (Kevin Xie)
		Bayesian Reinforcement Learning: A Survey	Slides (Jacob Nogas)
		VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning	Slides (Homanga Bharadhwaj)
		Meta Reinforcement Learning As Task Inference	Slides (Ram Ananth)
Week 11 Mar 17	Hierarchical RL	Core papers
		Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning	Slides (Panteha Naderian)
		Variational Option Discovery Algorithms	Slides (Harris Chan)
		Near-Optimal Representation Learning for Hierarchical Reinforcement Learning (also the precursor: Data-Efficient Hierarchical Reinforcement Learning)
		FeUdal Networks for Hierarchical Reinforcement Learning	Slides (Theophile Gaudin)
		Double Actor-Critic (also the precursor: Option-Critic)	Slides (Ehsan Mehralian)
		Optional
		Theoretical results on reinforcement learning with temporally abstract options
		Learning Abstract Options (precursor: Option-Critic)
		Neural Task Graphs (precursor: Neural Task Programming)
Week 12 Mar 24	Project Presentation
Week 13 Mar 31	Project Presentation
Week 14 Apr 7	[Buffer]
Apr 7	(Tues 11:59 pm)	Final Project Report Due

Resources:

Type	Name	Description
RL Code base	OpenAI Baselines	Implementations of common reinforcement learning algorithms.
	Google Dopamine	Research framework for fast prototyping of reinforcement learning algorithms.
	Evolution-strategies-starter	Evolution Strategies as a Scalable Alternative to Reinforcement Learning.
	Pytorch-a2c-ppo-acktr	PyTorch implementation of A2C, PPO and ACKTR.
	Model-Agnostic Meta-Learning	Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks.
	Reptile	Reptile is a meta-learning algorithm that finds a good initialization.

General Framework	TensorFlow	An open source machine learning framework.
	PyTorch	An open source deep learning platform that provides a seamless path from research prototyping to production deployment.

Environments	OpenAI Gym	Gym is a toolkit for developing and comparing reinforcement learning algorithms.
	Deepmind Control Suite	A set of Python Reinforcement Learning environments powered by the MuJoCo physics engine.

Suggested (Free) online computation platform	AWS-EC2	Amazon Elastic Compute Cloud (EC2) forms a central part of Amazon.com’s cloud-computing platform, Amazon Web Services (AWS), by allowing users to rent virtual computers on which to run their own computer applications.
	GCE	Google Compute Engine delivers virtual machines running in Google’s innovative data centers and worldwide fiber network.
	Colab	Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.

Related Courses		Topics in Machine Learning, Fall 2018 by Jimmy Ba.