CSC 498 Fall 2021: Introduction to Reinforcement Learning
Weekly schedule
Lectures: online delivery, Tues 5:00 pm - 7:00 pm EST, Zoom
Tutorials: Fri 09:00 am - 10:00 am EST, Zoom
Animesh Garg office hours: Thurs 2:30 pm - 3:30 pm EST, Zoom
Mail: garg@cs.toronto.edu
TA office hours: Thurs 10:00 am - 12:00 pm EST, Zoom
Mail: matthew.zhang@mail.utoronto.ca
Mail: c.voelcker@mail.utoronto.ca
Mail: lichothu.wang@mail.utoronto.ca
Please prefix the subject line of all emails with “[CSC498-F21]”.
Accessing resources
Piazza: course Piazza page
Zoom: link in Quercus announcement
Online delivery: The lectures will be delivered live online in the lecture slot. During the Friday tutorial slot, we will have a small quiz every week (mandatory attendance) and discuss the material and exercises. For questions about the material or exercises, join the office hours or participate in the online offerings on Zoom.
Description
Reinforcement learning is a powerful paradigm for modeling autonomous and intelligent agents interacting with the environment, and it is relevant to an enormous range of tasks, including robotics, game playing, consumer modeling and healthcare. This course provides an introduction to reinforcement learning, which focuses on the study and design of agents that interact with a complex, uncertain world to achieve a goal. We will study agents that can make near-optimal decisions in a timely manner with incomplete information and limited computational resources. The course will cover Markov decision processes, reinforcement learning, planning, and function approximation (online supervised learning). The course will take an information-processing approach to the concept of mind and briefly touch on perspectives from psychology, neuroscience, and philosophy.
Learning objectives
At the end of this course, you will have gained both knowledge and system-building abilities. In particular, you will be able to:
- Define the key features of reinforcement learning that distinguish it from AI and non-interactive machine learning (as assessed by the exam).
- Given an application problem (e.g. from computer vision, robotics, etc.), decide if it should be formulated as an RL problem; if so, define it formally (in terms of the state space, action space, dynamics, and reward model), state which algorithm (from class) is best suited to address it, and justify your answer (as assessed by the project and the exam).
- Implement in code common RL algorithms (as assessed by the homeworks).
- Describe (list and define) multiple criteria for analyzing RL algorithms and evaluate algorithms on these metrics, e.g. regret, sample complexity, computational complexity, empirical performance, and convergence (as assessed by the homeworks and the exam).
- Describe the exploration vs exploitation challenge and compare and contrast at least two approaches for addressing this challenge (in terms of performance, scalability, complexity of implementation, and theoretical guarantees) (as assessed by an assignment and the exam).
List of topics covered in this course (expected)
With a focus on AI as the design of agents learning from experience to predict and control their environment, topics will include the following (see the short code sketch after this list for a taste of the implementation work):
- Markov decision processes
- Planning by approximate dynamic programming
- Monte Carlo and Temporal Difference Learning for prediction
- Monte Carlo, Sarsa, and Q-learning for control
- Dyna and planning with a learned model
- Prediction and control with function approximation
- Policy gradient methods
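As a taste of the homework implementations, here is a minimal sketch of tabular Q-learning on a toy chain environment. The environment, hyperparameters, and seed are illustrative assumptions, not course material.

    import numpy as np

    # Toy 5-state chain (illustrative): action 1 moves right, action 0 moves
    # left; reaching the rightmost state gives reward 1 and ends the episode.
    N_STATES, N_ACTIONS = 5, 2

    def step(state, action):
        next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        done = next_state == N_STATES - 1
        return next_state, reward, done

    rng = np.random.default_rng(seed=0)
    Q = np.zeros((N_STATES, N_ACTIONS))     # tabular action-value estimates
    alpha, gamma, epsilon = 0.1, 0.99, 0.1  # illustrative hyperparameters

    for episode in range(500):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection (exploration vs. exploitation).
            if rng.random() < epsilon:
                action = int(rng.integers(N_ACTIONS))
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done = step(state, action)
            # Q-learning update: bootstrap from the greedy value of the next state.
            target = reward + (0.0 if done else gamma * np.max(Q[next_state]))
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state

    # Greedy action per state; states 0-3 should learn to move right (action 1).
    print(np.argmax(Q, axis=1))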
Homework
- Homework 1: due 2021-10-21
- Homework 2: due 2021-11-04 (postponed to Nov 8th!)
- Homework 3: due 2021-11-17
- Homework 4: due 2021-12-03
Project
- Project requirements sheet
- Project proposal template
- Project abstract deadline: on Quercus
- Project final deadline: tbd
- Project presentation: 2021-12-07
Exam
- Exam: tbd
Prerequisites
Priority will be given to students who meet the prerequisites for the course. Knowledge of probability, multivariate calculus, and linear algebra is expected.
Required:
- Intro to ML (CSC 311) or Intro to AI, or equivalent
- CSC 209, MAT223, MAT232 and STA256
Recommended:
- CSC375, MAT224 and related courses
Algorithm implementation will be done mainly in Python. Please familiarize yourself with the language and common tools (git, the command line, and the frameworks numpy and pytorch).
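As a quick self-check (a hypothetical snippet, not part of the course materials), you should be comfortable reading short numpy and pytorch code along these lines:

    import numpy as np
    import torch

    # numpy: discounting a short sequence of rewards via broadcasting.
    rewards = np.array([1.0, 0.5, 0.25])
    discounts = 0.99 ** np.arange(len(rewards))
    discounted_return = float(np.sum(discounts * rewards))

    # pytorch: one gradient-descent step on a one-parameter model.
    w = torch.tensor(0.0, requires_grad=True)
    loss = (2.0 * w - 1.0) ** 2
    loss.backward()            # populates w.grad
    with torch.no_grad():
        w -= 0.1 * w.grad      # manual gradient step

    print(discounted_return, w.item())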
Textbook & Resources
There is no required textbook. The course will provide all material in class and in handouts.
Students can refer to the following material for additional help:
- A great introductory text on reinforcement learning: Sutton and Barto, Reinforcement Learning
Additional resources:
- If you need a refresher on common mathematical tools and tricks for ML: Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong, Mathematics for Machine Learning
- If you want to learn more about convex optimization, I recommend Stephen Boyd's EE364a: Convex Optimization I and EE364b: Convex Optimization II. Both have all course materials, including lecture videos, available online.
Reinforcement learning resources:
- RL Course from UW - Byron Boots (https://homes.cs.washington.edu/~bboots/RL-Fall2020/)
- RL Course from Stanford - Emma Brunskill (http://web.stanford.edu/class/cs234/index.html)
- RL Course from University of Alberta - Martha White (https://marthawhite.github.io/rlcourse/schedule.html)
- RL course at ASU/MIT - Dimitri Bertsekas (http://web.mit.edu/dimitrib/www/RLbook.html)
- David Silver's course on Reinforcement Learning (http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Teaching.html)
- Deep RL Course from Berkeley - Sergey Levine (http://rail.eecs.berkeley.edu/deeprlcourse/resources/#courses)
Evaluation format
This course combines lectures with tutorials, encouraging both fundamental knowledge acquisition and hands-on experience. Each student will be responsible for 4 individual assignments (40%), one take-home midterm (20%), and one project (20%). In addition, we will conduct 8 short online quizzes during the Friday exercise slot, of which the best 4 will count (20%). Quiz dates will be announced at least one week in advance. For more information, see the syllabus.
Late penalties
Each student will have 3 grace days throughout the semester for late assignment submissions. Late submissions that exceed those grace days will lose 33% of their value for every late day beyond the allotted grace days. Late submissions that exceed three days of delay after the grace days have been used will unfortunately not be accepted. The official policy of the Registrar’s Office at UTM regarding missed exams can be found here.