CSC 498 Fall 2021: Introduction to Reinforcement Learning
Weekly schedule
Lectures: online delivery, Tues 5:00 pm - 7:00 pm EST, Zoom
Tutorials: Fri 09:00 am - 10:00 am EST, Zoom
Animesh Garg office hours: Thurs 2:30 pm - 3:30 pm EST, Zoom
Mail: garg@cs.toronto.edu
TA office hours: Thurs 10:00 am - 12:00 pm EST, Zoom
Mail: matthew.zhang@mail.utoronto.ca
Mail: c.voelcker@mail.utoronto.ca
Mail: lichothu.wang@mail.utoronto.ca
Please prefix the subject line of all emails with “[CSC498-F21]”.
Accessing resources
Piazza: course Piazza page
Zoom: link in Quercus announcement
Online delivery: The lectures will be delivered live online in the lecture slot. During the Friday tutorial slot, we will have a small quiz every week (mandatory attendance) and discuss the material and exercises. For questions about the material or exercises, join the office hours or participate in the online offerings on Zoom.
Description
Reinforcement learning is a powerful paradigm for modeling autonomous and intelligent agents interacting with the environment, and it is relevant to an enormous range of tasks, including robotics, game playing, consumer modeling and healthcare. This course provides an introduction to reinforcement learning, which focuses on the study and design of agents that interact with a complex, uncertain world to achieve a goal. We will study agents that can make near-optimal decisions in a timely manner with incomplete information and limited computational resources. The course will cover Markov decision processes, reinforcement learning, planning, and function approximation (online supervised learning). The course will take an information-processing approach to the concept of mind and briefly touch on perspectives from psychology, neuroscience, and philosophy.
Learning objectives
At the end of this course, you will have gained both knowledge and system-building abilities. In particular, you will be able to:
- Define the key features of reinforcement learning that distinguish it from AI and non-interactive machine learning (as assessed by the exam).
- Given an application problem (e.g. from computer vision, robotics, etc.), decide if it should be formulated as an RL problem; if so, define it formally (in terms of the state space, action space, dynamics, and reward model), state which algorithm (from class) is best suited to address it, and justify your answer (as assessed by the project and the exam).
- Implement in code common RL algorithms (as assessed by the homeworks).
- Describe (list and define) multiple criteria for analyzing RL algorithms and evaluate algorithms on these metrics, e.g. regret, sample complexity, computational complexity, empirical performance, and convergence (as assessed by the homeworks and the exam).
- Describe the exploration vs exploitation challenge and compare and contrast at least two approaches for addressing this challenge (in terms of performance, scalability, complexity of implementation, and theoretical guarantees) (as assessed by an assignment and the exam).
List of topics covered in this course (expected)
With a focus on AI as the design of agents learning from experience to predict and control their environment, topics will include the following (see the short code sketch after this list for a taste of the implementation work):
- Markov decision processes
- Planning by approximate dynamic programming
- Monte Carlo and Temporal Difference Learning for prediction
- Monte Carlo, Sarsa, and Q-learning for control
- Dyna and planning with a learned model
- Prediction and control with function approximation
- Policy gradient methods
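As a taste of the homework implementations, here is a minimal sketch of tabular Q-learning on a toy chain environment. The environment, hyperparameters, and seed are illustrative assumptions, not course material.

    import numpy as np

    # Toy 5-state chain (illustrative): action 1 moves right, action 0 moves
    # left; reaching the rightmost state gives reward 1 and ends the episode.
    N_STATES, N_ACTIONS = 5, 2

    def step(state, action):
        next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        done = next_state == N_STATES - 1
        return next_state, reward, done

    rng = np.random.default_rng(seed=0)
    Q = np.zeros((N_STATES, N_ACTIONS))     # tabular action-value estimates
    alpha, gamma, epsilon = 0.1, 0.99, 0.1  # illustrative hyperparameters

    for episode in range(500):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection (exploration vs. exploitation).
            if rng.random() < epsilon:
                action = int(rng.integers(N_ACTIONS))
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done = step(state, action)
            # Q-learning update: bootstrap from the greedy value of the next state.
            target = reward + (0.0 if done else gamma * np.max(Q[next_state]))
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state

    # Greedy action per state; states 0-3 should learn to move right (action 1).
    print(np.argmax(Q, axis=1))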
Homework
- Homework 1: due 2021-10-21
- Homework 2: due 2021-11-04 (postponed to Nov 8th!)
- Homework 3: due 2021-11-17
- Homework 4: due 2021-12-03
Project
- Project requirements sheet
- Project proposal template
- Project abstract deadline: on Quercus
- Project final deadline: tbd
- Project presentation: 2021-12-07
Exam
- Exam: tbd
Prerequisites
Priority will be given to students who meet the prerequisites for the course. Knowledge of probability, multivariate calculus, and linear algebra is expected.
Required:
- Intro to ML (CSC 311) or Intro to AI, or equivalent
- CSC 209, MAT223, MAT232 and STA256
Recommended:
- CSC375, MAT224 and related courses
Algorithm implementation will be done mainly in Python. Please familiarize yourself with the language and common tools (git, the command line, and the frameworks numpy and pytorch).
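As a quick self-check (a hypothetical snippet, not part of the course materials), you should be comfortable reading short numpy and pytorch code along these lines:

    import numpy as np
    import torch

    # numpy: discounting a short sequence of rewards via broadcasting.
    rewards = np.array([1.0, 0.5, 0.25])
    discounts = 0.99 ** np.arange(len(rewards))
    discounted_return = float(np.sum(discounts * rewards))

    # pytorch: one gradient-descent step on a one-parameter model.
    w = torch.tensor(0.0, requires_grad=True)
    loss = (2.0 * w - 1.0) ** 2
    loss.backward()            # populates w.grad
    with torch.no_grad():
        w -= 0.1 * w.grad      # manual gradient step

    print(discounted_return, w.item())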
Textbook & Resources
There is no required textbook. The course will provide all material in class and in handouts.
Students can refer to the following material for additional help:
- A great introductory text on reinforcement learning: Sutton and Barto, Reinforcement Learning
Additional resources:
- If you need a refresher on common mathematical tools and tricks for ML: Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong, Mathematics for Machine Learning
- If you want to learn more about convex optimization, I recommend Stephen Boyd's EE364a: Convex Optimization I and EE364b: Convex Optimization II. Both have all course materials, including lecture videos, available online.
Reinforcement learning resources:
- RL Course from UW - Byron Boots (https://homes.cs.washington.edu/~bboots/RL-Fall2020/)
- RL Course from Stanford - Emma Brunskill (http://web.stanford.edu/class/cs234/index.html)
- RL Course from University of Alberta - Martha White (https://marthawhite.github.io/rlcourse/schedule.html)
- RL course at ASU/MIT - Dimitri Bertsekas (http://web.mit.edu/dimitrib/www/RLbook.html)
- David Silver's course on Reinforcement Learning (http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Teaching.html)
- Deep RL Course from Berkeley - Sergey Levine (http://rail.eecs.berkeley.edu/deeprlcourse/resources/#courses)
Evaluation format
This course combines lectures with tutorials, encouraging both fundamental knowledge acquisition and hands-on experience. Each student will be responsible for 4 individual assignments (40%), one take-home midterm (20%), and one project (20%). In addition, we will conduct 8 short online quizzes during the Friday exercise slot, of which the best 4 will count (20%). Quiz dates will be announced at least one week in advance. For more information, see the syllabus.
Late penalties
Each student will have 3 grace days throughout the semester for late assignment submissions. Late submissions that exceed those grace days will lose 33% of their value for every late day beyond the allotted grace days. Late submissions that exceed three days of delay after the grace days have been used will unfortunately not be accepted. The official policy of the Registrar’s Office at UTM regarding missed exams can be found here.