Students should be able to:
- Understand the difference between online and batch learning.
- Describe the main online learning algorithms and follow the analysis of their performance.
- Understand the multi-armed bandit problem, describe the main algorithms for it, and follow the analysis of their performance.
- Understand the goal of reinforcement learning (RL) and its mathematical model, the Markov decision process (MDP).
- Describe the basic evaluation criteria for RL: the finite-horizon, infinite-horizon, and discounted criteria.
- Describe the main algorithms for model-based RL and understand their performance guarantees.
- Describe the main algorithms for model-free RL and understand their performance guarantees.
- Understand value-function approximation and deep RL.