Imitation Learning

Behavioral cloning, DAgger, and learning from demonstrations.

This section will be written soon.
