Model-Based

AlphaZero and MuZero

Planning with Monte Carlo tree search and learned models.

To be done soon

Model Predictive Control

MPC, MBPO, and planning with learned models for sample-efficient RL.

Imitation Learning

Behavioral cloning, DAgger, and learning from demonstrations.