Dyna and Learned Models

Integrated learning, planning, and acting with learned environment dynamics.
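The summary above describes the Dyna idea: real experience drives direct RL updates, also trains a model of the environment, and that model then generates simulated experience for extra planning updates. The sketch below is a rough illustration only. It is a minimal tabular Dyna-Q loop in Python, not the author's implementation. The corridor environment, the helper names (`step`, `epsilon_greedy`, `dyna_q`) and all hyperparameter values are hypothetical choices for this example. The "learned model" is a lookup table that stores the last observed outcome of each state-action pair, which only makes sense because the toy environment is deterministic.

```python
import random
from collections import defaultdict

# Hypothetical toy environment: a 1-D corridor of N_STATES cells.
# Episodes start in cell 0; reaching the last cell gives reward 1 and ends the episode.
N_STATES = 6
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic corridor dynamics (an assumption made for illustration)."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def epsilon_greedy(q, state, epsilon, rng):
    """Random action with probability epsilon, otherwise a greedy one (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    values = [q[(state, a)] for a in ACTIONS]
    best = max(values)
    return rng.choice([a for a, v in zip(ACTIONS, values) if v == best])


def dyna_q(episodes=30, planning_steps=10, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # action values Q(s, a), initialised to 0
    model = {}              # learned model: (s, a) -> (reward, next_state, done)

    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # (1) Acting: choose an action and observe one real transition.
            action = epsilon_greedy(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)

            # (2) Direct RL: one-step Q-learning update from the real transition.
            target = reward if done else reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])

            # (3) Model learning: remember the most recent outcome of (state, action).
            model[(state, action)] = (reward, next_state, done)

            # (4) Planning: Q-learning updates on transitions replayed from the model.
            for _ in range(planning_steps):
                (s, a), (r, s2, d) = rng.choice(list(model.items()))
                t = r if d else r + gamma * max(q[(s2, b)] for b in ACTIONS)
                q[(s, a)] += alpha * (t - q[(s, a)])

            state = next_state
    return q, model


if __name__ == "__main__":
    q, model = dyna_q()
    greedy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
    print(greedy)  # greedy action in each non-terminal cell
```

Setting `planning_steps=0` reduces the loop to plain one-step Q-learning, which gives a simple baseline for checking how much the simulated updates speed up learning.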

To be done soon
