Value-Based Methods

Q-Learning

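An off-policy temporal-difference control algorithm: it learns the action values of the greedy policy while acting with a different, exploratory behavior policy.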
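
As a rough illustration of what "off-policy" means here, the sketch below applies a single tabular Q-learning update. The function names, the dictionary-based table, and the default `alpha` and `gamma` values are assumptions made for this example, not something specified on this page.

```python
import random
from collections import defaultdict


def q_learning_update(Q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular Q-learning update in place and return the TD error."""
    # Off-policy target: bootstrap from the greedy (max) action in next_state,
    # regardless of which action the behavior policy will actually take there.
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - Q[(state, action)]
    Q[(state, action)] += alpha * td_error
    return td_error


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Behavior policy (hypothetical helper): explore with probability epsilon."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


# Tiny usage example with made-up states and actions.
Q = defaultdict(float)
Q[("s1", "right")] = 2.0
error = q_learning_update(Q, "s0", "right", 1.0, "s1", ["left", "right"],
                          alpha=0.5, gamma=0.9)
```

The behavior policy (`epsilon_greedy`) only decides which transitions are collected; the update itself always targets the greedy action, which is the distinction from on-policy methods such as SARSA.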

Monte Carlo and Temporal Difference

Sample-based methods for estimating value functions without a model of the environment.
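
To make the contrast concrete, here is a minimal sketch of the two estimation targets: the full sampled return used by Monte Carlo, and the one-step bootstrapped target used by TD(0). The function names, the dictionary-based value table, and the default step size are assumptions for illustration only.

```python
def mc_returns(rewards, gamma=1.0):
    """Discounted return G_t for every time step of one finished episode."""
    returns, g = [], 0.0
    # Work backwards so each return reuses the one after it: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def td0_update(V, state, reward, next_state, alpha=0.1, gamma=1.0, done=False):
    """One TD(0) update: move V[state] toward reward + gamma * V[next_state]."""
    # Unlike Monte Carlo, this needs only a single transition, not a full episode.
    bootstrap = 0.0 if done else gamma * V.get(next_state, 0.0)
    old = V.get(state, 0.0)
    V[state] = old + alpha * (reward + bootstrap - old)
    return V[state]


# Usage with made-up numbers.
episode_returns = mc_returns([1.0, 0.0, 2.0], gamma=0.5)
V = {"b": 4.0}
new_value = td0_update(V, "a", 1.0, "b", alpha=0.5, gamma=0.5)
```

Neither function needs transition probabilities or a reward model; both work purely from sampled experience.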

DQN

Deep Q-Networks with experience replay and target networks.
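
The two mechanisms named above can be sketched without any deep learning library. In this simplified example, `ReplayBuffer`, `td_targets`, and the `target_q` callable are hypothetical names chosen for illustration; `target_q` stands in for a frozen copy of the Q-network that maps a state to a list of action values.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions, sampled uniformly at random."""

    def __init__(self, capacity, seed=None):
        # deque with maxlen silently evicts the oldest transition when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation between
        # consecutive transitions.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, target_q, gamma=0.99):
    """Regression targets computed from the frozen target network."""
    return [
        reward + (0.0 if done else gamma * max(target_q(next_state)))
        for (_, _, reward, next_state, done) in batch
    ]


# Usage with made-up transitions and a constant stand-in for the target network.
buffer = ReplayBuffer(capacity=2, seed=0)
for step in range(3):
    buffer.push(step, 0, 1.0, step + 1, False)
targets = td_targets([(0, 0, 1.0, 1, False), (1, 0, 2.0, 2, True)],
                     target_q=lambda s: [0.5, 3.0], gamma=0.5)
```

In a full agent, the online network would be trained to match these targets on each sampled batch, and its parameters would be copied into the target network only every so often, so the targets stay fixed between copies.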