RL Handbook

A comprehensive guide to Reinforcement Learning

Abstract

This handbook gives a comprehensive, up-to-date guide to reinforcement learning and sequential decision making. Starting from bandits and Markov decision processes, it progresses through value-based methods, policy gradients, actor-critic architectures, and model-based approaches. Advanced topics include imitation learning, offline RL, curiosity-driven exploration, and multi-agent systems. The material balances mathematical rigor with runnable code examples, and is designed to serve as an open, continuously updated resource for students, researchers, and engineers entering or working in the field.

Chapter Contents

01 Introduction
  01.1 Introduction
  01.2 What is Reinforcement Learning?
  01.3 Taxonomy of RL Methods
02 Value-Based
  02.1 Multi-Armed Bandits
  02.2 Markov Decision Processes
  02.3 Dynamic Programming
  02.4 Monte Carlo and Temporal-Difference Prediction
  02.5 Sarsa and Q-Learning
  02.6 Deep Q-Networks
  02.7 DQN Improvements
03 On-Policy Policy-Based
  03.1 Policy Gradient and REINFORCE
  03.2 Actor-Critic, A2C and A3C
  03.3 TRPO
  03.4 PPO
04 Off-Policy Policy-Based
  04.1 Off-Policy Policy-Based Framework
  04.2 DDPG
  04.3 TD3 and SAC
05 Model-Based
  Coming soon
06 Advanced Topics
  Coming soon

Author

Ruslan Ageev

RL research @ Tsinghua University | ML & AI


Acknowledgements

We thank all contributors who helped improve this handbook through feedback, corrections, and new material.
