RL Handbook

A comprehensive guide to Reinforcement Learning

Abstract

This handbook gives a comprehensive, up-to-date guide to reinforcement learning and sequential decision making. Starting from bandits and Markov decision processes, it progresses through value-based methods, policy gradients, actor-critic architectures, and model-based approaches. Advanced topics include imitation learning, offline RL, curiosity-driven exploration, and multi-agent systems. The material balances mathematical rigor with runnable code examples, and is designed to serve as an open, continuously updated resource for students, researchers, and engineers entering or working in the field.

Citation

@book{rlhandbook2026,
  author = {Ruslan Ageev},
  title = {RL Handbook: A Comprehensive Guide to Reinforcement Learning},
  year = {2026},
  publisher = {Online},
  url = {https://rl-handbook.com}
}

Changelog

2026-03-25

Site launch.

2026-04-02

Restructured into 6 parts: Introduction, Value-Based, Policy Gradient, Actor-Critic, Model-Based, Advanced Topics.

Author

Ruslan Ageev

Researcher @ Tsinghua University · Reinforcement Learning & ML

Acknowledgements

We thank all contributors who helped improve this handbook through feedback, corrections, and new material.