Abstract
This handbook provides a comprehensive, up-to-date guide to reinforcement learning and sequential decision making. Starting from bandits and Markov decision processes, it progresses through value-based methods, policy gradients, actor-critic architectures, and model-based approaches. Advanced topics include imitation learning, offline RL, curiosity-driven exploration, and multi-agent systems. The material balances mathematical rigor with runnable code examples, and is designed to serve as an open, continuously updated resource for students, researchers, and engineers entering or working in the field.
Citation
@book{rlhandbook2026,
author = {Ruslan Ageev},
title = {RL Handbook: A Comprehensive Guide to Reinforcement Learning},
year = {2026},
publisher = {Online},
url = {https://rl-handbook.com}
}
Changelog
2026-03-25
Site launch.
2026-04-02
Restructured into 6 parts: Introduction, Value-Based, Policy Gradient, Actor-Critic, Model-Based, Advanced Topics.
Acknowledgements
We thank all contributors who helped improve this handbook through feedback, corrections, and new material.
