Abstract
This handbook is a comprehensive, up-to-date guide to reinforcement learning (RL) and sequential decision making. Starting from bandits and Markov decision processes, it progresses through value-based methods, policy gradients, actor-critic architectures, and model-based approaches. Advanced topics include imitation learning, offline RL, curiosity-driven exploration, and multi-agent systems. The material balances mathematical rigor with runnable code examples. It is designed as an open, continuously updated resource for students, researchers, and engineers who are entering or already working in the field.
Chapter Contents
- 01 Introduction
- 02 Value-Based
- 03 On-Policy Policy-Based
  - 03.1 Policy Gradient and REINFORCE
  - 03.2 Actor-Critic, A2C and A3C
  - 03.3 TRPO
  - 03.4 PPO
- 04 Off-Policy Policy-Based
  - 04.1 Off-Policy Policy-Based Framework
  - 04.2 DDPG
  - 04.3 TD3 and SAC
- 05 Model-Based (coming soon)
- 06 Advanced Topics (coming soon)
Acknowledgements
We thank all contributors who helped improve this handbook through feedback, corrections, and new material.