RL Handbook
Policy Gradient

TRPO

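Trust Region Policy Optimization (TRPO) is a policy gradient method that maximizes a surrogate objective subject to a constraint on the KL divergence between the old and new policies, which limits how far each update can move the policy.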
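
To make the KL divergence constraint concrete, the following is a minimal sketch of the two quantities a TRPO update compares: the surrogate objective (the mean of the probability ratio times the advantage) and the mean KL divergence from the old policy to the new one. It assumes discrete actions with strictly positive probabilities. The function names, the example numbers, and the threshold delta = 0.01 are illustrative assumptions, not taken from this handbook. It does not perform the constrained optimization step itself.

```python
import numpy as np


def surrogate_and_kl(old_probs, new_probs, actions, advantages):
    """Return (surrogate objective, mean KL(old || new)) for categorical policies.

    old_probs, new_probs: arrays of shape (batch, n_actions), rows sum to 1
    and all entries are assumed strictly positive.
    actions: integer array of shape (batch,), the actions actually taken.
    advantages: float array of shape (batch,), advantage estimates.
    """
    old_probs = np.asarray(old_probs, dtype=float)
    new_probs = np.asarray(new_probs, dtype=float)
    actions = np.asarray(actions)
    advantages = np.asarray(advantages, dtype=float)

    rows = np.arange(len(actions))
    # Probability ratio pi_new(a|s) / pi_old(a|s) for the sampled actions.
    ratio = new_probs[rows, actions] / old_probs[rows, actions]
    surrogate = float(np.mean(ratio * advantages))

    # KL(old || new) per state, averaged over the batch.
    kl_per_state = np.sum(old_probs * np.log(old_probs / new_probs), axis=1)
    mean_kl = float(np.mean(kl_per_state))
    return surrogate, mean_kl


def within_trust_region(mean_kl, delta=0.01):
    """Check the constraint mean KL <= delta (delta is an assumed value)."""
    return mean_kl <= delta


# Hypothetical two-state, two-action batch.
old = [[0.5, 0.5], [0.25, 0.75]]
new = [[0.6, 0.4], [0.25, 0.75]]
acts = [0, 1]
advs = [1.0, -1.0]

surrogate, mean_kl = surrogate_and_kl(old, new, acts, advs)
print(surrogate, mean_kl, within_trust_region(mean_kl))
```

A candidate policy that improves the surrogate but fails the KL check would be rejected or scaled back; in practice this is usually handled with a line search along the proposed update direction.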
