TRPO

Trust Region Policy Optimization with KL divergence constraint.

Placeholder content for TRPO.