
RL for Sequence Generation and RLHF

Applying policy gradient methods to sequence models, reward modeling, and alignment.
