AlphaZero and MuZero

Planning with Monte Carlo tree search and learned models.

To be done soon.