Q-Learning

Off-policy temporal difference control algorithm.

Placeholder content for Q-Learning.
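
To make "off-policy temporal difference control" concrete, here is a minimal tabular sketch in Python. It is an illustration under stated assumptions, not content from this page: the five-state chain task, the function names `q_update` and `train_chain`, and the values chosen for `alpha` and `gamma` are all hypothetical. The behaviour policy is uniformly random, while the update bootstraps from the greedy (max) action in the next state, which is what makes the method off-policy.

```python
import random


def q_update(q, state, action, reward, next_state, done, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the TD error."""
    # Off-policy target: bootstrap from the best action in next_state,
    # regardless of which action the behaviour policy takes next.
    best_next = 0.0 if done else max(q[next_state])
    td_error = reward + gamma * best_next - q[state][action]
    q[state][action] += alpha * td_error
    return td_error


def train_chain(n_states=5, episodes=500, max_steps=1000, seed=0):
    """Learn Q-values on a hypothetical chain task (illustrative only).

    States are 0..n_states-1. Action 0 moves left (staying put at state 0),
    action 1 moves right. Reaching the last state ends the episode with
    reward 1; every other transition gives reward 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Behaviour policy: uniformly random, so it differs from the
            # greedy policy that the update is learning about.
            action = rng.randrange(2)
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == n_states - 1
            reward = 1.0 if done else 0.0
            q_update(q, state, action, reward, next_state, done)
            state = next_state
            if done:
                break
    return q
```

Calling `train_chain()` returns a table in which, for every non-terminal state, the value of moving right should exceed the value of moving left, even though the agent only ever acted randomly while collecting data.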