Materials (based on practical_rl course)

Practice

Part 0 (not graded) - intro to gym(nasium) interface - Open In Colab

Part 1 (5 points) - implement REINFORCE with a neural network agent - Open In Colab

Part 2 (5-10 points) - optional advanced homework: implement either A2C or PPO.

If you choose PPO, you don't need to submit A2C: doing both earns no extra points, since PPO builds on A2C. So do either (REINFORCE -> A2C) for up to 10 points or (REINFORCE -> PPO) for up to 15 points.

If you choose PPO, we recommend additional materials; pick one of:

More materials

  • A full-term course on reinforcement learning - practical_rl

  • Actually proving the policy gradient for discounted rewards - article

  • On variance of policy gradient and optimal baselines: article, another article

  • Generalized Advantage Estimation - a way you can speed up training for homework_*.ipynb - article

  • Generalizing the log-derivative trick - url

  • Combining policy gradient and q-learning - arxiv

  • Bayesian perspective on why the reparameterization & log-derivative tricks matter (Vetrov's take) - pdf

  • Adversarial review of policy gradient - blog
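
As a rough illustration of the Generalized Advantage Estimation item above, here is a minimal pure-Python sketch of the backward recursion over one rollout. The function name `gae_advantages`, its argument layout, and the default `gamma` and `lam` values are assumptions made for this example; they are not taken from the homework notebooks.

```python
def gae_advantages(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Sketch of Generalized Advantage Estimation for a single rollout.

    rewards[t]  - reward received after acting in state s_t
    values[t]   - critic estimate V(s_t)
    last_value  - critic estimate for the state following the final step
    dones[t]    - True if the episode ended at step t
    (All names and defaults here are hypothetical, for illustration only.)
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    next_advantage = 0.0
    # Walk the rollout backwards so each step can reuse the next step's result.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors, cut at episode ends.
        next_advantage = delta + gamma * lam * not_done * next_advantage
        advantages[t] = next_advantage
        next_value = values[t]
    return advantages
```

With `lam=1` and a zero critic, the result reduces to discounted reward-to-go; with `lam=0` it reduces to the one-step TD error. Values in between trade variance against bias.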