week13_rl

Materials (based on `practical_rl` course)

Slides
Video lecture by D. Silver - https://www.youtube.com/watch?v=KHZVXao4qXs
Our lecture, seminar
Alternative lecture by J. Schulman part 1 - https://www.youtube.com/watch?v=BB-BhTn6DCM
Alternative lecture by J. Schulman part 2 - https://www.youtube.com/watch?v=Wnl-Qh2UHGg

Practice

Part 0 (not graded) - intro to the gym(nasium) interface - intro.ipynb
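For orientation, the agent-environment loop behind the gym(nasium) interface can be sketched without installing anything. The `ToyCorridorEnv` class and the `run_episode` helper below are hypothetical stand-ins, not part of Gymnasium or of this course's code. They only mimic the Gymnasium calling convention, in which `reset()` returns `(observation, info)` and `step(action)` returns `(observation, reward, terminated, truncated, info)`.

```python
class ToyCorridorEnv:
    """Hypothetical stand-in that mimics the Gymnasium API (not a real Gymnasium env).

    The agent starts at cell 0 and must reach cell `length - 1`.
    Actions: 0 = step left, 1 = step right.
    """

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        # Gymnasium convention: reset returns (observation, info).
        self.position = 0
        self.steps = 0
        return self.position, {}

    def step(self, action):
        # Gymnasium convention: step returns
        # (observation, reward, terminated, truncated, info).
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        self.steps += 1
        terminated = self.position == self.length - 1  # goal reached
        truncated = (not terminated) and self.steps >= self.max_steps  # time limit
        reward = 1.0 if terminated else 0.0
        return self.position, reward, terminated, truncated, {}


def run_episode(env, policy):
    """Roll out one episode and return the total (undiscounted) reward."""
    obs, info = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        # An episode ends either on a terminal state or on a time limit.
        done = terminated or truncated
    return total_reward


def always_right(obs):
    return 1


def always_left(obs):
    return 0
```

With a real Gymnasium environment the loop in `run_episode` should look the same; only the environment construction differs.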

Part 1 (5 points) - implement REINFORCE with a neural network agent - reinforce_pytorch.ipynb or reinforce_tensorflow.ipynb
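As a rough sketch of the two quantities REINFORCE needs, the snippet below computes discounted reward-to-go and the surrogate loss whose gradient is the policy gradient estimate. The function names and the `gamma=0.99` default are assumptions for illustration, not the notebook's API. Plain floats stand in for the log-probabilities that a neural network policy would produce as differentiable tensors.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Reward-to-go: G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ..."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each G_t reuses G_{t+1}.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, returns):
    """Surrogate loss: -(1/T) * sum_t log pi(a_t | s_t) * G_t.

    Minimizing this with respect to the policy parameters follows the
    REINFORCE gradient estimate. Here the inputs are plain floats, so the
    function only evaluates the loss value.
    """
    assert len(log_probs) == len(returns)
    total = sum(lp * g for lp, g in zip(log_probs, returns))
    return -total / len(returns)


# Example: a three-step episode with reward 1 at every step.
example_returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
example_loss = reinforce_loss([math.log(0.5)] * 3, example_returns)
```

In a framework with automatic differentiation, the same expression would be built from the network's log-probabilities and then backpropagated, with the returns treated as constants.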

Part 2 (5-10 points) - optional advanced homework: implement either A2C or PPO.

A2C, aka Advantage Actor-Critic (5 points) - a2c-optional.ipynb
PPO, aka Proximal Policy Optimization (10 points) - ppo.ipynb

If you choose PPO, you don't need to submit A2C; submitting it earns no extra points, since PPO extends A2C. So do either (REINFORCE -> A2C) for up to 10 points or (REINFORCE -> PPO) for up to 15 points.
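To illustrate how the PPO policy loss relates to the A2C one, here is a minimal sketch of both. The function names and the `clip_eps=0.2` default are assumptions for illustration, not the notebooks' API, and the value and entropy terms that a full implementation usually adds are left out. Inputs are plain floats rather than tensors.

```python
import math


def a2c_policy_loss(log_probs, advantages):
    """A2C-style policy loss: -(1/N) * sum log pi(a|s) * A."""
    total = sum(lp * adv for lp, adv in zip(log_probs, advantages))
    return -total / len(advantages)


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped surrogate: -(1/N) * sum min(r * A, clip(r, 1-eps, 1+eps) * A),

    where r = pi_new(a|s) / pi_old(a|s) is the probability ratio.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum removes the incentive to move the ratio
        # outside [1 - eps, 1 + eps] in the direction that helps the objective.
        total += min(ratio * adv, clipped_ratio * adv)
    return -total / len(advantages)


# Ratio of 1.5 with a positive advantage: the gain is capped at 1.2 * A.
capped = ppo_clip_loss([math.log(0.6)], [math.log(0.4)], [1.0])
# Same ratio with a negative advantage: the penalty is not capped.
uncapped = ppo_clip_loss([math.log(0.6)], [math.log(0.4)], [-1.0])
```

When the new and old policies coincide, the ratio is 1 and the clipped loss reduces to the negative mean advantage.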

If you choose PPO, we recommend additional materials; pick one of:

Text materials (English): https://spinningup.openai.com/en/latest/algorithms/ppo.html
Our videos (Russian): lecture, seminar (PyTorch)

More materials

A full-term course on reinforcement learning - practical_rl
Actually proving the policy gradient for discounted rewards - article
On variance of policy gradient and optimal baselines: article, another article
Generalized Advantage Estimation - a way you can speed up training for homework_*.ipynb - article
Generalizing log-derivative trick - url
Combining policy gradient and q-learning - arxiv
Bayesian perspective on why the reparameterization and log-derivative tricks matter (Vetrov's take) - pdf
Adversarial review of policy gradient - blog
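For the Generalized Advantage Estimation entry in the list above, a minimal sketch of the backward recursion may help. The function name, argument layout, and the `gamma`/`lam` defaults are assumptions for illustration and are not taken from the homework code. The sketch works on a single trajectory stored in plain lists.

```python
def gae_advantages(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one trajectory.

    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    A_t     = delta_t + gamma * lam * A_{t+1}

    `values[t]` is V(s_t), `last_value` is V of the state after the final
    step, and `dones[t]` is True when step t ended an episode.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # Do not bootstrap or propagate advantages across episode ends.
        not_done = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
        next_value = values[t]
    return advantages


# With lam=1 and a zero value function, GAE reduces to discounted returns.
monte_carlo = gae_advantages(
    [1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 0.0, [False, False, True],
    gamma=0.5, lam=1.0,
)
# With lam=0, GAE reduces to the one-step TD errors.
one_step = gae_advantages(
    [1.0, 1.0], [0.5, 0.5], 0.5, [False, False], gamma=1.0, lam=0.0,
)
```

The `lam` parameter interpolates between these two extremes, trading variance against bias in the advantage estimate.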
