upb-lea / reinforcement_learning_course_materials

License: MIT License

Jupyter Notebook 96.96% Python 0.42% MATLAB 0.46% TeX 2.17%

Reinforcement Learning Course Materials

Lecture notes, tutorial tasks including solutions as well as online videos for the reinforcement learning course hosted by Paderborn University. Source code for the entire course material is open and everyone is cordially invited to use it for self-learning (students) or to set up your own course (lecturers).

Lecture Content

Introduction to Reinforcement Learning
Markov Decision Processes
- Lecture video
- Lecture slides
Dynamic Programming
- Lecture video
- Lecture slides
Monte Carlo Methods
- Lecture video
- Lecture slides
Temporal-Difference Learning
- Lecture video
- Lecture slides
Multi-Step Bootstrapping
- Lecture video
- Lecture slides
Planning and Learning with Tabular Methods
- Lecture video
- Lecture slides
Function Approximation with Supervised Learning
- Lecture video
- Lecture slides
On-Policy Prediction with Function Approximation
- Lecture video
- Lecture slides
Value-Based Control with Function Approximation
- Lecture video
- Lecture slides
Stochastic Policy Gradient Methods
- Lecture video
- Lecture slides
Deterministic Policy Gradient Methods
- Lecture video
- Lecture slides
Further Contemporary RL Algorithms (TRPO, PPO)
- Lecture video
- Lecture slides
Outlook and Research Insights
- Lecture video
- Lecture slides

Summary of Part One: Reinforcement Learning in Finite State and Action Spaces
- Lecture slides
Summary of Part Two: Reinforcement Learning Using Function Approximation
- Lecture slides
Full course slides
- Lecture slides

Exercise Content

All exercises are based on Python 3.9 and site-packages according to the requirements.txt:

pip install -r requirements.txt

Basics of Python for Scientific Computing
- Tutorial video (only 2022 edition available due to technical outage)
- Tutorial template
- Tutorial solution
Manually Solving Basic Markov Chain, Reward and Decision Problems
The Beer-Bachelor and Dynamic Programming (the Shortest Beer Problem)
- Tutorial video (only 2022 edition available due to technical outage)
- Tutorial template
- Tutorial solution
Drive Through the Race Track with Monte Carlo Learning
Drive even Faster Using Temporal-Difference Learning
Stabilizing the Inverted Pendulum by Tabular Multi-Step Methods
Boosting the Inverted Pendulum by Integrating Learning & Planning (Dyna Framework)
Predicting the Operating Behavior of a Real Electric Drive System with Supervised Learning
Evaluate the Performance of Given Agents in the Mountain Car Problem Using Function Approximation
Escape from the Mountain Car Valley Using Semi-Gradient Sarsa & Least-Squares Policy Iteration
Landing on the Moon with REINFORCE and Actor-Critic Methods
Shoot for the Moon with DDPG & PPO

Contributions

We highly appreciate any feedback on and input to the course material, e.g.:

typos or content-related discussions (please raise an issue)
adding new contents (please provide a pull request)

If you would like to contribute to the repo to a larger extent, please do not hesitate to contact us directly.

Credits

The lecture notes are inspired by

The tutorials are partly using pre-packed environments from

Gymnasium (maintained branch of OpenAI's Gym)

Citation

See "Cite this repository" on top

reinforcement_learning_course_materials's Issues

Grammatical Errors

Lecture 08
Slide 10: Last point: it should be "... an ML model ..."

Lecture 11
Slide 2: ..."Goal of today's lecture"... [missing apostrophe]

A mistake in lecture 1, slide 46

Lecture 1, slide 46: in the summation in Eq. (1.16), r_{k+i} should be a function of u_{k+i-1}, because, for example, r_{k+1} is received when u_{k} is applied.
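
The index shift described in this issue can be written out as a short worked equation. The deterministic reward function $r(x, u)$ with state and action arguments is assumed notation for illustration only and is not taken from the slide:

```latex
r_{k+1} = r(x_k, u_k)
\quad\Longrightarrow\quad
r_{k+i} = r(x_{k+i-1}, u_{k+i-1}) \qquad \text{for } i \geq 1 .
```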

A typo in Lecture 2, slide 13

"Rewards R_k only dependent on state X_k" should be "Rewards R_{k+1} only dependent on state X_k"

Eligibility traces for SARSA(lambda)

On Slide 30 of Lecture 6, the definition of the TD($\lambda$) update needs to be adjusted. First, it should be labeled "SARSA($\lambda$)". Second, the eligibility trace must also depend on the action: $z_k(x_k, a_k)$.

Sources:
Reinforcement Learning: An Introduction (Second Edition), Chapter 7.5 p. 183ff
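
To make the point about action-dependent traces concrete, here is a minimal sketch of one tabular SARSA($\lambda$) step with accumulating traces. The function name, the list-of-lists tables, and the default hyperparameters are assumptions chosen for illustration; this is not the course's reference implementation.

```python
def sarsa_lambda_step(q, z, x, a, r, x_next, a_next,
                      alpha=0.1, gamma=0.9, lam=0.8):
    """One tabular SARSA(lambda) update with accumulating traces.

    q and z are lists of lists indexed as [state][action];
    both are modified in place. Returns the TD error.
    """
    # On-policy TD error: uses the action actually selected in x_next.
    delta = r + gamma * q[x_next][a_next] - q[x][a]

    # The trace is stored per state-action pair, not per state.
    z[x][a] += 1.0

    # Every pair is updated in proportion to its trace, then the trace decays.
    for s in range(len(q)):
        for b in range(len(q[s])):
            q[s][b] += alpha * delta * z[s][b]
            z[s][b] *= gamma * lam
    return delta
```

With a state-only trace $z_k(x_k)$, all actions in a visited state would receive credit, which is the prediction (TD($\lambda$)) form rather than the control form.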

Exercise 5: Change the example environment for tasks 3 & 4

Double Q-learning was introduced as a way to remove maximization bias, especially in stochastic environments. In exercise 5, however, we are given a stochastic environment in which double Q-learning performs worse than standard Q-learning. While this teaches us that in practice it is not always clear which method will be best, it does not help to reinforce the concepts learned in the lecture. A new learner will question their solution to this task and, more generally, the benefit of double Q-learning. My suggestion is therefore to find a better-suited environment that shows the advantages double Q-learning has over single Q-learning.
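
The maximization bias itself can be illustrated without any environment. The sketch below is an assumed toy setup, not part of exercise 5: every action has a true value of zero, and only the noisy sample means differ. The single estimator takes the maximum over one set of sample means, while the double estimator selects the action with one set and evaluates it with an independent second set.

```python
import random
import statistics


def sample_mean(n_samples, rng):
    """Mean of n_samples draws from N(0, 1); the true value is 0."""
    return statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n_samples))


def compare_estimators(n_actions=10, n_samples=10, n_trials=2000, seed=0):
    """Return (average single estimate, average double estimate) of max_a q(a)."""
    rng = random.Random(seed)
    single, double = [], []
    for _ in range(n_trials):
        means_a = [sample_mean(n_samples, rng) for _ in range(n_actions)]
        means_b = [sample_mean(n_samples, rng) for _ in range(n_actions)]
        # Single estimator: select and evaluate with the same noisy estimates.
        single.append(max(means_a))
        # Double estimator: select with set A, evaluate with independent set B.
        best = max(range(n_actions), key=means_a.__getitem__)
        double.append(means_b[best])
    return statistics.fmean(single), statistics.fmean(double)


if __name__ == "__main__":
    s, d = compare_estimators()
    print(f"single estimator: {s:.3f}, double estimator: {d:.3f}, true max: 0")
```

In this toy setup the single estimate is clearly positive on average although the true maximum is zero, whereas the double estimate stays close to zero. Whether this advantage shows up in a full control task depends on the environment, which is the point raised in this issue.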

Lecture 2, slide 28: inconsistent notation

In eqs. (2.14) and (2.15), "u_k" must be "u" and "x_k" must be "x" to be consistent with the notation used in the rest of the equations.

Missed steps in going from Eq.3 to Eq.3.12

It will be helpful to put intermediate steps to show how Eq.3.12 is obtained from Eq.3.

Exercise 4: cannot find Racetrack Environment

Describe the bug
In Exercise 04, Monte Carlo methods should be implemented for the racetrack environment. However, I cannot find where racetrack_environment.py is located. Could you provide installation instructions or a description of the environment to be used (e.g. an env.yml for conda)?

Grammatical Error

Lecture 1, slide 36: I think it should be 'an MDP' instead of 'a MDP'.
e.g. 1: "An MDP defines a stochastic control problem"
(slide 8 on https://www.ccs.neu.edu/home/rplatt/cs5335_fall2017/slides/mdps.pdf )
e.g. 2: "In Mathematics, a Markov Decision Process ..."
(https://en.wikipedia.org/wiki/Markov_decision_process)

Dependency Issues for Running the Exercises

Finding and installing the correct package versions to run the later exercises is too cumbersome.
It seems that some packages have to be downgraded relative to the versions in the provided requirements.txt.

This should be streamlined if possible.

For reference, this is the environment I ended up with to run only ex12:
ex12_schenke_requirements.txt
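
A quick way to see which packages deviate from the pinned versions is to compare a requirements file against the active environment. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and it only handles plain `name==version` lines (no extras, markers, or version ranges).

```python
import re
from importlib import metadata


def parse_pins(requirements_text):
    """Return {package_name: pinned_version} for lines of the form name==version."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        match = re.fullmatch(r"([A-Za-z0-9_.\-]+)\s*==\s*([^\s;]+)", line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def find_mismatches(pins):
    """Return {name: (wanted, installed)}; installed is None if the package is missing."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


# Example usage (assumes a requirements.txt in the working directory):
# from pathlib import Path
# print(find_mismatches(parse_pins(Path("requirements.txt").read_text())))
```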

Lecture 2 figure caption is cut off

In Lecture02 slide 27, the figure caption is cut off by the bottom screen border.

Tidy up tutorial solution notebooks and provide problem templates

Some of the solution notebooks are hard to read. They partly contain columns of debug figures or large number arrays left over in the output. There is still room for improvement here to promote readability and a quick introduction to the topic.
It would be nice to have a short Markdown information sheet per tutorial summarizing the most important information (a very short description of the addressed problems, the algorithms used, maybe an overview of the sub-tasks within the notebook). The format of the Markdown sheet should be standardized across all tutorials.
Finally, please provide the task templates with gaps for the student inputs. Hence, for every notebook there should be two files, such as ex00_task_template.ipynb and ex00_solution.ipynb.
- Question/remark: Is there a straightforward way to automate the generation of the task template notebooks from the solution notebooks (e.g. a simplified / light nbgrader with a Travis backend that erases parts of the solution code based on built-in keywords)? Investing the effort once to automate this pipeline would be very convenient for future updates of the tasks.
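
As a rough idea of what such an automation could look like, the sketch below strips marked solution regions from a notebook that has already been loaded as a dictionary. The marker strings are modeled on nbgrader-style delimiters, and both they and the function name are assumptions for illustration; the repository does not define such markers.

```python
import copy

BEGIN = "### BEGIN SOLUTION"  # assumed marker, placed by the author in solution cells
END = "### END SOLUTION"      # assumed marker
PLACEHOLDER = "# YOUR CODE HERE"


def strip_solutions(notebook):
    """Return a copy of a notebook dict with marked solution lines replaced by a placeholder."""
    template = copy.deepcopy(notebook)
    for cell in template.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", [])
        # The notebook format allows the source to be a string or a list of lines.
        lines = source.splitlines(keepends=True) if isinstance(source, str) else list(source)
        kept, inside = [], False
        for line in lines:
            marker = line.strip()
            if marker == BEGIN:
                inside = True
                indent = line[: len(line) - len(line.lstrip())]
                kept.append(indent + PLACEHOLDER + "\n")
            elif marker == END:
                inside = False
            elif not inside:
                kept.append(line)
        cell["source"] = kept
        cell["outputs"] = []          # templates should not ship solution outputs
        cell["execution_count"] = None
    return template


# Example usage (file names follow the naming suggested in this issue):
# import json
# with open("ex00_solution.ipynb") as f:
#     nb = json.load(f)
# with open("ex00_task_template.ipynb", "w") as f:
#     json.dump(strip_solutions(nb), f, indent=1)
```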

Lecture 2, slide 40: rewording is needed

"An optimal policy must equal the expected return for the best action
of a given state:"

A policy cannot equal the expected return, since they are different things: a policy maps states to actions, while the expected return is a value. This sentence needs to be reworded.
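
One possible way to separate the two objects, written in standard notation (the symbols $v^*$, $q^*$ and $\pi^*$ are assumptions here and may differ from those on the slide): the optimal state value equals the expected return of the best action, while the optimal policy selects such an action.

```latex
v^*(x) = \max_{u} q^*(x, u),
\qquad
\pi^*(x) \in \arg\max_{u} q^*(x, u) .
```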

Insufficient explanation Lecture 2, slide 8

In Lecture 2, slide 8, it is not explained how P_{xx'}^m in eq.(2.3) can be replaced by a constant transition matrix as m goes to infinity. It is clear that as m goes to infinity p_{k+m} and p_{k} in eq.(2.3) can be replaced by a constant matrix p, but not clear and not explained how P_{xx'}^m becomes P_{xx'}.
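
A small numerical experiment may help to see what is being asked. The sketch below assumes a row-stochastic transition matrix and uses a made-up two-state chain, not one from the lecture. By the Chapman-Kolmogorov relation, the $m$-step transition probabilities are the entries of the $m$-th matrix power, which in general differ from the one-step matrix.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_pow(p, m):
    """Return the m-th power of a square matrix p (m >= 0)."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(m):
        result = mat_mul(result, p)
    return result


# Hypothetical example chain; each row sums to one.
P = [[0.9, 0.1],
     [0.5, 0.5]]

if __name__ == "__main__":
    for m in (1, 2, 10, 50):
        print(m, mat_pow(P, m))
```

For this example chain, the rows of the matrix power approach the same vector (5/6, 1/6) as $m$ grows, which is the stationary distribution of the chain rather than the one-step matrix itself.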

Representation error in the task sheet 3 (template and solution)

Some formulas are not correctly depicted at the start of both the template and solution notebook, cf.: