xuanlinli17 / cs285_fa19_deep_reinforcement_learning

My solutions to UC Berkeley CS285 (originally CS294-112, deeprlcourse) Fall 2019 assignments

Home Page: https://github.com/xuanlinli17/CS285_Fa19_Deep_Reinforcement_Learning

Languages: Python 99.66%, Jupyter Notebook 0.02%, Shell 0.33%

cs285_fa19_deep_reinforcement_learning's Issues

hw2: generalized advantage estimation

@xuanlinli17

Can you share where the formula below comes from?

mb_advs[t] = delta + self.gamma * self.lam * mb_advs[t+1]

CS285_Fa19_Deep_Reinforcement_Learning/hw2/cs285/agents/pg_agent.py

Line 86 in ba2e8e1

mb_advs[t] = delta + self.gamma * self.lam * mb_advs[t+1]
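For context, the quoted line has the shape of the backward recursion used in generalized advantage estimation (GAE; Schulman et al., 2016): with the TD residual delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), the estimator A_t = sum over l of (gamma * lam)^l * delta_{t+l} satisfies A_t = delta_t + gamma * lam * A_{t+1}. The sketch below is a minimal standalone version of that recursion, not the repository's code. The function name `gae_advantages`, its arguments, and the assumption of a single trajectory with a bootstrap value `last_value` are all illustrative.

```python
import numpy as np

def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Illustrative GAE for one trajectory (hypothetical helper, not from the repo).

    rewards[t] is r_t, values[t] is V(s_t), and last_value is V(s_T),
    the value estimate for the state after the final step
    (use 0.0 if the episode terminated there).
    """
    rewards = np.asarray(rewards, dtype=float)
    values_ext = np.append(np.asarray(values, dtype=float), last_value)
    T = len(rewards)
    advs = np.zeros(T + 1)  # advs[T] = 0 seeds the backward recursion
    for t in reversed(range(T)):
        # One-step TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values_ext[t + 1] - values_ext[t]
        # Same form as the line quoted in the issue
        advs[t] = delta + gamma * lam * advs[t + 1]
    return advs[:T]
```

Setting lam=0 reduces this to the one-step TD residual, while lam=1 with a zero value function gives the discounted reward-to-go.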

Why retrieve the first element of the action?

CS285_Fa19_Deep_Reinforcement_Learning/hw1/cs285/infrastructure/utils.py

Line 32 in ba2e8e1

ac = ac[0]

May I know why this statement returns only the first element, instead of using argmax to choose the best action from the output of tf.multinomial?
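One possible reading, offered as an assumption rather than the author's answer: if the policy's get_action returns a batch of actions with a leading batch dimension of size 1, then ac[0] only strips that batch dimension and does not pick a "first" action out of several candidates. Note also that tf.multinomial returns sampled class indices rather than scores, so taking argmax over its output would not select a best action. The sketch below uses a hypothetical stand-in policy, `fake_get_action`, with a 17-dimensional observation (matching the `obs (1, 17)` line in the HalfCheetah log elsewhere on this page) and a 6-dimensional action.

```python
import numpy as np

def fake_get_action(ob):
    """Hypothetical stand-in for a policy's get_action.

    Adds a batch dimension to a single observation and returns one
    action per batch row, so the result has shape (batch, ac_dim).
    """
    batch = ob[None] if ob.ndim == 1 else ob
    rng = np.random.default_rng(0)
    return rng.normal(size=(batch.shape[0], 6))

ob = np.zeros(17)            # a single observation, shape (17,)
ac = fake_get_action(ob)     # batched result, shape (1, 6)
ac = ac[0]                   # drop the batch dimension, shape (6,)
print(ac.shape)
```

Under this assumption the environment's step function receives a flat action vector, which is what a single (unbatched) environment expects.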

Reproducing the result of hw1 problem 1(b)

Hi there! I am trying to reproduce the result of homework 1, problem 1(b). I installed all dependencies from requirements.txt. When I ran the command:

python cs285/scripts/run_hw1_behavior_cloning.py --expert_policy_file cs285/policies/experts/HalfCheetah.pkl --env_name HalfCheetah-v2 --exp_name test_bc_hcheetah --n_iter 1 --expert_data cs285/expert_data/expert_data_HalfCheetah-v2.pkl --batch_size=1000 --eval_batch_size=5000

I got the following output:

Loading expert policy from... cs285/policies/experts/HalfCheetah.pkl
obs (1, 17) (1, 17)
Done restoring expert policy...


********** Iteration 0 ************

Training agent using sampled data from replay buffer...

Beginning logging procedure...

Collecting data for eval...
Eval_AverageReturn : 4.991946220397949
Eval_StdReturn : 17.147544860839844
Eval_MaxReturn : 32.29301452636719
Eval_MinReturn : -9.376068115234375
Eval_AverageEpLen : 1000.0
Train_AverageReturn : 4205.7783203125
Train_StdReturn : 83.038818359375
Train_MaxReturn : 4288.81689453125
Train_MinReturn : 4122.7392578125
Train_AverageEpLen : 1000.0
Train_EnvstepsSoFar : 0
TimeSinceStart : 4.198240041732788
Initial_DataCollection_AverageReturn : 4205.7783203125
Done logging...



Saving agent's actor...

So the average evaluation return is about 4.99, which does not match the result provided in the folder ./hw1/run_logs/bc_test_bc_hcheetah_HalfCheetah-v2_16-09-2019_00-58-58/. Could you help me figure out what I did wrong? Many thanks!

hw4

CS285_Fa19_Deep_Reinforcement_Learning/hw4/cs285/infrastructure/utils.py

Line 22 in ba2e8e1

    
ob = model.get_prediction(ob, ac, data_statistics)  # TODO(Q1) Get predicted next state using the model

The ob returned by model.get_prediction is not appended to pred_states with pred_states.append(ob). Is this correct?
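Whether this is a bug depends on where the append sits in the surrounding loop, which is not shown here. As a sketch under the assumption that the loop appends ob at the top of each iteration, every prediction except the last is still recorded, just one iteration later. `ToyModel` and `rollout_predictions` below are hypothetical names for illustration, and the toy dynamics (next_ob = ob + ac) are made up.

```python
import numpy as np

class ToyModel:
    """Hypothetical dynamics model with made-up dynamics: next_ob = ob + ac."""
    def get_prediction(self, ob, ac, data_statistics=None):
        return ob + ac

def rollout_predictions(model, first_ob, action_sequence, data_statistics=None):
    """Roll the model forward, recording the state each action is applied to."""
    ob = first_ob
    pred_states = []
    for ac in action_sequence:
        pred_states.append(ob)  # records the previous iteration's prediction
        ob = model.get_prediction(ob, ac, data_statistics)
    return np.array(pred_states)

preds = rollout_predictions(ToyModel(), np.zeros(2), [np.ones(2)] * 3)
print(preds)
```

With this ordering the list holds one state per action, starting from the true initial state; the prediction made after the final action is not stored. If the loop instead never appends ob anywhere, the predictions would indeed be lost.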
