Good day. I'm now trying the zoo on a custom environment, and I have a couple of questions.
Thanks a lot in advance.
(py3_6_12) andres@andres-mint:~/Documents/pyt_projs/git_clones/rl-baselines3-zoo$ python train.py --algo td3 --env BalanceBallPlus-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 4 --sampler tpe --pruner median
========== BalanceBallPlus-v0 ==========
Seed: 688703236
OrderedDict([('batch_size', 100),
('buffer_size', 1000000),
('env_wrapper', 'sb3_contrib.common.wrappers.TimeFeatureWrapper'),
('gamma', 0.99),
('gradient_steps', 1000),
('learning_rate', 0.001),
('learning_starts', 10000),
('n_timesteps', 1000000.0),
('noise_std', 0.1),
('noise_type', 'normal'),
('policy', 'MlpPolicy'),
('policy_kwargs', 'dict(net_arch=[400, 300])'),
('train_freq', 1000)])
Using 1 environments
Overwriting n_timesteps with n=50000
pybullet build time: Nov 26 2020 23:07:47
/home/andres/anaconda3/envs/py3_6_12/lib/python3.6/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
Applying normal noise with std 0.1
Optimizing hyperparameters
Sampler: tpe - Pruner: median
[I 2020-12-26 06:44:05,953] A new study created in memory with name: no-name-ce67f306-96d9-4778-85a6-f7f50e6461a5
[I 2020-12-26 07:09:10,061] Trial 3 finished with value: -2.966192 and parameters: {'gamma': 0.995, 'lr': 2.3763065873412468e-05, 'batch_size': 128, 'buffer_size': 1000000, 'episodic': False, 'train_freq': 1000, 'noise_type': None, 'noise_std': 0.17512030997245787, 'net_arch': 'small'}. Best is trial 3 with value: -2.966192.
[I 2020-12-26 07:22:59,041] Trial 2 finished with value: -167.043263 and parameters: {'gamma': 0.999, 'lr': 0.0014386451147686773, 'batch_size': 32, 'buffer_size': 1000000, 'episodic': True, 'noise_type': None, 'noise_std': 0.5005792278567412, 'net_arch': 'medium'}. Best is trial 3 with value: -2.966192.
[I 2020-12-26 07:40:32,356] Trial 0 finished with value: -53.806006000000004 and parameters: {'gamma': 0.98, 'lr': 9.035743858880986e-05, 'batch_size': 256, 'buffer_size': 100000, 'episodic': True, 'noise_type': 'ornstein-uhlenbeck', 'noise_std': 0.9237972716285621, 'net_arch': 'big'}. Best is trial 3 with value: -2.966192.
[I 2020-12-26 07:43:16,663] Trial 5 finished with value: -107.140596 and parameters: {'gamma': 0.9, 'lr': 0.0003670649552870335, 'batch_size': 64, 'buffer_size': 1000000, 'episodic': True, 'noise_type': 'normal', 'noise_std': 0.10014564939749304, 'net_arch': 'small'}. Best is trial 3 with value: -2.966192.
[I 2020-12-26 08:05:35,558] Trial 7 finished with value: -46.69298 and parameters: {'gamma': 0.9, 'lr': 0.006234401879219351, 'batch_size': 256, 'buffer_size': 10000, 'episodic': True, 'noise_type': None, 'noise_std': 0.8369587884996299, 'net_arch': 'small'}. Best is trial 3 with value: -2.966192.
[I 2020-12-26 08:24:38,394] Trial 6 finished with value: -167.043263 and parameters: {'gamma': 0.95, 'lr': 0.32797048731022754, 'batch_size': 128, 'buffer_size': 10000, 'episodic': True, 'noise_type': 'normal', 'noise_std': 0.6534519680711605, 'net_arch': 'medium'}. Best is trial 3 with value: -2.966192.
[I 2020-12-26 08:57:49,229] Trial 1 finished with value: -167.043263 and parameters: {'gamma': 0.99, 'lr': 0.016250740365838234, 'batch_size': 2048, 'buffer_size': 10000, 'episodic': True, 'noise_type': 'normal', 'noise_std': 0.8837561835079127, 'net_arch': 'medium'}. Best is trial 3 with value: -2.966192.
[I 2020-12-26 09:02:04,719] Trial 9 finished with value: 149.950291 and parameters: {'gamma': 0.98, 'lr': 0.09222338177011687, 'batch_size': 16, 'buffer_size': 1000000, 'episodic': False, 'train_freq': 256, 'noise_type': 'normal', 'noise_std': 0.48004918808417185, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 09:18:46,665] Trial 10 finished with value: -86.020416 and parameters: {'gamma': 0.9999, 'lr': 0.010826699760461068, 'batch_size': 128, 'buffer_size': 100000, 'episodic': False, 'train_freq': 16, 'noise_type': None, 'noise_std': 0.9218371772935142, 'net_arch': 'small'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 09:49:57,960] Trial 11 finished with value: -27.108461 and parameters: {'gamma': 0.99, 'lr': 0.0002607336884492247, 'batch_size': 2048, 'buffer_size': 100000, 'episodic': True, 'noise_type': 'normal', 'noise_std': 0.4876766153300758, 'net_arch': 'small'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 10:10:09,944] Trial 4 finished with value: 149.950291 and parameters: {'gamma': 0.99, 'lr': 0.09596935214110522, 'batch_size': 2048, 'buffer_size': 10000, 'episodic': False, 'train_freq': 1000, 'noise_type': None, 'noise_std': 0.7610310201815246, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 10:18:13,678] Trial 12 finished with value: -179.039573 and parameters: {'gamma': 0.999, 'lr': 0.009408532288202605, 'batch_size': 256, 'buffer_size': 10000, 'episodic': True, 'noise_type': 'ornstein-uhlenbeck', 'noise_std': 0.6316383873470157, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 10:27:21,354] Trial 13 finished with value: -167.043263 and parameters: {'gamma': 0.98, 'lr': 0.4498888137624217, 'batch_size': 16, 'buffer_size': 1000000, 'episodic': False, 'train_freq': 256, 'noise_type': 'ornstein-uhlenbeck', 'noise_std': 0.29382174552012386, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 10:35:46,187] Trial 16 pruned.
[I 2020-12-26 10:46:24,987] Trial 14 finished with value: -167.043263 and parameters: {'gamma': 0.98, 'lr': 0.48126050460164926, 'batch_size': 16, 'buffer_size': 1000000, 'episodic': False, 'train_freq': 256, 'noise_type': 'ornstein-uhlenbeck', 'noise_std': 0.35394714914984504, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 10:55:16,323] Trial 8 finished with value: -167.043263 and parameters: {'gamma': 0.98, 'lr': 0.14056740011818536, 'batch_size': 2048, 'buffer_size': 1000000, 'episodic': True, 'noise_type': 'normal', 'noise_std': 0.24861868995949032, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 10:55:30,237] Trial 15 finished with value: -52.049041 and parameters: {'gamma': 0.98, 'lr': 0.34282425928675386, 'batch_size': 16, 'buffer_size': 1000000, 'episodic': False, 'train_freq': 256, 'noise_type': 'normal', 'noise_std': 0.30874667494551744, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 11:18:48,623] Trial 17 finished with value: -52.62094 and parameters: {'gamma': 0.98, 'lr': 0.07741577403078599, 'batch_size': 100, 'buffer_size': 1000000, 'episodic': False, 'train_freq': 1000, 'noise_type': 'normal', 'noise_std': 0.38034604318077103, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 11:29:24,542] Trial 18 finished with value: -179.039573 and parameters: {'gamma': 0.98, 'lr': 0.07000544909228827, 'batch_size': 100, 'buffer_size': 10000, 'episodic': False, 'train_freq': 2000, 'noise_type': 'normal', 'noise_std': 0.4917063481756351, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 12:42:58,148] Trial 21 finished with value: 149.950291 and parameters: {'gamma': 0.99, 'lr': 0.03485792895505385, 'batch_size': 512, 'buffer_size': 10000, 'episodic': False, 'train_freq': 2000, 'noise_type': None, 'noise_std': 0.7748378071423547, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 12:53:57,673] Trial 22 finished with value: -167.043263 and parameters: {'gamma': 0.9999, 'lr': 0.03017840665601139, 'batch_size': 512, 'buffer_size': 10000, 'episodic': False, 'train_freq': 128, 'noise_type': None, 'noise_std': 0.7895056964816064, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 12:58:05,962] Trial 19 finished with value: -167.043263 and parameters: {'gamma': 0.99, 'lr': 0.04564261693344617, 'batch_size': 1024, 'buffer_size': 10000, 'episodic': False, 'train_freq': 128, 'noise_type': 'normal', 'noise_std': 0.465268061541669, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 12:58:27,350] Trial 20 finished with value: -52.049041 and parameters: {'gamma': 0.995, 'lr': 0.06252329780433087, 'batch_size': 1024, 'buffer_size': 10000, 'episodic': False, 'train_freq': 2000, 'noise_type': None, 'noise_std': 0.4942103437027389, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 13:32:28,457] Trial 26 pruned.
[I 2020-12-26 14:16:07,999] Trial 24 finished with value: -65.89648 and parameters: {'gamma': 0.99, 'lr': 0.001809149984011838, 'batch_size': 512, 'buffer_size': 10000, 'episodic': False, 'train_freq': 2000, 'noise_type': None, 'noise_std': 0.6014320878186472, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 14:20:37,143] Trial 25 finished with value: 149.950291 and parameters: {'gamma': 0.99, 'lr': 0.003360498924126242, 'batch_size': 512, 'buffer_size': 10000, 'episodic': False, 'train_freq': 2000, 'noise_type': None, 'noise_std': 0.7435862410168138, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 14:44:27,204] Trial 23 finished with value: -59.411483 and parameters: {'gamma': 0.995, 'lr': 0.0018973546110706463, 'batch_size': 1024, 'buffer_size': 10000, 'episodic': False, 'train_freq': 128, 'noise_type': None, 'noise_std': 0.5797212846678705, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 14:51:46,423] Trial 27 finished with value: 149.950291 and parameters: {'gamma': 0.95, 'lr': 0.9081390217934642, 'batch_size': 512, 'buffer_size': 10000, 'episodic': False, 'train_freq': 2000, 'noise_type': None, 'noise_std': 0.9939435496543059, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 15:17:35,901] Trial 29 finished with value: -52.049041 and parameters: {'gamma': 0.99, 'lr': 0.02087475325558366, 'batch_size': 512, 'buffer_size': 10000, 'episodic': False, 'train_freq': 2000, 'noise_type': None, 'noise_std': 0.9858779085972529, 'net_arch': 'medium'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 15:29:38,490] Trial 28 finished with value: -179.039573 and parameters: {'gamma': 0.95, 'lr': 0.20026135651189286, 'batch_size': 512, 'buffer_size': 100000, 'episodic': False, 'train_freq': 1000, 'noise_type': None, 'noise_std': 0.5828407402028809, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 15:40:23,231] Trial 30 finished with value: -179.039573 and parameters: {'gamma': 0.99, 'lr': 0.005146795122267626, 'batch_size': 512, 'buffer_size': 10000, 'episodic': False, 'train_freq': 2000, 'noise_type': None, 'noise_std': 0.9922450882442746, 'net_arch': 'medium'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 15:47:47,567] Trial 31 finished with value: -167.043263 and parameters: {'gamma': 0.95, 'lr': 0.004842010015157103, 'batch_size': 512, 'buffer_size': 10000, 'episodic': False, 'train_freq': 2000, 'noise_type': None, 'noise_std': 0.9707259160241756, 'net_arch': 'medium'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 16:32:43,979] Trial 32 finished with value: -167.043263 and parameters: {'gamma': 0.99, 'lr': 0.005043618255920021, 'batch_size': 512, 'buffer_size': 100000, 'episodic': False, 'train_freq': 2000, 'noise_type': None, 'noise_std': 0.6993714984181254, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 16:43:11,417] Trial 33 finished with value: -81.090492 and parameters: {'gamma': 0.95, 'lr': 1.142953243401593e-05, 'batch_size': 512, 'buffer_size': 10000, 'episodic': False, 'train_freq': 2000, 'noise_type': None, 'noise_std': 0.9954450975421397, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 16:53:18,618] Trial 34 finished with value: -96.260865 and parameters: {'gamma': 0.95, 'lr': 0.0006517069341591469, 'batch_size': 512, 'buffer_size': 10000, 'episodic': False, 'train_freq': 2000, 'noise_type': None, 'noise_std': 0.834699645830437, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 17:01:20,391] Trial 35 finished with value: 149.950291 and parameters: {'gamma': 0.95, 'lr': 0.87227910330435, 'batch_size': 512, 'buffer_size': 10000, 'episodic': False, 'train_freq': 2000, 'noise_type': None, 'noise_std': 0.8533854521550969, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 17:14:28,346] Trial 36 finished with value: 17.055749 and parameters: {'gamma': 0.95, 'lr': 0.0007090520764364181, 'batch_size': 32, 'buffer_size': 10000, 'episodic': False, 'train_freq': 1, 'noise_type': 'ornstein-uhlenbeck', 'noise_std': 0.8750327667779954, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 17:25:24,484] Trial 37 finished with value: -77.045215 and parameters: {'gamma': 0.95, 'lr': 0.8531124566426208, 'batch_size': 32, 'buffer_size': 1000000, 'episodic': False, 'train_freq': 1, 'noise_type': 'ornstein-uhlenbeck', 'noise_std': 0.8677264705868984, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 17:27:24,652] Trial 38 finished with value: -52.049041 and parameters: {'gamma': 0.99, 'lr': 0.02620810746577697, 'batch_size': 64, 'buffer_size': 10000, 'episodic': False, 'train_freq': 16, 'noise_type': None, 'noise_std': 0.8928606266895032, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 17:34:03,031] Trial 42 pruned.
[I 2020-12-26 17:42:51,929] Trial 39 finished with value: 149.950291 and parameters: {'gamma': 0.95, 'lr': 0.9494030805368031, 'batch_size': 32, 'buffer_size': 10000, 'episodic': False, 'train_freq': 1, 'noise_type': None, 'noise_std': 0.8809092713572011, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 17:49:35,371] Trial 40 finished with value: -52.049041 and parameters: {'gamma': 0.99, 'lr': 0.9189626266960231, 'batch_size': 64, 'buffer_size': 10000, 'episodic': False, 'train_freq': 16, 'noise_type': None, 'noise_std': 0.7858857639710304, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 18:01:22,750] Trial 41 finished with value: -168.040922 and parameters: {'gamma': 0.999, 'lr': 0.9488439713863275, 'batch_size': 64, 'buffer_size': 10000, 'episodic': False, 'train_freq': 1000, 'noise_type': None, 'noise_std': 0.9371933101519725, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 18:12:18,427] Trial 43 finished with value: -179.039573 and parameters: {'gamma': 0.999, 'lr': 0.22322268887467173, 'batch_size': 64, 'buffer_size': 10000, 'episodic': False, 'train_freq': 1000, 'noise_type': None, 'noise_std': 0.8049306042778772, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 18:27:16,658] Trial 45 finished with value: -167.043263 and parameters: {'gamma': 0.999, 'lr': 0.2068361241733535, 'batch_size': 32, 'buffer_size': 1000000, 'episodic': False, 'train_freq': 256, 'noise_type': 'normal', 'noise_std': 0.007584062943803038, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 19:21:14,016] Trial 46 finished with value: -167.043263 and parameters: {'gamma': 0.9999, 'lr': 0.2290603740718307, 'batch_size': 512, 'buffer_size': 1000000, 'episodic': False, 'train_freq': 2000, 'noise_type': 'normal', 'noise_std': 0.4188127541502431, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 19:32:54,938] Trial 47 finished with value: -179.039573 and parameters: {'gamma': 0.9999, 'lr': 0.014488144853174149, 'batch_size': 512, 'buffer_size': 1000000, 'episodic': False, 'train_freq': 256, 'noise_type': 'normal', 'noise_std': 0.7270447502646388, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
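
A minimal sketch for summarizing a log like the one above: it counts how many finished trials reported each final value, which makes repeated values (such as 149.950291 or -167.043263) easy to spot. The helper name `tally_trial_values` and the regex are assumptions written for this log format; they are not part of rl-baselines3-zoo or Optuna. The sample lines are copied from the output above with the parameter dict abridged to `{...}`.

```python
import re
from collections import Counter

# Matches "Trial N finished with value: V" lines as they appear in the log above.
# Hypothetical pattern for this post; not an Optuna API.
TRIAL_RE = re.compile(
    r"Trial (\d+) finished with value: (-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def tally_trial_values(log_text):
    """Return a Counter mapping each final trial value to the number of trials reporting it."""
    counts = Counter()
    for match in TRIAL_RE.finditer(log_text):
        counts[float(match.group(2))] += 1
    return counts


# Log lines from above, with the parameter dict abridged to {...}.
sample = """\
[I 2020-12-26 09:02:04,719] Trial 9 finished with value: 149.950291 and parameters: {...}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 10:10:09,944] Trial 4 finished with value: 149.950291 and parameters: {...}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 10:18:13,678] Trial 12 finished with value: -179.039573 and parameters: {...}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 10:35:46,187] Trial 16 pruned.
"""

counts = tally_trial_values(sample)
for value, n in counts.most_common():
    print(value, n)
```

Pruned trials are skipped because they report no final value. To run it on the full output, read the saved log into a string and pass that to `tally_trial_values` instead of `sample`.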