I've tried continuous_A3C.py, and there are some problems.
Problems
1. Incorrect dictionary update
```python
env_config = DEFAULT_MULTIENV_CONFIG
config_update = update_scenarios_parameter(
    json.load(open("macad_agents/a3c/env_config.json")))
env_config.update(config_update)
```
Using `dict.update` overwrites existing top-level keys instead of merging nested dictionaries, for example:
```python
d1, d2 = {"Country": {}}, {"Country": {}}
d1['Country']['area'] = 960
d2['Country']['population'] = 14
d1.update(d2)  # d1['Country'] is replaced wholesale, so the 'area' key is lost
```
In continuous_A3C.py this causes the "fixed_delta_seconds" key to be lost, so macad-gym can't initialize.
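One way to avoid losing nested keys is a recursive merge helper. This is a sketch of the idea, not code from the repo; the name `deep_update` is my own:

```python
def deep_update(base, update):
    """Recursively merge `update` into `base`. Nested dicts are merged
    key by key instead of being replaced wholesale (unlike dict.update)."""
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_update(base[key], value)
        else:
            base[key] = value
    return base

d1 = {"Country": {"area": 960}}
d2 = {"Country": {"population": 14}}
deep_update(d1, d2)
# d1 is now {"Country": {"area": 960, "population": 14}} -- 'area' survives
```

With this helper, keys such as "fixed_delta_seconds" in `DEFAULT_MULTIENV_CONFIG` would survive the merge with the JSON overrides.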
2. Serialization Problem
```python
class Worker(mp.Process):
    def __init__(self, gnet, opt, global_ep, global_ep_r, res_queue, name):
        super(Worker, self).__init__()
        self.name = 'w%i' % name
        self.g_ep, self.g_ep_r, self.res_queue = (global_ep, global_ep_r,
                                                  res_queue)
        self.gnet, self.opt = gnet, opt
        self.lnet = Net(N_S.spaces[vehicle_name],
                        N_A.spaces[vehicle_name])  # local network
        self.env = MultiCarlaEnv(env_config)
```
Creating the environment inside Worker.__init__ is not a good idea: when mp.Process serializes the Worker object, it tries to serialize the environment as well, which fails with a "can't pickle pygame.Font object" error.
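A common workaround is to keep only picklable state (e.g. the config dict) in `__init__` and construct the environment lazily in `run()`, which executes inside the child process after it has been spawned. A minimal sketch, with a placeholder standing in for `MultiCarlaEnv`:

```python
import multiprocessing as mp

class Worker(mp.Process):
    """Defer construction of the unpicklable environment to run()."""

    def __init__(self, env_config, name):
        super(Worker, self).__init__()
        self.name = 'w%i' % name
        self.env_config = env_config  # plain dict: safe to pickle
        self.env = None               # NOT created here

    def run(self):
        # Runs in the child process, so mp.Process never needs to
        # pickle the environment. In continuous_A3C.py this would be
        # MultiCarlaEnv(self.env_config).
        self.env = object()  # placeholder for the real environment
```

The parent process only ever pickles the config, so the pygame objects inside the environment are never serialized.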
Training
Even after fixing those problems, training still doesn't seem to work. The mean reward curve doesn't trend upward (nor does the distance curve trend downward), and I haven't had a single successful episode yet (3M steps; maybe that's not enough to draw a conclusion?).
I know that PPO and IMPALA are the recommended algorithms, but since A3C is available in the repo, I'd like to know whether it actually works.