
alphagozero's People

Contributors

dependabot[bot], narsil


alphagozero's Issues

Generate more training data

I'm interested in how to generate more training data from existing .sgf files, especially the policy_target and value_target attributes. Is there code that already implements this feature?
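
For illustration, here is a minimal sketch of what such a converter could look like. This is not code from this repo: it assumes the sgfmill library, a 19x19 board, a flat 362-entry policy vector (361 points plus pass), and the usual AGZ convention that value_target is +1 when the side to move eventually won.

```python
import numpy as np
from sgfmill import sgf  # pip install sgfmill

SIZE = 19
PASS = SIZE * SIZE  # index 361 reserved for the pass move

def targets_from_sgf(sgf_bytes):
    """Yield (colour, policy_target, value_target) for each move in the game."""
    game = sgf.Sgf_game.from_bytes(sgf_bytes)
    winner = game.get_winner()  # 'b', 'w', or None
    for node in game.get_main_sequence():
        colour, move = node.get_move()
        if colour is None:  # root / setup node, no move played
            continue
        # One-hot policy over the 362 actions.
        policy_target = np.zeros(SIZE * SIZE + 1, dtype=np.float32)
        if move is None:  # pass
            policy_target[PASS] = 1.0
        else:
            row, col = move
            policy_target[row * SIZE + col] = 1.0
        # +1 if the side to move eventually won, -1 if it lost, 0 if unknown.
        value_target = 0.0 if winner is None else (1.0 if winner == colour else -1.0)
        yield colour, policy_target, value_target
```

Note that expert games only give a one-hot policy per position, whereas self-play records a full visit-count distribution, so supervised targets are noisier per position.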

Huge number of files created

It ran for a couple of days and found several new best models. However, it also created numerous files (502,586 items, totalling 5.6 GB). The models directory is large, and the games directory holds most of the files. Perhaps zipping them would be worthwhile. In any case, I'm happy to restart it after you have had a chance to make more improvements. Thanks again for sharing.
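
If it helps in the meantime, compressing finished games is cheap with the standard library. A sketch, assuming the games sit under a games/ directory as plain .sgf files (which may not match the repo's exact layout):

```python
import gzip
import shutil
from pathlib import Path

# Compress every finished game in place and drop the original.
for sgf_path in Path("games").rglob("*.sgf"):
    with open(sgf_path, "rb") as src, gzip.open(str(sgf_path) + ".gz", "wb") as dst:
        shutil.copyfileobj(src, dst)
    sgf_path.unlink()
```

Per-file gzip shrinks the 5.6 GB (SGF is plain text and compresses well) but not the 502,586-item count; bundling whole generations into single tar archives would address both.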

No Model Progress

Did some more runs with SHOW_END_GAME = True.
No model progress at all does not seem right after 63 models.
Before the OOM fix (#3), it was finding better models (until hitting OOM).

I have a log which I could email you, unless you would rather I post it here.
I think my email is public.
Thanks

Sampled evaluation games

In the original paper, only positions from self-play games are sampled. Those games use temperature=1 for part of the game, meaning more exploration. Won't adding all the evaluation games to the pool we sample from heavily reduce exploration? Of course we could stop recording evaluation games if we parallelize everything, but since recording them saves time, I was wondering whether you know if it has any noticeable negative impact.
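
For context, the difference in exploration comes from how moves are drawn from the visit counts. A minimal sketch (hypothetical names, not this repo's API):

```python
import numpy as np

def sample_move(visit_counts, temperature=1.0):
    """Draw a move index from MCTS visit counts N(a).

    temperature=1 samples proportionally to N(a), as in the exploratory
    part of self-play; temperature -> 0 degenerates to argmax, which is
    how evaluation games are typically played.
    """
    counts = np.asarray(visit_counts, dtype=np.float64)
    if temperature == 0:
        probs = np.zeros_like(counts)
        probs[counts.argmax()] = 1.0
    else:
        scaled = counts ** (1.0 / temperature)
        probs = scaled / scaled.sum()
    return np.random.choice(len(counts), p=probs)
```

Since evaluation games are played at (or near) temperature 0, recording them would skew the sampled positions toward greedy lines, which is the worry raised above.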

Set size changed during iteration -- is this a problem?

Ubuntu 16.04 LTS
Thanks,
BrianR (author of Tinker chess engine)

brianr@Tinker-Ubuntu:~/alphagozero$ python3 main.py
Using TensorFlow backend.
2017-11-06 16:53:14.674061: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-11-06 16:53:14.674512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 770 major: 3 minor: 0 memoryClockRate(GHz): 1.163
pciBusID: 0000:01:00.0
totalMemory: 1.94GiB freeMemory: 1.67GiB
2017-11-06 16:53:14.674554: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 770, pci bus id: 0000:01:00.0, compute capability: 3.0)
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/brianr/.local/lib/python3.5/site-packages/tqdm/_tqdm.py", line 144, in run
    for instance in self.tqdm_cls._instances:
  File "/usr/lib/python3.5/_weakrefset.py", line 60, in __iter__
    for itemref in self.data:
RuntimeError: Set changed size during iteration

BTW it is running and says:
Evaluation model_2 vs model_1 (winrate:100%): 100%|██████████████████████████████████████████████████████| 10/10 [19:11<00:00, 115.13s/it]
We found a new best model : model_2!
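
This looks like the known race in tqdm's monitor thread, which iterates the _instances WeakSet while other threads create and close bars. If I remember correctly, the commonly suggested workaround for older tqdm versions (besides simply upgrading tqdm) was to disable the monitor thread:

```python
from tqdm import tqdm

# Disable tqdm's monitor thread so it never iterates the _instances
# WeakSet while another thread is mutating it.
tqdm.monitor_interval = 0
```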

Potential problems

Edit: turned it into a general thread instead

  1. The AGZ spreadsheet mentions only one filter for the value head. In this implementation, two filters are used. Is there a reason for that? I don't think it will have a big impact, but I'm just putting it out there.

  2. The target policies that are created during simulated games are taken from the prior probabilities p, which are calculated by the neural net. From the AGZ cheatsheet, I believe the target policies should instead be the search probabilities, which are derived from each move's visit count and the temperature parameter (see the formula below).
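
For reference, the AGZ paper defines the training target at the root state s_0 as the visit-count distribution, not the network prior:

```latex
\pi(a \mid s_0) = \frac{N(s_0, a)^{1/\tau}}{\sum_b N(s_0, b)^{1/\tau}}
```

where N counts visits from the root and \tau is the temperature; with \tau = 1 this is simply the normalized visit counts.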

Some notes:

  1. During MCTS search there are lots of zero Q-values, and patches of Q-values close to 1 often appear. (This might just be due to a bad network.)

  2. The MCTS batched search yields more Q-values, but the search depth is considerably lower. Chosen moves are at most depth 4 from the current position, and usually 2 or 3. Running 64 simulations with batch size 1 can give chosen moves up to depth 66 from the current position, but it is of course slower. I'm unsure what a good balance is; it's hard to tune. (A virtual-loss sketch follows below.)
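
One common way to keep batched search deep is virtual loss: while a leaf sits in the evaluation queue, the path to it is temporarily penalized so the next selection in the same batch is steered elsewhere. A minimal sketch with a hypothetical Node (w = total value, n = visits, q = w / n, u = exploration bonus); this is not this repo's tree code, and the per-player sign flip during backup is elided:

```python
VIRTUAL_LOSS = 1.0

def select_leaf(root):
    """Descend by PUCT, applying a virtual loss along the path."""
    node, path = root, []
    while node.children:
        node = max(node.children, key=lambda c: c.q + c.u)  # PUCT score
        node.n += 1              # count the pending visit now
        node.w -= VIRTUAL_LOSS   # pretend this line is losing for a moment
        node.q = node.w / node.n
        path.append(node)
    return node, path

def backup(path, value):
    """Swap the virtual loss for the real network value after the batched
    evaluation returns, leaving a net update of n += 1, w += value."""
    for node in path:
        node.w += VIRTUAL_LOSS + value
        node.q = node.w / node.n
```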

Easy way to import existing sgf files?

Is there an easy way to train the network on a large set of professional-level sgf files? I have a database of several tens of thousands of games I'd like to use to set the initial weights, but I'm not sure how to convert them into the format AGZ needs.

Thanks!
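
Not a documented entry point of this repo, but here is a sketch of the conversion step, assuming sgfmill and a bare 19x19 encoding (1 = black, -1 = white). The real project presumably stacks history planes on top, and a faithful converter also needs a rules engine to remove captured stones:

```python
import numpy as np
from sgfmill import sgf  # pip install sgfmill

SIZE = 19

def replay(sgf_bytes):
    """Yield (board_before_move, flat_move_index, value) per move."""
    game = sgf.Sgf_game.from_bytes(sgf_bytes)
    winner = game.get_winner()  # 'b', 'w', or None
    board = np.zeros((SIZE, SIZE), dtype=np.int8)
    for node in game.get_main_sequence():
        colour, move = node.get_move()
        if colour is None or move is None:  # root node or pass
            continue
        row, col = move
        # +1 if the side to move eventually won, -1 if it lost.
        value = 0.0 if winner is None else (1.0 if winner == colour else -1.0)
        yield board.copy(), row * SIZE + col, value
        board[row, col] = 1 if colour == 'b' else -1
        # NB: captures are not removed here; use a real rules engine
        # (e.g. sgfmill.boards.Board) for correct positions.
```

Pretraining would then be an ordinary supervised fit of the network's policy and value heads over these triples, as in the original AlphaGo's supervised stage.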
