codenames's Issues

Migrate to python3

The pre-processed corpus is in unicode after #8 (with utf8 encoding by default), which means that clues are now given in unicode. The new engine.say() function now handles encoding of all game output, including clues, but we now have a mixture of byte and unicode literals in engine.py that would be much cleaner in python3. This issue is to perform and test the migration.
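As a sketch of the target state: in Python 3 all text is str (unicode) and encoding happens only at the I/O boundary, so a say() wrapper could shrink to something like the following. The signature is illustrative, not the actual engine.say().

```python
import sys

def say(message, stream=None):
    """Print one line of game output.

    In Python 3 `message` is always str, and the underlying text stream
    handles utf8 encoding itself, so the manual .encode() calls currently
    scattered through engine.py can go away. Illustrative sketch only.
    """
    out = stream if stream is not None else sys.stdout
    out.write(message + "\n")
```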

Profile AI execution speed

The AI is getting a bit slow, especially with the larger corpus. This issue is to run a profile and see if there are any obvious bottlenecks that would be easy to improve.
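A minimal way to run such a profile with the standard library; the function being profiled is whatever entry point drives one AI turn, which is an assumption here:

```python
import cProfile
import io
import pstats

def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, report).

    The report lists the `top` entries sorted by cumulative time,
    which is usually enough to spot an obvious bottleneck.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()
```

For example, profile_call(engine.next_clue) would show whether the time goes into embedding lookups or the candidate loop (engine.next_clue is a hypothetical entry point, not the engine's actual API).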

Expand corpus

The corpus is currently scraped from a list of wikipedia articles that is automatically derived from the word list, but the resulting coverage (the number of times each word appears) is quite uneven. This issue is to split the work now done by build_corpus.py into two tasks:

  • Build a list of wikipedia page titles to scrape based on the word list.
  • Perform the downloads and save content to the corpus/ directory.

The motivation for this split is to allow the first step to be improved without needing to run the second step (which takes most of the time). An added benefit is that the second step could be easily parallelized since it is limited by network IO, not CPU.
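Since the second step is network-bound, a thread pool is enough to parallelize it; a minimal sketch, where fetch_one is a hypothetical function that downloads one title and saves it under corpus/:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(titles, fetch_one, max_workers=8):
    """Download every title concurrently.

    Threads (rather than processes) suffice because the work is limited
    by network IO, not CPU. fetch_one(title) is assumed to download one
    article and write it to the corpus/ directory.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, titles))
```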

The goal of the improvements to the first step is to:

  • Find more articles related to each word list topic.
  • Obtain a more uniform amount of text for each word (currently SCUBA_DIVER has the least).

Implement double-sided cards?

The random board is currently created treating all 400 words as independent. However, the physical game has 200 double-sided cards, which means there are 200 pairs of words that can never appear together. Is it important to reproduce this?
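If we do want to reproduce it, a sketch under the assumption that words[2*i] and words[2*i+1] are the two sides of card i (an illustrative convention; the printed deck's actual pairing may differ):

```python
import random

def deal_board(words, n_cards=25, seed=None):
    """Deal a board that never shows both sides of one physical card.

    Assumes words[2*i] and words[2*i+1] share card i, which is an
    illustrative convention, not the real deck layout.
    """
    rng = random.Random(seed)
    n_pairs = len(words) // 2
    cards = rng.sample(range(n_pairs), n_cards)  # distinct physical cards
    return [words[2 * c + rng.randrange(2)] for c in cards]  # random side
```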

Implement a simple method of online play

This would primarily be to allow beta testers to interact with the AI, so we can work out the bugs and tune parameters. The simplest starting point would be to host a single game at a time on a service like Twitter. Turns could be timed so that multiple people can guess each word; the majority choice then goes to the AI.

PEP8 Compliance

Although it is pedantic, I am in favor of a consistent style, and PEP8 is the de facto standard. This is not urgent, but there are some line-length and whitespace issues after #7.

Unexpected Error:: No JSON object could be decoded

When I run ./fetch_corpus_text.py, it always fails with the two error messages below and the run is interrupted; running it again does not help.

Unexpected Error:: No JSON object could be decoded
Unexpected Error:: HTTPConnectionPool(host='en.wikipedia.org', port=80): Max retries exceeded with url...

If I continue with the next steps, training then fails with

RuntimeError: you must first build vocabulary before training the model

which I think is caused by the problem above.

How can I solve this problem? Any help would be greatly appreciated.

Implement optional "unlimited" clue

When some words were not guessed in a previous round, the clue number provided with subsequent clues should generally be increased so that old clues can still be used to guess the leftover words. The simplest implementation would be to keep a count N of un-guessed words and increase the count for new clues by (N-1). Do we need anything more sophisticated?
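The (N-1) rule could be as simple as the following (names are hypothetical, not the engine's API):

```python
def adjusted_clue_number(n_new, n_unguessed):
    """Clue number to announce with a new clue.

    n_new: words the new clue is actually built for.
    n_unguessed: words from earlier clues still on the board (N).
    The announced number grows by (N - 1) so the team may keep
    guessing words from the older clues.
    """
    return n_new + max(n_unguessed - 1, 0)
```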

Improve logic for avoiding certain clues

The current implementation of model.get_clue(...) is too simple and ad-hoc. Visible words are now divided into four groups:

  • words associated with the clue
  • all words for our team
  • words for the other team or neutral
  • assassin word(s)

A clue is only accepted if it is a better match for all the clue words than any of the words for the other team or neutral. It must also be better than the assassin word by some margin.

Possible improvements:

  • Add separate margins for neutral words, other team words, assassin word(s).
  • Determine margins automatically from the distribution of similarity values for the proposed clue over the whole vocabulary (instead of using a fixed ad-hoc value).
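The first improvement (separate margins per group) could look roughly like this, where sim(clue, word) is an assumed similarity callable and all margin values are illustrative placeholders, not tuned settings:

```python
def acceptable_clue(sim, clue, clue_words, neutral_words, other_words,
                    assassin_words, neutral_margin=0.0, other_margin=0.05,
                    assassin_margin=0.2):
    """Accept a clue only if its weakest intended word still beats every
    non-team word by that group's margin.

    sim(clue, word) returns a similarity score; higher is better.
    Margins here are placeholders for illustration.
    """
    worst = min(sim(clue, w) for w in clue_words)
    groups = [(neutral_words, neutral_margin),
              (other_words, other_margin),
              (assassin_words, assassin_margin)]
    for words, margin in groups:
        for w in words:
            if worst <= sim(clue, w) + margin:
                return False
    return True
```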

Prevent the engine from giving the same clue multiple times

This can happen especially when the first guess was wrong in the previous round, because the words left to guess are then the same.
I propose we keep a list of previous clues and remove them from future choices.

 NIGHT        BOW          SATURN       HAM         <<<<<<<<<<<< 
 SCUBA_DIVER <<<<<<<<<<<<  RABBIT       TURKEY       CRICKET     
 ICE_CREAM    MICROSCOPE   MEXICO       BILL         KID         
 STREAM       LAP          HOLE         BOTTLE       ANTARCTICA  
 LINE         QUEEN        COMIC        ICE          BACK        
>>> your clue is: england 3
>>> enter your guess #1: queen
>>> Sorry!

(...snip the other team's round...)

<NIGHT       <BOW         -saturn      >HAM         <<<<<<<<<<<< 
<SCUBA_DIVER <<<<<<<<<<<< >RABBIT      -turkey      >CRICKET     
>ICE_CREAM   >MICROSCOPE  >MEXICO      <BILL        #KID         
<STREAM      -lap         -hole        -bottle      -antarctica  
>LINE        <<<<<<<<<<<< >>>>>>>>>>>> <ICE         >BACK        
Thinking...
0.819 HAM + CRICKET + MEXICO = england
0.697 CRICKET = first-class
0.672 MEXICO = mexican
0.661 HAM + CRICKET + LINE = paddington
...
 NIGHT        BOW          SATURN       HAM         <<<<<<<<<<<< 
 SCUBA_DIVER <<<<<<<<<<<<  RABBIT       TURKEY       CRICKET     
 ICE_CREAM    MICROSCOPE   MEXICO       BILL         KID         
 STREAM       LAP          HOLE         BOTTLE       ANTARCTICA  
 LINE        <<<<<<<<<<<< >>>>>>>>>>>>  ICE          BACK        
>>> your clue is: england 3
>>> enter your guess #1: 
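The proposed fix can be sketched as follows (the candidate format mirrors the "Thinking..." list above; the function name is mine):

```python
def pick_clue(candidates, previous_clues):
    """Return the best-scoring candidate clue not already given.

    candidates: iterable of (score, clue) pairs, like the engine's
    ranked "Thinking..." output; previous_clues: set of clues already
    announced this game.
    """
    fresh = [(score, clue) for score, clue in candidates
             if clue not in previous_clues]
    return max(fresh)[1] if fresh else None
```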

evaluate.py tries to call a WordEmbedding.get_clues() method that doesn't exist

$ ./evaluate.py -i word2vec.dat.1 --top-singles 30 --top-pairs 30 --save-plots 1 

Traceback (most recent call last):
  File "./evaluate.py", line 88, in <module>
    main()
  File "./evaluate.py", line 39, in main
    clues = embedding.get_clues((word), (word))
AttributeError: 'WordEmbedding' object has no attribute 'get_clues'

Changing the call from get_clues() to get_clue() doesn't work either, because the number of arguments doesn't match.

Make repo public

Checklist:

  • Review open-source license
  • Change repo name?
  • Contact game owner?
  • How to acknowledge game designer?

Adjust learning rate for each pass

I just realized that I am using the same learning rates for each pass, i.e., the default linear decrease from 0.025 to 0.0001 over 10 iterations. This means that there is no benefit from additional passes with the current implementation!

The advice here is that there is no benefit beyond 20-30 total iterations, so aim for 25 in 5 passes, n=1-5, with:

alpha = 0.025 - (n-1) * 0.005 + 0.0001
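Written out, that schedule gives each pass its own start/end learning rate (values follow the formula above; the helper name is mine):

```python
def pass_alpha_bounds(n_passes=5, alpha0=0.025, step=0.005, floor=0.0001):
    """(start, end) learning rate for each pass.

    Pass n starts at alpha0 - (n - 1) * step + floor and decreases
    linearly to where the next pass starts, hitting the floor on the
    final pass, so the passes join into one overall linear decay.
    """
    bounds = []
    for n in range(1, n_passes + 1):
        start = alpha0 - (n - 1) * step + floor
        end = alpha0 - n * step + floor if n < n_passes else floor
        bounds.append((start, end))
    return bounds
```

With gensim, these bounds would map onto the start_alpha/end_alpha arguments of Word2Vec.train() (parameter names as in current gensim; the version used here may differ).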

fetch_corpus_text.py always stalls

When running fetch_corpus_text.py, each process starts fetching the first few articles it has been assigned, but after a few dozen it stalls and never finishes. The article where this happens changes each run.

Things I've tried without success:

  • Not using multiprocessing. Even when calling fetch() directly for each clue word serially, it always stalls somewhere while fetching the articles for Africa (the first clue alphabetically).
  • Pausing with time.sleep(0.5) after each article is fetched, to be nicer to the server. The same problem happens.

Most of the time it stalls in the request sent by content = page.content (line 70), but I've also seen it block once in the request sent by page = wikipedia.page(...) (line 68).
Debugging is not practical since the article where it happens doesn't seem deterministic.

I've added logging like this:

    from httplib import HTTPConnection  # Python 2; http.client in Python 3
    import logging

    HTTPConnection.debuglevel = 1

    logging.basicConfig()  # initialize logging, otherwise nothing from requests is shown
    logging.getLogger().setLevel(logging.DEBUG)
    requests_log = logging.getLogger("requests.packages.urllib3")
    requests_log.setLevel(logging.DEBUG)
    requests_log.propagate = True

This produces a lot of logs, which I skip, but here is where it stalls:

DEBUG:requests.packages.urllib3.connectionpool:"GET /w/api.php?inprop=url&redirects=&format=json&ppprop=disambiguation&prop=info%7Cpageprops&titles=1930+FIFA+World+Cup&action=query HTTP/1.1" 200 284
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): en.wikipedia.org
send: 'GET /w/api.php?format=json&rvprop=ids&prop=extracts%7Crevisions&titles=1930+FIFA+World+Cup&action=query&explaintext= HTTP/1.1\r\nHost: en.wikipedia.org\r\nConnection: keep-alive\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nUser-Agent: wikipedia (https://github.com/goldsmith/Wikipedia/)\r\n\r\n'

That last line (starting with "send:") is the end of the logs.

Strangely I didn't encounter this problem with the old version of build_corpus.py, which used the same library in a very similar way.
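One workaround sketch that at least turns the silent stall into a debuggable exception: set a global stdlib socket timeout before fetching. This does not fix the underlying hang, and the timeout value is arbitrary.

```python
import socket

# Any HTTP request that hangs longer than this now raises socket.timeout,
# so the stalled article can be logged, skipped, or retried instead of
# blocking forever.
socket.setdefaulttimeout(30)  # seconds; illustrative value
```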
