
Comments (15)

MarcHenriot commented on September 21, 2024

That's right. I modified the runFullPairingTournament function, for those who are interested in this kind of technique:

# Drop-in replacement for runFullPairingTournament in prisonersDilemma.py.
# Uses the same names from that file (STRATEGY_FOLDER, runRound,
# tallyRoundScores, pad) plus os, itertools, numpy as np, and tqdm.
def runFullPairingTournamentN(inFolder=STRATEGY_FOLDER, n=1000):
    finalScoreKeeper = {}
    STRATEGY_LIST = []

    # Collect every strategy module in the folder.
    for file in os.listdir(inFolder):
        if file.endswith(".py"):
            STRATEGY_LIST.append(file[:-3])

    for strategy in STRATEGY_LIST:
        finalScoreKeeper[strategy] = 0

    # Run the full round-robin tournament n times and accumulate the scores.
    for _ in tqdm(range(n)):
        scoreKeeper = {}
        for strategy in STRATEGY_LIST:
            scoreKeeper[strategy] = 0

        for pair in itertools.combinations(STRATEGY_LIST, r=2):
            roundHistory = runRound(pair)
            scoresA, scoresB = tallyRoundScores(roundHistory)
            scoreKeeper[pair[0]] += scoresA
            scoreKeeper[pair[1]] += scoresB

        for strategy in STRATEGY_LIST:
            finalScoreKeeper[strategy] += scoreKeeper[strategy]

    # Rank strategies by their total score over all n tournaments.
    scoresNumpy = np.zeros(len(finalScoreKeeper))
    for i in range(len(STRATEGY_LIST)):
        scoresNumpy[i] = finalScoreKeeper[STRATEGY_LIST[i]]
    rankings = np.argsort(scoresNumpy)

    for rank in range(len(STRATEGY_LIST)):
        i = rankings[-1 - rank]
        score = scoresNumpy[i]
        scorePerTournament = score / n
        scorePerMatch = scorePerTournament / (len(STRATEGY_LIST) - 1)
        print(f'#{pad(str(rank+1)+":", 3)} {pad(STRATEGY_LIST[i]+":", 18)} {score:f} ({scorePerTournament:f} avg per tournament) ({scorePerMatch:f} avg per match)')

nobody5050 commented on September 21, 2024

I’ve been using a modified version of that function myself for testing, but this one looks much cleaner. Thanks!

Bip901 commented on September 21, 2024

Agreed; my submission consistently ranks 1st when running a few times and averaging the results, but sometimes ranks as low as 3rd when running only once.

carykh commented on September 21, 2024

Thanks for pointing this out! Yeah, quite a few people noticed how much randomness had an effect on the leaderboard's ordering. I will probably do something like @peopleOfPalay's edit where I run the tournament over and over and take the average of all of them. I'll try to do 1,000 iterations, but I hope the code doesn't take too long to run! (Since it is O(n^2) in the number of submissions)

valadaptive commented on September 21, 2024

You could take advantage of multiprocessing. I can create a PR for that if you want!
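
Roughly something like this untested sketch (the helper names here are made up; it assumes runRound and tallyRoundScores from prisonersDilemma.py are importable at module level so the worker function can be pickled):

# Hypothetical sketch: parallelising the pairing loop with multiprocessing.
import itertools
import multiprocessing

def playPair(pair):
    roundHistory = runRound(pair)
    scoresA, scoresB = tallyRoundScores(roundHistory)
    return pair, scoresA, scoresB

def runPairingsParallel(strategyList, processes=None):
    scoreKeeper = {strategy: 0 for strategy in strategyList}
    pairs = list(itertools.combinations(strategyList, r=2))
    with multiprocessing.Pool(processes=processes) as pool:
        # Each worker plays one pairing; results are summed in the parent.
        for (a, b), scoresA, scoresB in pool.map(playPair, pairs):
            scoreKeeper[a] += scoresA
            scoreKeeper[b] += scoresB
    return scoreKeeper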

TimStanglow commented on September 21, 2024

Well, as the number of matches each algorithm plays increases, the actual effect of randomness goes down. So the more participants there are, the fewer iterations are needed.
But since we're already talking performance, you could accept my Pull Request that improves it significantly.

EFHIII commented on September 21, 2024

For matches between two strategies that don't use random, repeated matches will always have the same outcome, so you could try, for example, running matches 10 times, and if the first 200 moves are the same for both players all 10 times, you can take the average from just those 10; otherwise you know RNG played a part and you can then repeat 990 more times.

This would significantly reduce the time it takes to run without significantly changing the results compared to running it 1000 times.

There are a few exceptions where strategies using RNG could falsely be flagged as not using RNG using the above, for example, strategies that only use RNG after the 200th move or strategies that have a small chance of RNG changing things; maybe 0.05% of the time, they defect on the 1st move or something.

Another method you could use is checking if the source code includes the word "random", since they're probably not using RNG if they don't include random. This would have false positives instead of false negatives, which might be preferred. It's still probably a good idea to run matches not using RNG a few times since the match length is random, and that can affect the score.
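
Roughly, something like this untested sketch (runPairAdaptive is a made-up name; it assumes, as in the code above, that runRound returns the 2 x gameLength move history, that tallyRoundScores turns it into two scores, and that every game lasts at least 200 turns):

# Hypothetical sketch of the "detect determinism" idea above.
import numpy as np

def runPairAdaptive(pair, probeRuns=10, totalRuns=1000, checkMoves=200):
    totalA, totalB = 0.0, 0.0
    firstMoves = None
    deterministic = True
    for _ in range(probeRuns):
        history = runRound(pair)
        scoresA, scoresB = tallyRoundScores(history)
        totalA += scoresA
        totalB += scoresB
        # Compare the first checkMoves moves of every probe run.
        moves = history[:, :checkMoves]
        if firstMoves is None:
            firstMoves = moves
        elif not np.array_equal(firstMoves, moves):
            deterministic = False
    runs = probeRuns
    if not deterministic:
        # RNG clearly played a part: run the remaining repetitions too.
        for _ in range(totalRuns - probeRuns):
            scoresA, scoresB = tallyRoundScores(runRound(pair))
            totalA += scoresA
            totalB += scoresB
        runs = totalRuns
    return totalA / runs, totalB / runs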

duckboycool commented on September 21, 2024

> For matches between two strategies that don't use random, repeated matches will always have the same outcome, so you could try, for example, running matches 10 times, and if the first 200 moves are the same for both players all 10 times, you can take the average from just those 10; otherwise you know RNG played a part and you can then repeat 990 more times.
>
> This would significantly reduce the time it takes to run without significantly changing the results compared to running it 1000 times.
>
> There are a few exceptions where strategies using RNG could falsely be flagged as not using RNG using the above, for example, strategies that only use RNG after the 200th move or strategies that have a small chance of RNG changing things; maybe 0.05% of the time, they defect on the 1st move or something.
>
> Another method you could use is checking if the source code includes the word "random", since they're probably not using RNG if they don't include random. This would have false positives instead of false negatives, which might be preferred. It's still probably a good idea to run matches not using RNG a few times since the match length is random, and that can affect the score.

I think it would be best to check whether the random module is imported in the strategy. While this could lead to false negatives if somebody uses randomness not from the random module (such as numpy.random), you could potentially fix this by adding an import random to any strategies that use randomness without importing random, since all the files have to be checked anyway. You could also remove random imports if a strategy doesn't use them, to decrease the false positive rate.

After removing unnecessary imports, I found that a basic cache with a dictionary gave a 3-4x speedup for multiple iterations on my test set with a few dozen strategies.
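
Something like this untested sketch of both ideas (usesRandomModule, pairCache and runPairCached are made-up names; it assumes STRATEGY_FOLDER, runRound and tallyRoundScores from the tournament code, and it ignores the game-length randomness discussed below):

# Hypothetical sketch: flag strategies whose source imports random, and
# cache the result for pairs where neither side does.
import os

def usesRandomModule(strategyName, folder=STRATEGY_FOLDER):
    with open(os.path.join(folder, strategyName + ".py")) as f:
        source = f.read()
    # Crude text check; catches "import random" and "from random import ...".
    return "import random" in source or "from random" in source

pairCache = {}

def runPairCached(pair):
    if pair in pairCache:
        return pairCache[pair]
    scores = tallyRoundScores(runRound(pair))
    if not (usesRandomModule(pair[0]) or usesRandomModule(pair[1])):
        # Neither strategy appears to use RNG, so reuse this result later.
        pairCache[pair] = scores
    return scores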

nobody5050 commented on September 21, 2024

> > For matches between two strategies that don't use random, repeated matches will always have the same outcome, so you could try, for example, running matches 10 times, and if the first 200 moves are the same for both players all 10 times, you can take the average from just those 10; otherwise you know RNG played a part and you can then repeat 990 more times.
> > This would significantly reduce the time it takes to run without significantly changing the results compared to running it 1000 times.
> > There are a few exceptions where strategies using RNG could falsely be flagged as not using RNG using the above, for example, strategies that only use RNG after the 200th move or strategies that have a small chance of RNG changing things; maybe 0.05% of the time, they defect on the 1st move or something.
> > Another method you could use is checking if the source code includes the word "random", since they're probably not using RNG if they don't include random. This would have false positives instead of false negatives, which might be preferred. It's still probably a good idea to run matches not using RNG a few times since the match length is random, and that can affect the score.
>
> I think it would be best to check whether the random module is imported in the strategy. While this could lead to false negatives if somebody uses randomness not from the random module (such as numpy.random), you could potentially fix this by adding an import random to any strategies that use randomness without importing random, since all the files have to be checked anyway. You could also remove random imports if a strategy doesn't use them, to decrease the false positive rate.
>
> After removing unnecessary imports, I found that a basic cache with a dictionary gave a 3-4x speedup for multiple iterations on my test set with a few dozen strategies.

Some strategies might import random but not actually use it, though.

duckboycool commented on September 21, 2024

> Some strategies might import random but not actually use it, though.

> You could also remove random imports if a strategy doesn't use them, to decrease the false positive rate.

> After removing unnecessary imports, I found that a basic cache with a dictionary gave a 3-4x speedup

Bip901 commented on September 21, 2024

Guys, don't forget that the game length is random. That can have an effect even on a match between two non-random strategies. You can't do that optimization.

EFHIII commented on September 21, 2024

> Guys, don't forget that the game length is random. That can have an effect even on a match between two non-random strategies. You can't do that optimization.

I didn't forget; I mentioned it at the end:

> It's still probably a good idea to run matches not using RNG a few times since the match length is random, and that can affect the score.

The difference that makes compared to other forms of RNG is probably going to be far less noticeable. Historically, a lot of strategies will cooperate until defected against, and in that situation, the score is 3 for both sides no matter how long the match is. If you want to be as cautious about that type of RNG as possible, you could check whether, across a few rounds, the score for both players stays within a small range (+/- 3% or so); if it does, it's pretty safe to just average from there and move on.
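
For example, a tiny made-up helper along these lines, where probeScores would be the (scoreA, scoreB) pairs from a few probe runs:

# Hypothetical helper for the "+/- 3%" check described above.
def scoresStable(probeScores, tolerance=0.03):
    for side in range(2):
        values = [scores[side] for scores in probeScores]
        mean = sum(values) / len(values)
        # Unstable if either player's spread exceeds the tolerance band.
        if mean > 0 and (max(values) - min(values)) / mean > tolerance:
            return False
    return True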

duckboycool commented on September 21, 2024

If you want to be completely sure that game length randomness is handled, what you could do is cache the entire game history and memories, then generate more of the round when the game length is longer than the history stored in the cache, and just return the first part of it when it isn't. This should produce exactly the same results as outright running the strategies the full number of times, and would be quicker than rerunning even just a few times.

This could also start running into issues with memory on a larger set, but if this becomes a problem you could try to store the data in a way where each value only uses 1 bit instead of 32.

EFHIII commented on September 21, 2024

If Cary plans to run the code multiple times, yes, building a cache is ideal, but otherwise, it's a waste of time and resources.

In the fork I'm working on, we're implementing a cache, but I see no need for Cary to use one if he only plans to run things once.

duckboycool commented on September 21, 2024

> If Cary plans to run the code multiple times, yes, building a cache is ideal, but otherwise, it's a waste of time and resources.
>
> In the fork I'm working on, we're implementing a cache, but I see no need for Cary to use one if he only plans to run things once.

They won't just be run once though.

> I will probably do something like peopleOfPalay's edit where I run the tournament over and over and take the average of all of them.

Also, as per the idea I mentioned before, continuing a game from cached history could be problematic for consistency, so it would probably be better to precompute the cache to 500 turns or so once, and then run it from the start again in the unlikely case that the game's longer than that.
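
Very roughly, and only for pairs already known to be deterministic, the shape would be something like this sketch; both runRoundOfLength(pair, gameLength) (a variant of runRound that plays exactly gameLength turns) and drawGameLength() (sampling the usual random match length) are made-up helpers that don't exist in the current code:

# Hypothetical sketch of the precomputed-history cache described above.
historyCache = {}        # pair -> precomputed 2 x PRECOMPUTED_TURNS history
PRECOMPUTED_TURNS = 500  # "500 turns or so"

def getRoundHistory(pair):
    gameLength = drawGameLength()  # made-up: the usual random length
    if pair not in historyCache:
        historyCache[pair] = runRoundOfLength(pair, PRECOMPUTED_TURNS)
    cached = historyCache[pair]
    if gameLength <= PRECOMPUTED_TURNS:
        # Shorter game: just return the first gameLength turns.
        return cached[:, :gameLength]
    # Unlikely long game: fall back to running it from scratch.
    return runRoundOfLength(pair, gameLength)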
