
Comments (15)

MarcHenriot commented on September 21, 2024

That's right. I modified the runFullPairingTournament function, for those who are interested in this kind of technique:

# Drop-in replacement for runFullPairingTournament in prisonersDilemma.py.
# Uses the same names from that file (STRATEGY_FOLDER, runRound,
# tallyRoundScores, pad) plus os, itertools, numpy as np, and tqdm.
def runFullPairingTournamentN(inFolder=STRATEGY_FOLDER, n=1000):
    finalScoreKeeper = {}
    STRATEGY_LIST = []

    # Collect every strategy module in the folder.
    for file in os.listdir(inFolder):
        if file.endswith(".py"):
            STRATEGY_LIST.append(file[:-3])

    for strategy in STRATEGY_LIST:
        finalScoreKeeper[strategy] = 0

    # Run the full round-robin tournament n times and accumulate the scores.
    for _ in tqdm(range(n)):
        scoreKeeper = {}
        for strategy in STRATEGY_LIST:
            scoreKeeper[strategy] = 0

        for pair in itertools.combinations(STRATEGY_LIST, r=2):
            roundHistory = runRound(pair)
            scoresA, scoresB = tallyRoundScores(roundHistory)
            scoreKeeper[pair[0]] += scoresA
            scoreKeeper[pair[1]] += scoresB

        for strategy in STRATEGY_LIST:
            finalScoreKeeper[strategy] += scoreKeeper[strategy]

    # Rank strategies by their total score over all n tournaments.
    scoresNumpy = np.zeros(len(finalScoreKeeper))
    for i in range(len(STRATEGY_LIST)):
        scoresNumpy[i] = finalScoreKeeper[STRATEGY_LIST[i]]
    rankings = np.argsort(scoresNumpy)

    for rank in range(len(STRATEGY_LIST)):
        i = rankings[-1 - rank]
        score = scoresNumpy[i]
        scorePerTournament = score / n
        scorePerMatch = scorePerTournament / (len(STRATEGY_LIST) - 1)
        print(f'#{pad(str(rank+1)+":", 3)} {pad(STRATEGY_LIST[i]+":", 18)} {score:f} ({scorePerTournament:f} avg per tournament) ({scorePerMatch:f} avg per match)')

nobody5050 commented on September 21, 2024

I’ve been using a modified version of that function myself for testing, but this one looks much cleaner. Thanks!

Bip901 commented on September 21, 2024

Agreed; my submission consistently ranks 1st when running a few times and averaging the results, but sometimes ranks as low as 3rd when running only once.

carykh commented on September 21, 2024

Thanks for pointing this out! Yeah, quite a few people noticed how much randomness had an effect on the leaderboard's ordering. I will probably do something like @peopleOfPalay's edit where I run the tournament over and over and take the average of all of them. I'll try to do 1,000 iterations, but I hope the code doesn't take too long to run! (Since it is O(n^2) in the number of submissions)

valadaptive commented on September 21, 2024

You could take advantage of multiprocessing. I can create a PR for that if you want!
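
Roughly something like this untested sketch (the helper names here are made up; it assumes runRound and tallyRoundScores from prisonersDilemma.py are importable at module level so the worker function can be pickled):

# Hypothetical sketch: parallelising the pairing loop with multiprocessing.
import itertools
import multiprocessing

def playPair(pair):
    roundHistory = runRound(pair)
    scoresA, scoresB = tallyRoundScores(roundHistory)
    return pair, scoresA, scoresB

def runPairingsParallel(strategyList, processes=None):
    scoreKeeper = {strategy: 0 for strategy in strategyList}
    pairs = list(itertools.combinations(strategyList, r=2))
    with multiprocessing.Pool(processes=processes) as pool:
        # Each worker plays one pairing; results are summed in the parent.
        for (a, b), scoresA, scoresB in pool.map(playPair, pairs):
            scoreKeeper[a] += scoresA
            scoreKeeper[b] += scoresB
    return scoreKeeper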

TimStanglow commented on September 21, 2024

Well, as the number of matches each algorithm plays increases, the actual effect of randomness goes down. So the more participants there are, the fewer iterations are needed.
But since we're already talking performance, you could accept my Pull Request that improves it significantly.

EFHIII commented on September 21, 2024

For matches between two strategies that don't use random, repeated matches will always have the same outcome, so you could try, for example, running matches 10 times, and if the first 200 moves are the same for both players all 10 times, you can take the average from just those 10; otherwise you know RNG played a part and you can then repeat 990 more times.

This would significantly reduce the time it takes to run without significantly changing the results compared to running it 1000 times.

There are a few exceptions where strategies using RNG could falsely be flagged as not using RNG using the above, for example, strategies that only use RNG after the 200th move or strategies that have a small chance of RNG changing things; maybe 0.05% of the time, they defect on the 1st move or something.

Another method you could use is checking if the source code includes the word "random", since they're probably not using RNG if they don't include random. This would have false positives instead of false negatives, which might be preferred. It's still probably a good idea to run matches not using RNG a few times since the match length is random, and that can affect the score.
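
Roughly, something like this untested sketch (runPairAdaptive is a made-up name; it assumes, as in the code above, that runRound returns the 2 x gameLength move history, that tallyRoundScores turns it into two scores, and that every game lasts at least 200 turns):

# Hypothetical sketch of the "detect determinism" idea above.
import numpy as np

def runPairAdaptive(pair, probeRuns=10, totalRuns=1000, checkMoves=200):
    totalA, totalB = 0.0, 0.0
    firstMoves = None
    deterministic = True
    for _ in range(probeRuns):
        history = runRound(pair)
        scoresA, scoresB = tallyRoundScores(history)
        totalA += scoresA
        totalB += scoresB
        # Compare the first checkMoves moves of every probe run.
        moves = history[:, :checkMoves]
        if firstMoves is None:
            firstMoves = moves
        elif not np.array_equal(firstMoves, moves):
            deterministic = False
    runs = probeRuns
    if not deterministic:
        # RNG clearly played a part: run the remaining repetitions too.
        for _ in range(totalRuns - probeRuns):
            scoresA, scoresB = tallyRoundScores(runRound(pair))
            totalA += scoresA
            totalB += scoresB
        runs = totalRuns
    return totalA / runs, totalB / runs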

duckboycool commented on September 21, 2024

> For matches between two strategies that don't use random, repeated matches will always have the same outcome, so you could try, for example, running matches 10 times, and if the first 200 moves are the same for both players all 10 times, you can take the average from just those 10; otherwise you know RNG played a part and you can then repeat 990 more times.
>
> This would significantly reduce the time it takes to run without significantly changing the results compared to running it 1000 times.
>
> There are a few exceptions where strategies using RNG could falsely be flagged as not using RNG using the above, for example, strategies that only use RNG after the 200th move or strategies that have a small chance of RNG changing things; maybe 0.05% of the time, they defect on the 1st move or something.
>
> Another method you could use is checking if the source code includes the word "random", since they're probably not using RNG if they don't include random. This would have false positives instead of false negatives, which might be preferred. It's still probably a good idea to run matches not using RNG a few times since the match length is random, and that can affect the score.

I think it would be best to check whether the random module is imported in the strategy. While this could lead to false negatives if somebody uses randomness not from the random module (such as numpy.random), you could potentially fix this by adding an import random to any strategies that use randomness without importing random, since all the files have to be checked anyway. You could also remove random imports if a strategy doesn't use them, to decrease the false positive rate.

After removing unnecessary imports, I found that a basic cache with a dictionary gave a 3-4x speedup for multiple iterations on my test set with a few dozen strategies.
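
Something like this untested sketch of both ideas (usesRandomModule, pairCache and runPairCached are made-up names; it assumes STRATEGY_FOLDER, runRound and tallyRoundScores from the tournament code, and it ignores the game-length randomness discussed below):

# Hypothetical sketch: flag strategies whose source imports random, and
# cache the result for pairs where neither side does.
import os

def usesRandomModule(strategyName, folder=STRATEGY_FOLDER):
    with open(os.path.join(folder, strategyName + ".py")) as f:
        source = f.read()
    # Crude text check; catches "import random" and "from random import ...".
    return "import random" in source or "from random" in source

pairCache = {}

def runPairCached(pair):
    if pair in pairCache:
        return pairCache[pair]
    scores = tallyRoundScores(runRound(pair))
    if not (usesRandomModule(pair[0]) or usesRandomModule(pair[1])):
        # Neither strategy appears to use RNG, so reuse this result later.
        pairCache[pair] = scores
    return scores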

nobody5050 commented on September 21, 2024

> > For matches between two strategies that don't use random, repeated matches will always have the same outcome, so you could try, for example, running matches 10 times, and if the first 200 moves are the same for both players all 10 times, you can take the average from just those 10; otherwise you know RNG played a part and you can then repeat 990 more times.
> > This would significantly reduce the time it takes to run without significantly changing the results compared to running it 1000 times.
> > There are a few exceptions where strategies using RNG could falsely be flagged as not using RNG using the above, for example, strategies that only use RNG after the 200th move or strategies that have a small chance of RNG changing things; maybe 0.05% of the time, they defect on the 1st move or something.
> > Another method you could use is checking if the source code includes the word "random", since they're probably not using RNG if they don't include random. This would have false positives instead of false negatives, which might be preferred. It's still probably a good idea to run matches not using RNG a few times since the match length is random, and that can affect the score.
>
> I think it would be best to check whether the random module is imported in the strategy. While this could lead to false negatives if somebody uses randomness not from the random module (such as numpy.random), you could potentially fix this by adding an import random to any strategies that use randomness without importing random, since all the files have to be checked anyway. You could also remove random imports if a strategy doesn't use them, to decrease the false positive rate.
>
> After removing unnecessary imports, I found that a basic cache with a dictionary gave a 3-4x speedup for multiple iterations on my test set with a few dozen strategies.

Some strategies might import random but not actually use it, though.

duckboycool commented on September 21, 2024

> Some strategies might import random but not actually use it, though.

> You could also remove random imports if a strategy doesn't use them, to decrease the false positive rate.

> After removing unnecessary imports, I found that a basic cache with a dictionary gave a 3-4x speedup

Bip901 commented on September 21, 2024

Guys, don't forget that the game length is random. That can have an effect even on a match between two non-random strategies. You can't do that optimization.

EFHIII commented on September 21, 2024

> Guys, don't forget that the game length is random. That can have an effect even on a match between two non-random strategies. You can't do that optimization.

I didn't forget; I mentioned it at the end:

> It's still probably a good idea to run matches not using RNG a few times since the match length is random, and that can affect the score.

The difference that makes compared to other forms of RNG is probably going to be far less noticeable. Historically, a lot of strategies will cooperate until defected against, and in that situation, the score is 3 for both sides no matter how long the match is. If you want to be as cautious about that type of RNG as possible, you could check whether, across a few rounds, the score for both players stays within a small range (+/- 3% or so); if it does, it's pretty safe to just average from there and move on.
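
For example, a tiny made-up helper along these lines, where probeScores would be the (scoreA, scoreB) pairs from a few probe runs:

# Hypothetical helper for the "+/- 3%" check described above.
def scoresStable(probeScores, tolerance=0.03):
    for side in range(2):
        values = [scores[side] for scores in probeScores]
        mean = sum(values) / len(values)
        # Unstable if either player's spread exceeds the tolerance band.
        if mean > 0 and (max(values) - min(values)) / mean > tolerance:
            return False
    return True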

duckboycool commented on September 21, 2024

If you want to be completely sure that game length randomness is handled, what you could do is cache the entire game history and memories, then generate more of the round when the game length is longer than the history stored in the cache, and just return the first part of it when it isn't. This should produce exactly the same results as outright running the strategies the full number of times, and would be quicker than rerunning even just a few times.

This could also start running into issues with memory on a larger set, but if this becomes a problem you could try to store the data in a way where each value only uses 1 bit instead of 32.

EFHIII commented on September 21, 2024

If Cary plans to run the code multiple times, yes, building a cache is ideal, but otherwise, it's a waste of time and resources.

In the fork I'm working on, we're implementing a cache, but I see no need for Cary to use one if he only plans to run things once.

duckboycool commented on September 21, 2024

> If Cary plans to run the code multiple times, yes, building a cache is ideal, but otherwise, it's a waste of time and resources.
>
> In the fork I'm working on, we're implementing a cache, but I see no need for Cary to use one if he only plans to run things once.

They won't just be run once though.

> I will probably do something like peopleOfPalay's edit where I run the tournament over and over and take the average of all of them.

Also, as per the idea I mentioned before, continuing a game from cached history could be problematic for consistency, so it would probably be better to precompute the cache to 500 turns or so once, and then run it from the start again in the unlikely case that the game's longer than that.
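
Very roughly, and only for pairs already known to be deterministic, the shape would be something like this sketch; both runRoundOfLength(pair, gameLength) (a variant of runRound that plays exactly gameLength turns) and drawGameLength() (sampling the usual random match length) are made-up helpers that don't exist in the current code:

# Hypothetical sketch of the precomputed-history cache described above.
historyCache = {}        # pair -> precomputed 2 x PRECOMPUTED_TURNS history
PRECOMPUTED_TURNS = 500  # "500 turns or so"

def getRoundHistory(pair):
    gameLength = drawGameLength()  # made-up: the usual random length
    if pair not in historyCache:
        historyCache[pair] = runRoundOfLength(pair, PRECOMPUTED_TURNS)
    cached = historyCache[pair]
    if gameLength <= PRECOMPUTED_TURNS:
        # Shorter game: just return the first gameLength turns.
        return cached[:, :gameLength]
    # Unlikely long game: fall back to running it from scratch.
    return runRoundOfLength(pair, gameLength)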
