I’ve been using the IGoR software to generate simulated T cell beta chain repertoires and ran into an issue that I believe is related to the random number generation. My colleagues at Vanderbilt University Medical Center, Cinque Soto and James Crowe, suggested I get in touch with you.
We need to generate very large repertoires to compare with the results obtained from deep immunome sequencing being carried out at Vanderbilt. When I was processing IGoR output, I noticed that the same results are generated every 52,895,649 sequences. We discovered this when calculating the number of unique clonotypes as a function of the number of sequences – after 52M+ reads, the number of clonotypes was unchanged.
Here’s the igor command line that we’ve been using. Note that I tried this with two different seeds and get the same behavior and same cycle length (52,895,649 sequences).
igor -threads -1 -set_wd $PWD -batch tcr_beta -species human -chain beta
-generate 1100000000 --CDR3 --seed 12345678 --noerr
For reasons that I don’t understand, the first few sequences are not repeated, but after a short time the random number generator settles into a predictable cycle.
The example grep output below shows how the 750th sequence is repeated. As you probably know, the “-n” option to grep prefixes the output with the line number for the matching line. You’ll notice that the difference in line numbers is always a multiple of 52,895,649 (e.g. 1,057,913,732 - 752 = 52,895,649 x 20)
$ grep -n 'TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT' \ generated_seqs_noerr_CDR3_info.csv
752:750,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
52896401:52896399,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
105792050:105792048,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
158687699:158687697,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
211583348:211583346,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
264478997:264478995,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
317374646:317374644,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
370270295:370270293,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
423165944:423165942,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
476061593:476061591,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
528957242:528957240,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
581852891:581852889,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
634748540:634748538,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
687644189:687644187,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
740539838:740539836,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
793435487:793435485,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
846331136:846331134,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
899226785:899226783,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
952122434:952122432,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
1005018083:1005018081,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
1057913732:1057913730,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
$ grep -n '(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)' \ generated_realizations_noerr.csv
752:750;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
52896401:52896399;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
105792050:105792048;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
158687699:158687697;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
211583348:211583346;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
264478997:264478995;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
317374646:317374644;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
370270295:370270293;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
423165944:423165942;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
476061593:476061591;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
528957242:528957240;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
581852891:581852889;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
634748540:634748538;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
687644189:687644187;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
740539838:740539836;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
793435487:793435485;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
846331136:846331134;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
899226785:899226783;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
952122434:952122432;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
1005018083:1005018081;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
1057913732:1057913730;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
Any help would be greatly appreciated. Let me add that this is fantastic software. We’ve been looking for a rigorous way to generate simulated repertoires and we’re convinced that IGoR is the right way to do this.