phbradley / tcr-dist Goto Github PK

Software tools for the analysis of epitope-specific T cell receptor (TCR) repertoires (scroll down for the README)

License: MIT License

Python 100.00%

tcr-dist's Issues

Track the clonality of out of frame CDR3s

I think we currently exclude all out-of-frame CDR3s from every part of the analysis after sequence parsing. Excluding them from the analysis makes sense because they are non-functional, but it is important to track their clonality, since it correlates with the clonality signal of functional CDR3s.
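
A rough sketch of how such tracking could look, assuming "out of frame" is approximated as a CDR3 nucleotide length that is not a multiple of 3 (a simplification; stop codons and missing segments are not checked). The function names are hypothetical and are not part of tcr-dist.

```python
from collections import Counter

def is_out_of_frame(cdr3_nucseq):
    """Simplified frame check: a length that is not a multiple of 3 counts as out of frame."""
    return len(cdr3_nucseq) % 3 != 0

def clone_sizes_by_frame(cdr3_nucseqs):
    """Count reads per distinct CDR3 nucleotide sequence, split by frame status.

    Hypothetical helper: returns (in_frame_counts, out_of_frame_counts).
    """
    in_frame, out_of_frame = Counter(), Counter()
    for seq in cdr3_nucseqs:
        target = out_of_frame if is_out_of_frame(seq) else in_frame
        target[seq] += 1
    return in_frame, out_of_frame
```

The out-of-frame counts could then be written to a separate log while the in-frame clones continue through the analysis.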

make_10x_clones_file.py issue

When this Python script invokes file_converter.py, I suspect it passes the wrong flag, --check_genes, because the corresponding flag in file_converter.py is dont_check_genes.

Reduce and annotate the required fields in the clones file

usage with mixcr files

Dear Phil,
After reading your paper, I would be interested in identifying shared motifs in a particular subset of sorted T cells that undergo strong TCR signaling during their development in the thymus. I'm quite new to Python, and I have been able to set up the environment and install the packages and all dependencies on Windows using WSL.
Unfortunately, I have only single-chain files (alpha and beta) from several subjects. Is there a way to adapt the input so that the analysis runs with single-chain files only?
Thanks in advance for your help :)

Guillem Sanchez

make_really_tall_trees.py script always running

Hi, dear team,
The make_really_tall_trees.py script keeps running without finishing, and there is no error message, as shown below:

My input file has 11762 clones. Do I have too many clones? The script has been running for two days. Any suggestions? Thanks.

Amino Acid sequence inputs

Can tcrdist take CDR3 amino acid sequences as input, or does it only take the complete nucleotide sequence covering CDR1, CDR2, and CDR3?

setup_gammadelta_db.py issue

Hi there,
When I set up the gamma-delta database, I found that line 66 still points to a path on your own machine. It might be better to provide the file on GitHub.
Best,
Yan

statistical method for testing degree of clustering/distance between groups of TCRs?

Hi there,
Great work on the package!
I've been using tcrdist3 to compute pairwise distances between my TCR sequences. I want to show that one group of TCR sequences is more clustered (closer together, more similar) than another group. How could I best test this? Should I work from the network and calculate, say, the number of nodes for each group, or from the distance matrix and use PERMANOVA or something similar?
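
One distance-matrix option, sketched here as an illustration rather than a recommendation from the maintainers, is a label-permutation test on the mean within-group pairwise distance. The sketch assumes a full symmetric distance matrix stored as a list of lists and two lists of row indices with at least two members each. The function names are hypothetical and are not part of tcrdist3.

```python
import random
from itertools import combinations

def mean_within_distance(dist, members):
    """Mean pairwise distance among the given row/column indices."""
    pairs = list(combinations(members, 2))
    return sum(dist[i][j] for i, j in pairs) / float(len(pairs))

def permutation_test_tighter(dist, group_a, group_b, n_perm=2000, seed=0):
    """One-sided permutation p-value for 'group_a is more tightly clustered than group_b'.

    The statistic is mean_within(group_b) - mean_within(group_a). Labels are
    shuffled across the pooled indices while both group sizes stay fixed.
    """
    rng = random.Random(seed)
    observed = mean_within_distance(dist, group_b) - mean_within_distance(dist, group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = (mean_within_distance(dist, pooled[n_a:])
                - mean_within_distance(dist, pooled[:n_a]))
        if stat >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1.0) / (n_perm + 1.0)
```

A small p-value suggests that group_a is tighter than would be expected if the group labels were exchangeable. PERMANOVA addresses a related but different question (differences in group centroids), so the choice depends on what "more clustered" should mean for the data.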

Validation Data Missing Epitope Label

The current file with the validation cohort does not have the epitope label. It instead has EPI for all clones.

Allow for missing quality strings, and also ASCII num->char formatted quality strings

Add universal newline support

Universal newline support was removed when the JCC cleaning code was removed. Need to add universal newline support back.
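
A minimal sketch of one way to restore this, assuming the input files are read as text: `io.open` with `newline=None` enables universal-newline translation on both Python 2.7 and Python 3. The helper name `read_lines_universal` is hypothetical and is not part of tcr-dist.

```python
import io

def read_lines_universal(path, encoding="utf-8"):
    """Return the lines of a text file, treating LF, CRLF, and CR as line endings.

    Hypothetical helper, not part of tcr-dist.
    """
    # newline=None turns on universal-newline translation in io.open,
    # so every line ending is presented to the caller as a single LF.
    with io.open(path, "r", encoding=encoding, newline=None) as handle:
        return [line.rstrip("\n") for line in handle]
```

Files exported from Windows or older Mac tools can then be parsed without a separate conversion step.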

Feature request: save PCs from kPCA to log

Problems determining best_gappos

I've noticed that in some situations the align_cdr3s function in tcr_distances.py will fail to find the best_gappos, causing an error. However, re-running the same dataset, I was able to get it to succeed about 20% of the time. I was unable to find any consistency in the center_cdr3 and member_cdr3 pairing that caused this to fail. Examples include:
DVGYKL DPAGNTGKL
GEGSNNRI GYNTNTGKL
GDRYAQGL GDVDYAQGL

But in each of these cases, rerunning the code eventually got through them without issue. I assume the stochasticity is introduced by the random_seed. Any thoughts on how best to address this, @phbradley?

Traceback (most recent call last):
  File "/mnt/Data/TCR_Git/public_pipeline/tcr-dist/make_tall_trees.py", line 683, in
    a,b = align_cdr3s( center_cdr3, member_cdr3, gap_character )
  File "/mnt/Data/TCR_Git/public_pipeline/tcr-dist/tcr_distances.py", line 98, in align_cdr3s
    s0 = s0[:best_gappos+1] + gap_character*lendiff + s0[best_gappos+1:]
UnboundLocalError: local variable 'best_gappos' referenced before assignment
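
The UnboundLocalError means that `best_gappos` was only assigned inside a branch that never ran for these inputs. The sketch below illustrates the general remedy: bind the variable before the loop and start from a score that any candidate can beat. It is not the actual code from tcr_distances.py. `toy_score` is a stand-in for a substitution-matrix lookup, and both function names are hypothetical.

```python
def toy_score(x, y):
    """Stand-in for a substitution-matrix lookup: +1 for a match, -1 otherwise."""
    return 1 if x == y else -1

def align_with_single_gap(seq_a, seq_b, gap_character="-", score=toy_score):
    """Pad the shorter sequence with one contiguous gap so both have equal length.

    The gap is inserted after position best_gappos of the shorter sequence.
    The two sequences are returned in the same order they were passed in.
    """
    if len(seq_a) == len(seq_b):
        return seq_a, seq_b
    swapped = len(seq_a) > len(seq_b)
    short, long_ = (seq_b, seq_a) if swapped else (seq_a, seq_b)
    lendiff = len(long_) - len(short)

    # Bind both names BEFORE the loop so they are always defined, even when
    # the loop body never runs (len(short) == 1) or every score is very low.
    best_gappos = 0
    best_score = float("-inf")
    for gappos in range(len(short) - 1):
        total = sum(score(short[i], long_[i]) for i in range(gappos + 1))
        total += sum(score(short[i], long_[i + lendiff])
                     for i in range(gappos + 1, len(short)))
        if total > best_score:
            best_score, best_gappos = total, gappos

    padded = short[:best_gappos + 1] + gap_character * lendiff + short[best_gappos + 1:]
    return (long_, padded) if swapped else (padded, long_)
```

With this structure the function always returns an alignment, so a pair such as DVGYKL and DPAGNTGKL cannot trigger the error regardless of how the candidates score.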

Limitations in detecting out of frame TCRs

It is currently possible for sequences that are missing required parts of certain gene segments to evade the parts of our pipeline that filter out of frame TCRs. For instance: AAGGCCCTGCCCAGCTAATCTTAATACGTTCAAATGAGCGAGAGAAGCGCAGTGGAAGACTCAGAGCCACCCTTGACACTTCCAGCCAGAGCAGCTCCCTGTCCATCACTGGTACTCTAGCTACAGACACTGCTGTGTACTTCTGTGCTACTGATAAGGCTGGAGGACTAAGTGACATCCAGAACCCAGAACCTGCTGTGTACTGACACCCCAGATCGGAAGAGCGTCGTGTAGGGAAAGAG produces what the pipeline considers a valid TCR, but it is clear that a portion of the J segment is missing. I do not currently believe that this is a widespread issue when the quality of the data is good; however, certain bulk-sequencing approaches that some use to prepare data for TCRdist use assembly algorithms in the process, and this assembly can introduce errors that TCRdist should be able to identify as problematic during parsing. (This particular sequence was generated by MiXCR.)

Single Chain parsed_seq Input Generating Blank Output File

Hello,

I am trying to run TCR-dist on a data set of parsed TCR alpha chains.

An abridged version of my data set is here in .txt format:
clones_file.txt

The code I used to run the basic analysis script is as follows:
python /Users/cajames2/tcr-dist/run_basic_analysis.py --organism human --parsed_seqs_file /Users/cajames2/TCRSeq/clones_file.tsv --make_fake_beta --make_fake_quals

The script then runs all the way through, but returns blank tables and plots. I ran the test data set "test_small_human_pairseqs_v1_parsed_seqs.tsv" and saw outputs. I also deleted the beta chain columns and quality scores from it and ran only the alpha chain information with --make_fake_beta and --make_fake_quals, and it worked just fine.

I think I have traced the issue to something that the parse_tsv_file function is dependent on. I modified the parse_tsv.py script to print the all_clones file so that I could see whether my data was being read correctly and this file is blank after I run the run_basic_analysis.py script. However, when I run the parse_tsv.py script on my data independently, it reads my data and generates a populated all_clones file.

Do you have any insight into why the parse_tsv_file function won't read my data when run in the context of the run_basic_analysis.py script, but works just fine when run independently?

text format cluster output for make_tall_trees.py

Hi, dear team,
Using the tcr-dist plug-in, I found a surprising result that I really care about, especially in the output of make_tall_trees.py. However, make_tall_trees.py only outputs a figure, as shown below:

In this figure, the sequence-logo sub-cluster result on the left is exactly what I want. Can this result be written out as text, cluster by cluster? Looking forward to your reply. Thanks.

How to use tcr-dist with only V gene-level information?

Hello,

I currently only have V gene-level information, and the first few rows of my "parsed_seqs_file" look like the table below. My error comes from the "all_genes.py" script not being able to find the gene in the file "alphabeta_db.tsv" because my gene names lack the allele information (example: TRAV26-1 is not found, when something like TRAV26-1*01 is expected). Is there a way to run tcr-dist without this information? Thank you so much for any advice.

id	epitope	subject	va_gene	ja_gene	vb_gene	jb_gene	cdr3a	cdr3a_nucseq	cdr3b	cdr3b_nucseq	va_reps	ja_reps	vb_reps	jb_reps	va_countreps	ja_countreps	vb_countreps	jb_countreps	cdr3a_quals	cdr3b_quals	va_genes	vb_genes	ja_genes	jb_genes	va_rep	ja_rep	vb_rep	jb_rep
345	Nef	5	TRAV26-1	TRAJ43	TRBV13	TRBJ1-5	CIVRAPGRADMRF	TGCATTGTGCGCGCGCCGGGCCGCGCGGATATGCGCTTT	CASSYLPGQGDHYSNQPQHF	TGCGCGAGCAGCTATCTGCCGGGCCAGGGCGATCATTATAGCAACCAGCCGCAGCATTTT	TRAV26-1	TRAJ43	TRBV13	TRBJ1-5	TRAV26-1	TRAJ43	TRBV13	TRBJ1-5	99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99	99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99	TRAV26-1	TRBV13	TRAJ43	TRBJ1-5	TRAV26-1	TRAJ43	TRBV13	TRBJ1-5
3871	p65	1	TRAV8-6	TRAJ30	TRBV28	TRBJ2-7	CAVSDKNRDDKIIF	TGCGCGGTGAGCGATAAAAACCGCGATGATAAAATTATTTTT	CASRPGTASYEQYF	TGCGCGAGCCGCCCGGGCACCGCGAGCTATGAACAGTATTTT	TRAV8-6	TRAJ30	TRBV28	TRBJ2-7	TRAV8-6	TRAJ30	TRBV28	TRBJ2-7	99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99	99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99	TRAV8-6	TRBV28	TRAJ30	TRBJ2-7	TRAV8-6	TRAJ30	TRBV28	TRBJ2-7
3740	p65	104	TRAV3	TRAJ12	TRBV7-9	TRBJ2-7	CATVSRMDSSYKLIF	TGCGCGACCGTGAGCCGCATGGATAGCAGCTATAAACTGATTTTT	CASSLIGEGTGWHQYF	TGCGCGAGCAGCCTGATTGGCGAAGGCACCGGCTGGCATCAGTATTTT	TRAV3	TRAJ12	TRBV7-9	TRBJ2-7	TRAV3	TRAJ12	TRBV7-9	TRBJ2-7	99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99	99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99	TRAV3	TRBV7-9	TRAJ12	TRBJ2-7	TRAV3	TRAJ12	TRBV7-9	TRBJ2-7
3742	p65	106	TRAV16	TRAJ26	TRBV3-1	TRBJ1-1	CADYYGQNFVF	TGCGCGGATTATTATGGCCAGAACTTTGTGTTT	CASSFQGYTEAFF	TGCGCGAGCAGCTTTCAGGGCTATACCGAAGCGTTTTTT	TRAV16	TRAJ26	TRBV3-1	TRBJ1-1	TRAV16	TRAJ26	TRBV3-1	TRBJ1-1	99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99	99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99	TRAV16	TRBV3-1	TRAJ26	TRBJ1-1	TRAV16	TRAJ26	TRBV3-1	TRBJ1-1

My script run is this:

python2 run_basic_analysis.py --organism human --parsed_seqs_file parsed_seqs_file --make_fake_quals --no_probabilities > parsed_seqs_file.out

-Gabrielle
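
One possible workaround, sketched under the assumption that the database contains a *01 allele for every gene in the input (this has not been checked against alphabeta_db.tsv), is to append a default allele to gene names that lack one before running the pipeline. Distances may differ slightly if the true allele is not *01. The function names are hypothetical and are not part of tcr-dist.

```python
def add_default_allele(gene, default_allele="01"):
    """Append '*01' (or another default) to a gene name that has no allele suffix."""
    gene = gene.strip()
    if not gene or "*" in gene:
        return gene
    return "{}*{}".format(gene, default_allele)

def add_alleles_to_row(row, gene_columns=("va_gene", "ja_gene", "vb_gene", "jb_gene")):
    """Return a copy of one parsed_seqs row (a dict) with default alleles filled in.

    Only the columns listed in gene_columns are touched; other gene-bearing
    columns (for example va_genes or va_rep) can be added to the tuple.
    """
    fixed = dict(row)
    for column in gene_columns:
        if column in fixed:
            fixed[column] = add_default_allele(fixed[column])
    return fixed
```

Rows could be read and written with `csv.DictReader` and `csv.DictWriter` using a tab delimiter, applying `add_alleles_to_row` to each one.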

Apply tcr-dist on amino acid sequences

Hello,

Thanks for developing such great tools!

I am wondering if there is a way to apply tcr-dist on cdr3 amino acid sequences.
Currently, the instructions say that "nucleotide sequences of the TCR alpha and beta chain reads" are required. However, we have a dataset with amino acid sequences instead of nucleotide sequences and would like to try your methods.

Thanks so much for your help in advance!

Best regards,
Ruoxing

Estimation of TCRdiv

Good evening,
After reading your paper, I would be interested in estimating TCRdiv for a series of epitope-specific repertoires. However, I cannot find a script dedicated to it that does not require running the full analysis. Could you please help me with this? Is the alpha chain also needed to estimate TCRdiv?
Many thanks, best regards,
Barbara

Investigate redesigning of filename/path lengths

"The Windows API imposes a maximum filename length such that a filename, including the file path to get to the file, can't exceed between 255-260 characters." Filenames that violate this limit can cause problems for users on Windows machines. Some of the path/filename combinations generated automatically by the pipeline get quite long. We may want to shorten names where possible and/or include information on placing results in a base filepath for viewing when they encounter problems.
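
As a starting point for the investigation, the sketch below walks an output directory and reports files whose absolute path length reaches the classic Windows MAX_PATH value of 260. The function name `find_long_paths` is hypothetical and is not part of tcr-dist.

```python
import os

WINDOWS_MAX_PATH = 260  # classic MAX_PATH value, which includes the terminating NUL

def find_long_paths(root, limit=WINDOWS_MAX_PATH):
    """Yield (length, path) for files under root whose absolute path length is >= limit."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.abspath(os.path.join(dirpath, name))
            if len(full) >= limit:
                yield len(full), full
```

Running this over the results of a test analysis, with a lower `limit` to leave room for the user's base directory, would show which generated filenames are worth shortening first.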

Losing Clones

Hello,

I am running TCRdist, and I really like it! Thank you for taking the time to make it.

However, I am noticing that I am losing a considerable number of my TCRs: I have ~2600 clones with complete VDJ and CDR3 information, but I end up with a distance matrix of only around 200 clones.

What type of filtering is making me lose so many clones when running TCRdist? These cells were sorted as tetramer-positive, and each clone has at least two cells. I also did my own quality control to ensure that all the required information is present. What is causing me to lose so many clones?

Thanks!

How to pair the alpha and beta chain reads from one TCR

Dear Dr. Philip Bradley,

Your work (TCRdist) published in Nature is so interesting that I want to apply it to our TCR data analysis.

As described on your GitHub page, TCR-dist accepts three input modes: pair_seq_file, parsed_seq_file, and clones_file. Because of our sequencing method, it is hard for me to pair the alpha and beta chain reads from one TCR.

I would like to try the other two input modes, because we have already used MiXCR to produce some results (MiXCR output).

Could you please tell me how to prepare the files for the other two input modes?

Best,
Baifeng

Catch special-case failures caused by only having one epitope, subject, etc

Issue with run_basic_analysis

Hello,
I've run into an issue with the run_basic_analysis function.
It results in this error.

Could I have assistance with this?

Thanks,
Tejas

AssertionError

Hello,
When I run from my sequence file, I get an AssertionError like this:

I am new to Python. Could you please tell me where the problem is? Thank you very much!

-Zuo Xinyi

Bad CPU type

I am using Python 2.7 in a conda environment to run TCRdist on my Mac running macOS Catalina 10.15.4 (64-bit). I am able to run "python setup.py", but when I run "python run_basic_analysis.py" I get the message below:

****/bioinformatics_tools/tcr-dist-master/external/blast-2.2.16/bin/formatdb -p F -i ***/bioinformatics_tools/tcr-dist-master/db/alphabeta_db.tsv_files/blast_dbs/nucseq_human_A_V.fasta
sh: ***/bioinformatics_tools/tcr-dist-master/external/blast-2.2.16/bin/formatdb: Bad CPU type in executable
blast db creation failed!

Also, I couldn't install sklearn, as it is only available for Python 3 and above.

Can you please help fix this problem?

Limited functionality of --no_probabilities

The use of --no_probabilities is currently limited by the fact that compute_probs imports tcr_rearrangement_new.py or tcr_rearrangement.py (both directly and through the import of tcr_sampler.py). The rearrangement scripts have functions that require the existence of hard-coded probabilities files. Currently, --no_probabilities will successfully assign a probability of 1 in all cases, but the pipeline will still fail when the probabilities files do not exist.
