
tcr-dist's People

Contributors

agartland, ddiez, jeremycfd, pansapiens, phbradley

tcr-dist's Issues

Track the clonality of out of frame CDR3s

I think currently we exclude all out of frame CDR3s from any part of the analysis after sequence parsing. It's good to exclude these from analysis since they are non-functional, but it is important to track their clonality, as they are a correlate of clonality signals for functional CDR3s.

make_10x_clones_file.py issue

When this Python script invokes file_converter.py, I think it passes the wrong flag: it uses --check_genes, but the flag in file_converter.py is dont_check_genes.

usage with mixcr files

Dear Phil,
After reading your paper, I would be interested in identifying shared motifs in a particular subset of sorted T cells that undergo strong TCR signaling during their development in the thymus. I'm quite new to Python, but I have been able to set up the environment and install the packages and all dependencies on Windows using WSL.
Unfortunately I have only single chain files (alpha and beta) from several subjects. Is there a way I can adapt the input to perform the analysis only with single chain files?
Thanks in advance for your help :)

Guillem Sanchez

make_really_tall_trees.py script always running

Hi, dear team,
The make_really_tall_trees.py script keeps running without finishing, and there is no error message. See below:
[image]

My input file has 11762 clones. Do I have too many clones? The script has been running for two days. Any suggestions? Thanks.

Amino Acid sequence inputs

Can tcrdist take CDR3 amino acid sequences as input, or does it only take the complete nucleotide sequence covering CDR1, CDR2, and CDR3?

setup_gammadelta_db.py issue

Hi there,
When I set up the gammadelta database, I found that line 66 still points to your own path. It might be better to provide the file on GitHub.
Best,
Yan

statistical method for testing degree of clustering/distance between groups of TCRs?

Hi there,
Great work on the package!
I've been using tcrdist3 to compute pairwise distances between my TCR sequences. I want to show that one group of TCR sequences is more tightly clustered (closer together, more similar) than another group. How best could I go about testing this? Should I start from the network and calculate, say, the number of nodes for each group, or start from the distance matrix and use PERMANOVA or something similar?
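One possible way to approach this from the distance matrix, sketched here under assumptions: compare the mean within-group pairwise distance of the two groups and assess the difference with a label-permutation test. The function names and the nested-list matrix layout are hypothetical, nothing here calls tcrdist3, and each group needs at least two members. PERMANOVA or a network-based statistic may suit the data better.

```python
import random
from itertools import combinations


def mean_within(dist, idx):
    """Mean pairwise distance among the items whose indices are in idx."""
    pairs = list(combinations(idx, 2))
    return sum(dist[i][j] for i, j in pairs) / float(len(pairs))


def permutation_test(dist, group_a, group_b, n_perm=2000, seed=0):
    """One-sided test that group_a is more tightly clustered than group_b.

    Returns (observed, p_value), where observed is
    mean_within(group_a) - mean_within(group_b) and p_value is the share of
    label shuffles whose difference is at least as negative.
    """
    rng = random.Random(seed)
    observed = mean_within(dist, group_a) - mean_within(dist, group_b)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a = pooled[:len(group_a)]
        b = pooled[len(group_a):]
        if mean_within(dist, a) - mean_within(dist, b) <= observed:
            hits += 1
    # Add-one correction keeps the p-value away from exactly zero.
    return observed, (hits + 1) / float(n_perm + 1)
```

Here `dist` is any square symmetric matrix indexed by TCR, and `group_a` and `group_b` are lists of row indices. A small p-value suggests that the first group is tighter than expected if group labels were exchangeable.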

Add universal newline support

Universal newline support was removed when the JCC cleaning code was removed. Need to add universal newline support back.
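A minimal sketch of what universal newline handling could look like when reading input files. The helper name `read_lines` is hypothetical and is not part of the pipeline; `io.open` is used because it is available in both Python 2.7 and Python 3.

```python
import io


def read_lines(path):
    """Return the lines of a text file with line endings removed."""
    # newline=None turns on universal-newline translation, so Unix (LF),
    # Windows (CRLF), and old Mac (CR) line endings all arrive as LF.
    with io.open(path, "r", newline=None) as handle:
        return [line.rstrip("\n") for line in handle]
```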

Problems determining best_gappos

I've noticed that in some situations the align_cdr3s function in tcr_distances.py will fail to find the best_gappos, causing an error. However, re-running the same dataset, I was able to get it to succeed about 20% of the time. I was unable to find any consistency in the center_cdr3 and member_cdr3 pairing that caused this to fail. Examples include:
DVGYKL DPAGNTGKL
GEGSNNRI GYNTNTGKL
GDRYAQGL GDVDYAQGL

But in each of these cases, rerunning the code eventually got through them without issue. I assume the stochasticity is introduced by the random seed. Any thoughts on how best to address this, @phbradley?

Traceback (most recent call last):
  File "/mnt/Data/TCR_Git/public_pipeline/tcr-dist/make_tall_trees.py", line 683, in
    a,b = align_cdr3s( center_cdr3, member_cdr3, gap_character )
  File "/mnt/Data/TCR_Git/public_pipeline/tcr-dist/tcr_distances.py", line 98, in align_cdr3s
    s0 = s0[:best_gappos+1] + gap_character*lendiff + s0[best_gappos+1:]
UnboundLocalError: local variable 'best_gappos' referenced before assignment
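An UnboundLocalError of this kind means that no code path assigned best_gappos before it was read. The sketch below shows one defensive pattern: bind the variable before the search loop so that a fallback always exists. It is a hypothetical stand-in, not the code in tcr_distances.py. The name `align_with_gap` and the scoring (a plain count of identical aligned positions) are assumptions made for illustration, and the sketch does not explain why the original fails only intermittently.

```python
def align_with_gap(a, b, gap_character="-"):
    """Pad the shorter of two strings with one contiguous block of gaps.

    Hypothetical illustration only; the real align_cdr3s may score
    candidate gap positions differently.
    """
    if len(a) == len(b):
        return a, b
    swapped = len(a) > len(b)
    s0, s1 = (b, a) if swapped else (a, b)  # s0 is the shorter string
    lendiff = len(s1) - len(s0)

    # Bind both names before the loop so they exist even if no candidate
    # position ever beats the starting score.
    best_gappos, best_score = 0, -1
    for gappos in range(len(s0)):
        # The gap block is placed immediately after s0[gappos].
        padded = s0[:gappos + 1] + gap_character * lendiff + s0[gappos + 1:]
        score = sum(1 for x, y in zip(padded, s1) if x == y)
        if score > best_score:
            best_gappos, best_score = gappos, score

    s0 = s0[:best_gappos + 1] + gap_character * lendiff + s0[best_gappos + 1:]
    return (s1, s0) if swapped else (s0, s1)
```

With this structure, the pairs listed above (for example DVGYKL and DPAGNTGKL) always produce two strings of equal length, whatever the scores turn out to be.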

Limitations in detecting out of frame TCRs

It is currently possible for sequences that are missing required parts of certain gene segments to evade the parts of our pipeline that filter out of frame TCRs. For instance: AAGGCCCTGCCCAGCTAATCTTAATACGTTCAAATGAGCGAGAGAAGCGCAGTGGAAGACTCAGAGCCACCCTTGACACTTCCAGCCAGAGCAGCTCCCTGTCCATCACTGGTACTCTAGCTACAGACACTGCTGTGTACTTCTGTGCTACTGATAAGGCTGGAGGACTAAGTGACATCCAGAACCCAGAACCTGCTGTGTACTGACACCCCAGATCGGAAGAGCGTCGTGTAGGGAAAGAG produces what the pipeline considers a valid TCR, but it is clear that a portion of the J segment is missing. I do not currently believe that this is a widespread issue when the quality of the data is good; however, certain bulk-sequencing approaches that some use to prepare data for TCRdist use assembly algorithms in the process, and this assembly can introduce errors that TCRdist should be able to identify as problematic during parsing. (This particular sequence was generated by MiXCR.)

Single Chain parsed_seq Input Generating Blank Output File

Hello,

I am trying to run TCR-dist on a data set of parsed TCR alpha chains.

An abridged version of my data set is here in .txt format:
clones_file.txt

The code I used to run the basic analysis script is as follows:
python /Users/cajames2/tcr-dist/run_basic_analysis.py --organism human --parsed_seqs_file /Users/cajames2/TCRSeq/clones_file.tsv --make_fake_beta --make_fake_quals

The script then runs all the way through, but returns blank tables and plots. I ran the test data set "test_small_human_pairseqs_v1_parsed_seqs.tsv" and saw outputs. I also deleted the beta chain columns and quality scores from that test set and ran only the alpha chain information with --make_fake_beta and --make_fake_quals, and it worked just fine.

I think I have traced the issue to something that the parse_tsv_file function is dependent on. I modified the parse_tsv.py script to print the all_clones file so that I could see whether my data was being read correctly and this file is blank after I run the run_basic_analysis.py script. However, when I run the parse_tsv.py script on my data independently, it reads my data and generates a populated all_clones file.

Do you have any insight into why the parse_tsv_file function won't read my data when run in the context of the run_basic_analysis.py script, but works just fine when run independently?

text format cluster output for make_tall_trees.py

Hi, dear team,
Using tcr-dist, I found a surprising result that I really care about, especially in the output of make_tall_trees.py. However, make_tall_trees.py only produces a figure, as shown below:
[image]

In this figure, the seq-logo sub-cluster result on the left is exactly what I want. Can this result be written out as text, cluster by cluster? Looking forward to your reply. Thanks.

How to use tcr-dist with only V gene-level information?

Hello,

I currently only have V gene-level information, and the first few rows of my "parsed_seqs_file" look like the table below. My error comes from the "all_genes.py" script not being able to find the gene in the file "alphabeta_db.tsv", because my gene names do not carry allele information (for example, TRAV26-1 is not found when something like TRAV26-1*01 is expected). Is there a way to run tcr-dist without this information? Thank you so much for any advice.

id epitope subject va_gene ja_gene vb_gene jb_gene cdr3a cdr3a_nucseq cdr3b cdr3b_nucseq va_reps ja_reps vb_reps jb_reps va_countreps ja_countreps vb_countreps jb_countreps cdr3a_quals cdr3b_quals va_genes vb_genes ja_genes jb_genes va_rep ja_rep vb_rep jb_rep
345 Nef 5 TRAV26-1 TRAJ43 TRBV13 TRBJ1-5 CIVRAPGRADMRF TGCATTGTGCGCGCGCCGGGCCGCGCGGATATGCGCTTT CASSYLPGQGDHYSNQPQHF TGCGCGAGCAGCTATCTGCCGGGCCAGGGCGATCATTATAGCAACCAGCCGCAGCATTTT TRAV26-1 TRAJ43 TRBV13 TRBJ1-5 TRAV26-1 TRAJ43 TRBV13 TRBJ1-5 99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99 99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99 TRAV26-1 TRBV13 TRAJ43 TRBJ1-5 TRAV26-1 TRAJ43 TRBV13 TRBJ1-5
3871 p65 1 TRAV8-6 TRAJ30 TRBV28 TRBJ2-7 CAVSDKNRDDKIIF TGCGCGGTGAGCGATAAAAACCGCGATGATAAAATTATTTTT CASRPGTASYEQYF TGCGCGAGCCGCCCGGGCACCGCGAGCTATGAACAGTATTTT TRAV8-6 TRAJ30 TRBV28 TRBJ2-7 TRAV8-6 TRAJ30 TRBV28 TRBJ2-7 99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99 99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99 TRAV8-6 TRBV28 TRAJ30 TRBJ2-7 TRAV8-6 TRAJ30 TRBV28 TRBJ2-7
3740 p65 104 TRAV3 TRAJ12 TRBV7-9 TRBJ2-7 CATVSRMDSSYKLIF TGCGCGACCGTGAGCCGCATGGATAGCAGCTATAAACTGATTTTT CASSLIGEGTGWHQYF TGCGCGAGCAGCCTGATTGGCGAAGGCACCGGCTGGCATCAGTATTTT TRAV3 TRAJ12 TRBV7-9 TRBJ2-7 TRAV3 TRAJ12 TRBV7-9 TRBJ2-7 99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99 99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99 TRAV3 TRBV7-9 TRAJ12 TRBJ2-7 TRAV3 TRAJ12 TRBV7-9 TRBJ2-7
3742 p65 106 TRAV16 TRAJ26 TRBV3-1 TRBJ1-1 CADYYGQNFVF TGCGCGGATTATTATGGCCAGAACTTTGTGTTT CASSFQGYTEAFF TGCGCGAGCAGCTTTCAGGGCTATACCGAAGCGTTTTTT TRAV16 TRAJ26 TRBV3-1 TRBJ1-1 TRAV16 TRAJ26 TRBV3-1 TRBJ1-1 99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99 99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99.99 TRAV16 TRBV3-1 TRAJ26 TRBJ1-1 TRAV16 TRAJ26 TRBV3-1 TRBJ1-1

My script run is this:

python2 run_basic_analysis.py --organism human --parsed_seqs_file parsed_seqs_file --make_fake_quals --no_probabilities > parsed_seqs_file.out

-Gabrielle
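One possible workaround, sketched here and untested against tcr-dist itself, is to append a default allele to every gene name that lacks one before running the pipeline. It assumes that allele *01 exists in alphabeta_db.tsv for each gene, which may not hold, and that gene-list columns separate multiple genes with ";". The function names are hypothetical, and the column prefixes are taken from the header row shown above.

```python
import csv

# Columns such as va_gene, va_reps, va_genes, va_rep hold gene names.
GENE_COLUMN_PREFIXES = ("va_", "ja_", "vb_", "jb_")


def add_default_allele(gene, allele="01"):
    """Append '*01' (or another allele) to a gene name that has no allele."""
    gene = gene.strip()
    if not gene or "*" in gene:
        return gene
    return gene + "*" + allele


def fix_row(row, sep=";"):
    """Return a copy of one TSV row with alleles added to gene columns."""
    fixed = dict(row)
    for column, value in row.items():
        if column and value and column.startswith(GENE_COLUMN_PREFIXES):
            genes = value.split(sep)
            fixed[column] = sep.join(add_default_allele(g) for g in genes)
    return fixed


def fix_file(in_path, out_path):
    """Rewrite a tab-separated parsed_seqs file with default alleles added."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                                delimiter="\t", lineterminator="\n",
                                extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            writer.writerow(fix_row(row))
```

Choosing *01 is arbitrary. Alleles of the same gene can differ in sequence, so distances computed this way carry that extra uncertainty.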

Apply tcr-dist on amino acid sequences

Hello,

Thanks for developing such great tools!

I am wondering if there is a way to apply tcr-dist on cdr3 amino acid sequences.
Currently, the instructions say that "nucleotide sequences of the TCR alpha and beta chain reads" are required. However, we have a dataset with amino acid sequences instead of nucleotide sequences and would like to try your methods.

Thanks so much for your help in advance!

Best regards,
Ruoxing

Estimation of TCRdiv

Good evening,
After reading your paper, I would be interested in estimating TCRdiv for a series of epitope-specific repertoires. However, I cannot find a script dedicated to it that does not require running the full analysis. Could you please help me with this? Is the alpha chain also needed to estimate TCRdiv?
Many thanks, best regards,
Barbara

Investigate redesigning of filename/path lengths

"The Windows API imposes a maximum filename length such that a filename, including the file path to get to the file, can't exceed between 255-260 characters." Filenames that violate this limit can cause problems for users on Windows machines. Some of the path/filename combinations generated automatically by the pipeline get quite long. We may want to shorten names where possible and/or include information on placing results in a base filepath for viewing when they encounter problems.
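As a starting point for the investigation, a short script could list the generated files whose full paths come close to the limit. This is only a sketch: the function name is hypothetical, and the default of 260 is the classic Windows MAX_PATH value, which should be adjusted if a different threshold applies.

```python
import os

WINDOWS_MAX_PATH = 260  # classic MAX_PATH value; treat as an adjustable default


def find_long_paths(root, limit=WINDOWS_MAX_PATH):
    """Yield (length, path) for files under root whose absolute path
    has at least `limit` characters."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.abspath(os.path.join(dirpath, name))
            if len(full) >= limit:
                yield len(full), full
```

Running this over a results directory, for example `sorted(find_long_paths("results"), reverse=True)`, would show which automatically generated names are worth shortening first.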

Losing Clones

Hello,

I am running TCRdist, and I really like it! Thank you for taking the time to make it.

However, I am noticing that I am losing a considerable number of my TCRs. I have ~2600 clones with complete VDJ and CDR3 information, but I only end up with a distance matrix of around 200 clones.

What type of filtering is occurring that makes me lose so many clones when running TCRdist? These cells were sorted as tetramer-positive and have at least two cells per clone. I also did my own quality control to ensure that they have all the information needed. What is causing me to lose so many clones?

Thanks!

How to pair the alpha and beta chain reads from one TCR

Dear Dr. Philip Bradley,

Your work (TCRdist) published in Nature is so interesting that I want to apply it to our TCR data analysis.

As described on your GitHub page, TCR-dist accepts three input modes: pair_seq_file, parsed_seq_file, and clones_file. Because of our sequencing method, it is hard for me to pair the alpha and beta chain reads from one TCR.

I would like to try the other two input modes, because we have already used MiXCR to produce some results (MiXCR output).

Could you please tell me how to prepare input files for the other two modes?

Best,
Baifeng

Issue with run_basic_analysis

Hello,
I've run into an issue with the run_basic_analysis function.
It results in this error.
[image]

Could I have assistance with this?

Thanks,
Tejas

AssertionError

Hello,
When I run the analysis on my sequence file, I get an AssertionError like this:

[image: WeChat image 20190630163818]

I am new to Python. Could you please tell me where the problem is? Thank you very much!

-Zuo Xinyi

Bad CPU type

I am using Python 2.7 in a conda environment to run TCRdist on my Mac (macOS Catalina 10.15.4, 64-bit). I am able to run "python setup.py", but when I run "python run_basic_analysis.py" I get the message below:

****/bioinformatics_tools/tcr-dist-master/external/blast-2.2.16/bin/formatdb -p F -i ***/bioinformatics_tools/tcr-dist-master/db/alphabeta_db.tsv_files/blast_dbs/nucseq_human_A_V.fasta
sh: ***/bioinformatics_tools/tcr-dist-master/external/blast-2.2.16/bin/formatdb: Bad CPU type in executable
blast db creation failed!

Also, I couldn't install sklearn, as it is only for Python 3 and above.

Can you please help fix this problem?

Limited functionality of --no_probabilities

The use of --no_probabilities is currently limited by the fact that compute_probs imports tcr_rearrangement_new.py or tcr_rearrangement.py (both directly and through the import of tcr_sampler.py). The rearrangement scripts have functions that require hard-coded probabilities files to exist. Currently, --no_probabilities will successfully assign a probability of 1 in all cases, but the pipeline will fail whenever the probabilities files do not exist.
