I am trying to run stitchr inside a Shiny R app, using reticulate to run it inside a conda environment.
To set that up, I do the following:
reticulate::use_condaenv('myenv')
reticulate::py_install("stitchr", pip = TRUE)
Then I create a run_stitchr.py script as detailed in https://jamieheather.github.io/stitchr/importing.html
My run_stitchr.py looks exactly like the one in the link above:
# import stitchr
from Stitchr import stitchrfunctions as fxn
from Stitchr import stitchr as st
# specify details about the locus to be stitched
chain = 'TRB'
species = 'HUMAN'
# initialise the necessary data
tcr_dat, functionality, partial = fxn.get_imgt_data(chain, st.gene_types, species)
codons = fxn.get_optimal_codons('', species)
# provide details of the rearrangement to be stitched
tcr_bits = {'v': 'TRBV7-3*01', 'j': 'TRBJ1-1*01', 'cdr3': 'CASSYLQAQYTEAFF',
            'l': 'TRBV7-3*01', 'c': 'TRBC1*01',
            'skip_c_checks': False, 'species': species, 'seamless': False,
            '5_prime_seq': '', '3_prime_seq': '', 'name': 'TCR'}
# then run stitchr on that rearrangement
stitched = st.stitch(tcr_bits, tcr_dat, functionality, partial, codons, 3, '')
print(stitched)
and I run it as follows:
run_results <- reticulate::py_run_file(run_script)
This works perfectly, and run_results$stitched contains the following output:
[[1]]
[1] "TCR" "TRBV7-3*01" "TRBJ1-1*01" "TRBC1*01" "CASSYLQAQYTEAFF"
[6] "TRBV7-3*01(L)"
[[2]]
[1] "ATGGGCACCAGGCTCCTCTGCTGGGCAGCCCTGTGCCTCCTGGGGGCAGATCACACAGGTGCTGGAGTCTCCCAGACCCCCAGTAACAAGGTCACAGAGAAGGGAAAATATGTAGAGCTCAGGTGTGATCCAATTTCAGGTCATACTGCCCTTTACTGGTACCGACAAAGCCTGGGGCAGGGCCCAGAGTTTCTAATTTACTTCCAAGGCACGGGTGCGGCAGATGACTCAGGGCTGCCCAACGATCGGTTCTTTGCAGTCAGGCCTGAGGGATCCGTCTCTACTCTGAAGATCCAGCGCACAGAGCGGGGGGACTCAGCCGTGTATCTCTGTGCCAGCAGCTACCTGCAGGCCCAGTACACTGAAGCTTTCTTTGGACAAGGCACCAGACTCACAGTTGTAGAGGACCTGAACAAGGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATCAGAAGCAGAGATCTCCCACACCCAAAAGGCCACACTGGTGTGCCTGGCCACAGGCTTCTTCCCCGACCACGTGGAGCTGAGCTGGTGGGTGAATGGGAAGGAGGTGCACAGTGGGGTCAGCACGGACCCGCAGCCCCTCAAGGAGCAGCCCGCCCTCAATGACTCCAGATACTGCCTGAGCAGCCGCCTGAGGGTCTCGGCCACCTTCTGGCAGAACCCCCGCAACCACTTCCGCTGTCAAGTCCAGTTCTACGGGCTCTCGGAGAATGACGAGTGGACCCAGGATAGGGCCAAACCCGTCACCCAGATCGTCAGCGCCGAGGCCTGGGGTAGAGCAGACTGTGGCTTTACCTCGGTGTCCTACCAGCAAGGGGTCCTGTCTGCCACCATCCTCTATGAGATCCTGCTAGGGAAGGCCACCCTGTATGCTGTGCTGGTCAGCGCCCTTGTGTTGATGGCCATGGTCAAGAGAAAGGATTTC"
[[3]]
[1] 0
I have two questions:
1- Using this approach, is it possible to obtain the stitched amino acid sequence as well (as command-line stitchr does, returning both the DNA and amino acid sequences)?
2- I don't have C region information, but if I just write 'c': '' in the tcr_bits section of the run_stitchr.py above, I get the following:
Error: a CONSTANT sequence region has not been found for gene in the IMGT data for this chain/species. Please check your TCR and species data.
Missing C region information does not seem to be a problem for command-line stitchr. What is the correct way to run stitchr with no C region information using the run_stitchr.py approach above?
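Regarding question 1, the fallback I have in mind is to translate the nucleotide sequence myself at the end of run_stitchr.py. Below is a minimal sketch of that idea, not stitchr's own method: the translate function and the stitched_aa name are my own (hypothetical), it uses the standard genetic code, and it assumes that stitched[1] is the nucleotide string and that the sequence is in frame from its first base, as the output above suggests. I have not checked whether stitchr already exposes something equivalent.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with codons enumerated
# in T, C, A, G order so they line up with the amino acid string below.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nt_seq, offset=0):
    """Translate a DNA string codon by codon, starting at `offset`.

    Stop codons become '*', codons containing unexpected characters
    become 'X', and trailing bases that do not fill a codon are ignored.
    This helper is hypothetical and not part of stitchr.
    """
    seq = nt_seq.upper()[offset:]
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


# In run_stitchr.py this could follow the st.stitch(...) call, e.g.
#     stitched_aa = translate(stitched[1])
# assuming stitched[1] is the nucleotide string (an assumption on my part).
# Here, the first 18 bases of the output above are used as a stand-in:
print(translate("ATGGGCACCAGGCTCCTC"))
```

If stitched_aa were assigned in the script like this, I would expect to read it from R as run_results$stitched_aa, the same way as run_results$stitched, but a built-in way to get the amino acid sequence would be preferable.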
Many thanks!
Daniel