
agartland / pwseqdist


A small package that efficiently computes distances between genetic sequences. Can accommodate similarity matrices, sequences of different lengths and custom metrics.
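At its core, the package applies a metric to every pair of sequences. The sketch below is a plain-Python illustration of that idea, not pwseqdist's own code; `edit_distance` and `pairwise_condensed` are hypothetical helpers. It returns a condensed vector with one entry per pair i < j (the layout of `scipy.spatial.distance.pdist`). That `apply_pairwise_sq` uses the same layout is an assumption inferred only from its two-sequence output `array([1])` in the issues on this page.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def pairwise_condensed(seqs: Sequence[str],
                       metric: Callable[[str, str], int]) -> List[int]:
    """Apply metric to every pair i < j, in the same order as scipy's pdist."""
    return [metric(a, b) for a, b in combinations(seqs, 2)]


print(pairwise_condensed(['CAS_GAF', 'CASRGAF'], edit_distance))           # [1]
print(pairwise_condensed(['CASSLG', 'CASSLA', 'CATSLG'], edit_distance))   # [1, 1, 2]
```

Any two-argument callable can be passed as metric, which mirrors the "custom metrics" point in the description; a real implementation would add vectorization, similarity matrices and parallelism on top of this loop.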

License: MIT License

Python 100.00%

pwseqdist's People

Contributors: agartland, kmayerb


pwseqdist's Issues

If the user submits an unrecognized character, Numba does not allow seqs2mat exception handling!

If the user does not prefilter the CDR3 sequences, an unrecognized character (here '_') triggers the error below:

import pwseqdist as pw
pw.apply_pairwise_sq(seqs=['CAS_GAF', 'CASRGAF'],
                     metric=pw.metrics.nb_editdistance,
                     ncpus=1,
                     use_numba=True)
~/TCRDIST/pwseqdist/pwseqdist/matrices.py in seqs2mat(seqs, alphabet, max_len)
    179             try:
    180                 mat[si, aai] = alphabet.index(s[aai])
--> 181             except ValueError('Unknown symbols given value for last column/row of matrix'):
    182                 """Unknown symbols given value for last column/row of matrix"""
    183                 mat[si, aai] = len(alphabet)

TypeError: catching classes that do not inherit from BaseException is not allowed
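This TypeError is what CPython raises when an except clause names an exception instance (ValueError('...')) instead of an exception class, so the traceback is consistent with the except clause shown above rather than with a Numba restriction. A minimal sketch of the intended behaviour, assuming an integer-encoding helper along the lines of seqs2mat: `seqs2mat_sketch`, its default 20-letter alphabet and the -1 padding value are hypothetical choices, not the package's.

```python
import numpy as np


def seqs2mat_sketch(seqs, alphabet="ACDEFGHIKLMNPQRSTVWY", max_len=None):
    """Hypothetical helper: encode each sequence as a row of alphabet indices."""
    if max_len is None:
        max_len = max(len(s) for s in seqs)
    # Assumed padding value: -1 marks positions past the end of a shorter sequence.
    mat = np.full((len(seqs), max_len), -1, dtype=np.int16)
    for si, s in enumerate(seqs):
        for aai in range(min(len(s), max_len)):
            try:
                mat[si, aai] = alphabet.index(s[aai])
            except ValueError:  # name the exception class, not an instance
                # Unknown symbols get len(alphabet), i.e. the extra last row/column
                # of a substitution matrix sized len(alphabet) + 1.
                mat[si, aai] = len(alphabet)
    return mat


print(seqs2mat_sketch(['CAS_GAF', 'CASRGAF'])[:, 3])  # [20 14]
```

With the class-based except clause, '_' is encoded as 20 instead of raising, which matches the intent stated in the original comment ("Unknown symbols given value for last column/row of matrix").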

Compare this with the standard handling of unknown characters by a non-Numba metric:

In [44]: import Levenshtein
    ...: pw.apply_pairwise_sq(seqs=['CAS_GAF', 'CASRGAF'],
    ...:                      metric=Levenshtein.distance,
    ...:                      ncpus=1,
    ...:                      use_numba=False)
Out[44]: array([1])

In [45]: import Levenshtein
    ...: pw.apply_pairwise_sq(seqs=['CAS8GAF', 'CASRGAF'],
    ...:                      metric=Levenshtein.distance,
    ...:                      ncpus=1,
    ...:                      use_numba=False)
Out[45]: array([1])
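Until unknown symbols are handled inside the package, prefiltering the CDR3 sequences is the simplest workaround for the Numba path. A minimal sketch, assuming the 20 standard amino-acid letters are the only valid symbols; `prefilter_cdr3` is a hypothetical helper, not part of pwseqdist:

```python
VALID_AA = "ACDEFGHIKLMNPQRSTVWY"


def prefilter_cdr3(seqs, alphabet=VALID_AA):
    """Split seqs into (kept, dropped) by whether every symbol is in alphabet."""
    allowed = set(alphabet)
    kept, dropped = [], []
    for s in seqs:
        # set(s) <= allowed is True only if every character is a valid symbol
        (kept if set(s) <= allowed else dropped).append(s)
    return kept, dropped


kept, dropped = prefilter_cdr3(['CAS_GAF', 'CASRGAF', 'CAS8GAF'])
print(kept)     # ['CASRGAF']
print(dropped)  # ['CAS_GAF', 'CAS8GAF']
```

Only kept would then be passed as seqs to pw.apply_pairwise_sq with use_numba=True; whether dropping or repairing the other sequences is appropriate depends on the analysis.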

tcr_dict_distance_matrix

Hello,

Thanks for the awesome package! However, I have a question about tcr_dict_distance_matrix on line 59 of matrices.py. How did you come up with the number for each amino acid pair? I tried to convert the BLOSUM62 matrix to a distance matrix using the kernel2dist function, but the result was not even close to the numbers in tcr_dict_distance_matrix.
Thanks so much in advance!
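Part of the gap may simply be that different conversions from a similarity score B(a, b) to a distance give very different numbers. The sketch below compares two of them on a few BLOSUM62 entries: the mismatch rule min(4, 4 - B(a, b)) used in TCRdist-style scoring, and the generic kernel-induced distance sqrt(B(a, a) + B(b, b) - 2 B(a, b)). Whether tcr_dict_distance_matrix was built with the first rule, and whether kernel2dist computes the second, are assumptions not confirmed here; `tcrdist_style` and `kernel_induced` are hypothetical names.

```python
import math

# A handful of BLOSUM62 entries (the matrix is symmetric); a full table would
# cover all 20 amino acids.
BLOSUM62_SUBSET = {
    ("A", "A"): 4, ("R", "R"): 5, ("S", "S"): 4, ("C", "C"): 9,
    ("A", "R"): -1, ("A", "S"): 1, ("A", "C"): 0,
}


def blosum(a, b, table=BLOSUM62_SUBSET):
    """Symmetric lookup into the partial table."""
    return table[(a, b)] if (a, b) in table else table[(b, a)]


def tcrdist_style(a, b):
    """Mismatch distance min(4, 4 - B(a, b)); identical residues get 0."""
    return 0 if a == b else min(4, 4 - blosum(a, b))


def kernel_induced(a, b):
    """Generic kernel-to-distance map sqrt(k(a,a) + k(b,b) - 2 k(a,b))."""
    return math.sqrt(blosum(a, a) + blosum(b, b) - 2 * blosum(a, b))


print(tcrdist_style("A", "S"), round(kernel_induced("A", "S"), 3))  # 3 2.449
print(tcrdist_style("A", "R"), round(kernel_induced("A", "R"), 3))  # 4 3.317
```

The first rule caps every mismatch at 4 and depends only on B(a, b), while the kernel-induced distance also depends on the diagonal terms B(a, a) and B(b, b), so pairs involving residues with large self-scores (such as C or W) drift far apart.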

Replace multiprocessing with pathos

Currently we use Python's built-in multiprocessing to spread computations across multiple CPUs. multiprocessing serializes the objects it sends to worker processes with pickle, and pickle stores functions by reference to their importable name, so the metric function must be importable at module (global) level. The pathos package offers an essentially drop-in replacement for multiprocessing that uses dill to serialize the objects passed to worker processes, which would allow locally defined and more complex functions to be used as metrics. This is not a major limitation at the moment, but if the issue comes up again, switching should be an easy update.

Links with info:
https://github.com/uqfoundation/pathos
https://medium.com/@emlynoregan/serialising-all-the-functions-in-python-cd880a63b591
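The pickle limitation described in this issue can be checked directly with the standard library: lambdas and functions defined inside other functions have no importable qualified name, so pickle rejects them. A minimal sketch; `is_picklable` and `make_local_metric` are hypothetical helpers for illustration:

```python
import pickle


def is_picklable(obj):
    """Return True if the standard-library pickle can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type for unpicklable functions varies by Python version.
        return False
    return True


def make_local_metric(offset):
    # A closure defined inside another function: pickle looks functions up by
    # qualified name at unpickling time, and this one is not reachable that way.
    def metric(a, b):
        return abs(len(a) - len(b)) + offset
    return metric


print(is_picklable(len))                      # True (importable builtin)
print(is_picklable(lambda a, b: 0))           # False
print(is_picklable(make_local_metric(1)))     # False
```

dill, which pathos uses, can serialize such functions by value, so dill.dumps(make_local_metric(1)) is expected to succeed, and a pathos process pool (for example pathos.pools.ProcessPool) would then accept these metrics in its map calls.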
