missuse / ragp

Filter plant hydroxyproline rich glycoproteins

License: MIT License

R 100.00%
hydroxyproline-rich-glycoproteins arabinogalactan-protein-sequences hydroxyproline-prediction signalp targetp phobius hmmscan

ragp's Introduction

Hi there 👋

  • 🔭 I'm currently (for a while now) looking at biological sequences in various ways.
  • 🌱 I'm currently learning SAS.

ragp's People

Contributors

missuse

ragp's Issues

supporting X, B and Z in predict_hyp input

predict_hyp could support the ambiguous amino acid symbols X (any), B (D or N) and Z (E or Q) when they occur at most once or twice in a subsequence.

This could work by generating all possible subsequences, predicting the Hyp probability for each, and returning the min, max or floor(median) of the predictions, as selected by an additional function argument. The limit on ambiguous symbols is needed because of X: 20 * 20 candidates per subsequence is manageable, but 20 * 20 * 20 is too many.
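The expansion step could look roughly like the sketch below. `expand_ambiguous`, `aggregate_hyp` and their arguments are hypothetical names that are not part of ragp, and reading "floor(median)" as the lower median is an assumption.

```r
# Hypothetical helper (not part of ragp): expand X, B and Z in one
# subsequence into every concrete sequence they allow.
expand_ambiguous <- function(subseq, max_ambiguous = 2) {
  aa20 <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]
  choices <- list(X = aa20, B = c("D", "N"), Z = c("E", "Q"))
  chars <- strsplit(subseq, "")[[1]]
  pos <- which(chars %in% names(choices))
  if (length(pos) == 0) return(subseq)                   # nothing to expand
  if (length(pos) > max_ambiguous) return(character(0))  # too many candidates
  # One column per ambiguous position, one row per combination.
  grid <- expand.grid(unname(choices[chars[pos]]), stringsAsFactors = FALSE)
  vapply(seq_len(nrow(grid)), function(i) {
    out <- chars
    out[pos] <- unlist(grid[i, ], use.names = FALSE)
    paste(out, collapse = "")
  }, character(1))
}

# Hypothetical helper: collapse the per-candidate probabilities.
# "median" is taken as the lower median, which is an assumption.
aggregate_hyp <- function(probs, how = c("min", "max", "median")) {
  how <- match.arg(how)
  switch(how,
         min = min(probs),
         max = max(probs),
         median = sort(probs)[ceiling(length(probs) / 2)])
}
```

Returning an empty vector when the limit is exceeded leaves it to the caller to decide whether to skip that subsequence or report NA.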

AAStringSet inputs

Ideally, the functions get_phobius_file, get_signalp_file, and get_targetp_file should be more flexible about their input (i.e., they should also accept AAStringSet objects).
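One way to get this flexibility is to normalise the input before it reaches the existing code. The sketch below is only an illustration: `as_sequence_vector` is a hypothetical name, and it assumes that `as.character()` on an AAStringSet (Biostrings) returns the sequences as a named character vector.

```r
# Hypothetical helper (not part of ragp): coerce supported inputs
# to a named, upper-case character vector of sequences.
as_sequence_vector <- function(x) {
  if (inherits(x, "AAStringSet")) {
    # Assumes Biostrings is loaded, since x is one of its objects.
    x <- as.character(x)
  } else if (is.list(x)) {
    # e.g. a list of character vectors with one residue per element
    x <- vapply(x, paste, character(1), collapse = "")
  }
  stopifnot(is.character(x))
  nms <- names(x)
  x <- toupper(x)
  names(x) <- if (is.null(nms)) paste0("seq", seq_along(x)) else nms
  x
}
```

The functions could then call this helper first and keep their current character-vector logic unchanged.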

handling stop codons

Functions should either tolerate stop codons or strip them out. Currently predict_hyp gives the warning:

Warning message:
In FUN(X[[i]], ...) :
  Characters other than single letter code for amino acids are present
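Until the functions handle this themselves, the stop symbol can be stripped before calling them. A minimal sketch, assuming stop codons are written as `*`; `strip_stop` and its `internal` argument are hypothetical names, not part of ragp:

```r
# Hypothetical helper (not part of ragp): remove stop symbols ("*").
strip_stop <- function(sequence, internal = c("keep", "remove")) {
  internal <- match.arg(internal)
  # Always drop the terminal stop symbol(s).
  out <- sub("\\*+$", "", sequence)
  # Optionally drop internal stops too; note this joins the flanking residues.
  if (internal == "remove") out <- gsub("*", "", out, fixed = TRUE)
  out
}
```

Internal stops are kept by default because removing them silently changes the sequence, which may hide a translation problem.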

scan_ext and scan_prp

By analogy with scan_ag, two functions for detecting ext and prp motifs should be added,
mostly because plot_prot would benefit from such information.
I will open a PR when I finish.
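A generic regular-expression scanner gives an idea of what such functions could return. This is only a sketch: `scan_motif` is a hypothetical name, and the pattern in the usage note is an illustrative assumption rather than the motif definition the package would use.

```r
# Hypothetical helper (not part of ragp): report the start and end of every
# non-overlapping match of a regular expression in each sequence.
scan_motif <- function(sequence, id, pattern) {
  hits <- gregexpr(pattern, sequence, perl = TRUE)
  res <- lapply(seq_along(hits), function(i) {
    start <- as.integer(hits[[i]])
    if (start[1] == -1L) return(NULL)  # no match in this sequence
    len <- attr(hits[[i]], "match.length")
    data.frame(id = id[i], start = start, end = start + len - 1L,
               stringsAsFactors = FALSE)
  })
  # NULL when no sequence contains the motif
  do.call(rbind, res)
}
```

For example, `scan_motif("AASPPPPKSPPPA", "p1", "SP{3,5}")` reports one row per serine followed by three to five prolines.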

scan_bias to scan locally biased amino acid composition

Given a sliding window (whose size can be changed), find regions with biased amino acid composition. What counts as a biased composition should be defined by an argument.

Need to think about an efficient implementation. Perhaps this is already implemented.
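One reasonably efficient approach is to use cumulative sums, so that every window is counted in a single pass. The sketch below handles one sequence; `scan_bias` here is a hypothetical implementation, and the `residues`, `window` and `threshold` arguments are assumptions about how "biased" might be defined.

```r
# Hypothetical sketch (not part of ragp): flag windows in which the chosen
# residues make up at least `threshold` of the window.
scan_bias <- function(sequence, residues = "P", window = 20, threshold = 0.5) {
  chars <- strsplit(sequence, "")[[1]]
  n <- length(chars)
  if (n < window) {
    return(data.frame(start = integer(0), end = integer(0),
                      fraction = numeric(0)))
  }
  hit <- as.integer(chars %in% residues)
  # cs[k + 1] holds the number of hits in positions 1..k.
  cs <- c(0L, cumsum(hit))
  starts <- seq_len(n - window + 1L)
  frac <- (cs[starts + window] - cs[starts]) / window
  keep <- frac >= threshold
  data.frame(start = starts[keep], end = starts[keep] + window - 1L,
             fraction = frac[keep])
}
```

Overlapping windows are reported individually; merging adjacent hits into regions would be a natural next step.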

scan_ag output

The scan_ag and predict_hyp outputs are really nice.

It would also be good to have an output option with the same column names as get_hmm that simply lists the locations of the relevant prolines.

progress bar for get_hmm and get_big_pi

get_hmm and get_big_pi can take a while. When submitting >1000 sequences, I have tended to submit them individually so that the results can be salvaged if the call becomes unresponsive after an hour. I don't know whether the server prefers batch queries to repeated individual queries, but even so, sequences could be submitted in batches of 10 or 50 to give an estimate of the remaining time.

annotx <- NULL
n      <- length(sequences)
pbt    <- txtProgressBar(min = 0, max = n, style = 3)
# winProgressBar() is only available on Windows
pbw    <- winProgressBar(min = 0, max = n, title = "HMM progress")

for (i in seq_along(sequences)) {
    # query one sequence at a time so earlier results survive a failure
    seqsubset <- sequences[i]
    annotx <- rbind(annotx, ragp::get_hmm(sequence = seqsubset,
                                          id = names(seqsubset),
                                          verbose = FALSE,
                                          sleep = 0))
    setTxtProgressBar(pbt, i)
    setWinProgressBar(pbw, i, title = paste0("HMM progress: ",
                                             round(i / n * 100), "% (",
                                             names(seqsubset), ")"))
}
close(pbt)
close(pbw)
annotx
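The batch variant suggested above could be sketched as follows. `run_in_batches` and its arguments are hypothetical names that are not part of ragp; the query function is passed in by the caller, so the helper makes no assumption about the server.

```r
# Hypothetical helper (not part of ragp): apply `fun` to the sequences in
# batches, show a text progress bar and keep whatever succeeded.
run_in_batches <- function(sequences, fun, batch_size = 50) {
  if (length(sequences) == 0) return(NULL)
  idx <- seq_along(sequences)
  batches <- split(idx, ceiling(idx / batch_size))
  pb <- txtProgressBar(min = 0, max = length(batches), style = 3)
  on.exit(close(pb))
  results <- vector("list", length(batches))
  for (b in seq_along(batches)) {
    res <- tryCatch(fun(sequences[batches[[b]]]), error = function(e) NULL)
    # A failed batch stays NULL, so earlier results are not lost.
    if (!is.null(res)) results[[b]] <- res
    setTxtProgressBar(pb, b)
  }
  do.call(rbind, results)
}
```

With the call from the loop above, this could be used as `run_in_batches(sequences, function(s) ragp::get_hmm(sequence = s, id = names(s), verbose = FALSE, sleep = 0), batch_size = 10)`.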

Several functions no longer working

Hello,
Thank you for providing these tools, they are very useful. All of them were working several months ago (apart from get_netGPI, which I see you have now fixed, thanks!), but unfortunately I can no longer get get_big_pi or get_phobius to work. Any ideas what the problem might be?

For get_phobius I get this error:
Error in (grep("SEQENCE", res) + 1):(grep("_uacct", res) - 1) :

And for get_big_pi I get this error:
Error in strsplit(resulti, "Query sequence")[[1]] :
subscript out of bounds

I don't think it is me because it works for other 'get' functions and I'm following the vignette closely. Cheers in advance!
