nunofonseca / msi

License: GNU General Public License v3.0

Perl 3.66% Shell 87.68% R 7.97% Dockerfile 0.69%
nanopore-analysis-pipeline metabarcoding adapter-trimming docker-image primer polishing binning

msi's Issues

R version question

Is R 3.5+ required, or does R 4.1 work? Version 4.1 trips the "ERROR: R version should be 3.5 or above" message, apparently when the minor version (the .1) is checked.
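
The symptom described above is consistent with a check that compares the major and minor version numbers separately. How msi actually performs the check is not shown in this issue, so the following is only a sketch of that failure mode and of one way around it. The function names `naive_check` and `r_version_ok` are hypothetical and do not come from msi.

```r
# Hypothetical illustration of the failure mode (not msi's code):
# comparing major and minor independently rejects 4.1 because 1 < 5.
naive_check <- function(major, minor) {
  major >= 3 && minor >= 5
}

# Comparing whole version strings avoids the problem.
# numeric_version() is base R and compares component by component.
r_version_ok <- function(current = getRversion(), minimum = "3.5.0") {
  numeric_version(as.character(current)) >= numeric_version(minimum)
}

naive_check(4, 1)        # FALSE, although R 4.1 is newer than 3.5
r_version_ok("4.1.0")    # TRUE
r_version_ok("3.4.4")    # FALSE
```

Called without arguments, `r_version_ok()` checks the running R session.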

lca working well

Hi @nunofonseca

I compared the robitools lca with yours. Identical results. I tested 4 large vectors of between 1M and 4M rows. Using aggregate by qseqid$path or similar gives a single bin for each query (still identical to robitools). Yours is much faster! I will use it in my functions.

The only minor issue is that I always start with 7 levels and need to end with 7 levels as well (for other functions). To get exactly the same output as before, I have to follow your function with the code below. Perhaps this could be built into the function?

lcaspNuno$binpath[is.na(lcaspNuno$binpath)]<-"unknown;unknown;unknown;unknown;unknown;unknown;unknown"
lcaspNuno$binpath[stringr::str_count(lcaspNuno$binpath,";")==5]<-paste0(lcaspNuno$binpath[stringr::str_count(lcaspNuno$binpath,";")==5],";unknown")
lcaspNuno$binpath[stringr::str_count(lcaspNuno$binpath,";")==4]<-paste0(lcaspNuno$binpath[stringr::str_count(lcaspNuno$binpath,";")==4],";unknown;unknown")
lcaspNuno$binpath[stringr::str_count(lcaspNuno$binpath,";")==3]<-paste0(lcaspNuno$binpath[stringr::str_count(lcaspNuno$binpath,";")==3],";unknown;unknown;unknown")
lcaspNuno$binpath[stringr::str_count(lcaspNuno$binpath,";")==2]<-paste0(lcaspNuno$binpath[stringr::str_count(lcaspNuno$binpath,";")==2],";unknown;unknown;unknown;unknown")
lcaspNuno$binpath[stringr::str_count(lcaspNuno$binpath,";")==1]<-paste0(lcaspNuno$binpath[stringr::str_count(lcaspNuno$binpath,";")==1],";unknown;unknown;unknown;unknown;unknown")
lcaspNuno$binpath[stringr::str_count(lcaspNuno$binpath,";")==0]<-paste0(lcaspNuno$binpath[stringr::str_count(lcaspNuno$binpath,";")==0],";unknown;unknown;unknown;unknown;unknown;unknown")
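
The padding above can also be written as one small helper. This is a sketch only: `pad_binpath` and its parameters are hypothetical names, not part of msi's `lca`. It assumes that a path never has more than 7 levels and that an empty string should be treated like a missing path (the line-by-line version would instead produce a path with an empty first level).

```r
# Pad each taxonomy path to a fixed number of levels with a filler label.
# Hypothetical helper, not part of msi.
pad_binpath <- function(binpath, n_levels = 7L, fill = "unknown", sep = ";") {
  binpath <- as.character(binpath)
  vapply(binpath, function(p) {
    # Missing paths become n_levels copies of the filler.
    if (is.na(p)) {
      return(paste(rep(fill, n_levels), collapse = sep))
    }
    parts <- strsplit(p, sep, fixed = TRUE)[[1]]
    # Append as many fillers as needed; paths already long enough are unchanged.
    n_missing <- max(0L, n_levels - length(parts))
    paste(c(parts, rep(fill, n_missing)), collapse = sep)
  }, character(1), USE.NAMES = FALSE)
}

padded <- pad_binpath(c("Eukaryota;Chordata", NA, "a;b;c;d;e;f;g"))
```

With the data frame from this issue, the call would be `lcaspNuno$binpath <- pad_binpath(lcaspNuno$binpath)`.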

The only other thing I noticed is that setting remove.dups = T is much slower the way I use the function (see the timings below; I tried it a couple of times). Here btabsp is a BLAST result with 1 million hits from c. 21k queries.

> t1<-Sys.time()
> lcaspNuno = aggregate(btabsp$path, by=list(btabsp$qseqid),function(x) lca(x,sep=";",remove.dups = T))
> t2<-Sys.time()
> t3<-round(difftime(t2,t1,units = "mins"),digits = 2)
> t3
Time difference of 5.63 mins
> t1<-Sys.time()
> lcaspNuno = aggregate(btabsp$path, by=list(btabsp$qseqid),function(x) lca(x,sep=";"))
> t2<-Sys.time()
> t3<-round(difftime(t2,t1,units = "mins"),digits = 2)
> t3
Time difference of 2.13 mins
> 
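
One possible workaround, sketched under assumptions: drop duplicated (query, path) pairs once, before grouping, rather than inside every per-query call. Whether this matches what `remove.dups` does internally is an assumption. `toy_lca` is a hypothetical stand-in so the example runs on its own; it is not the msi `lca` function, and the small `btab` table is made-up illustration data.

```r
# Toy stand-in for lca(): longest shared prefix of the paths, level by level.
# NOT the msi implementation.
toy_lca <- function(paths, sep = ";") {
  parts <- strsplit(paths, sep, fixed = TRUE)
  depth <- min(lengths(parts))
  common <- character(0)
  for (i in seq_len(depth)) {
    level <- unique(vapply(parts, `[`, character(1), i))
    if (length(level) != 1L) break
    common <- c(common, level)
  }
  if (length(common) == 0L) NA_character_ else paste(common, collapse = sep)
}

# Made-up illustration data in the shape of a BLAST table.
btab <- data.frame(
  qseqid = c("q1", "q1", "q1", "q2", "q2"),
  path   = c("A;B;C", "A;B;C", "A;B;D", "A;X", "A;X"),
  stringsAsFactors = FALSE
)

# Deduplicate once up front, then aggregate per query and time the call.
unique_hits <- unique(btab[, c("qseqid", "path")])
elapsed <- system.time(
  bins <- aggregate(unique_hits$path,
                    by = list(qseqid = unique_hits$qseqid),
                    FUN = toy_lca)
)[["elapsed"]]
```

`system.time()` replaces the manual `Sys.time()` bookkeeping; `elapsed` is in seconds.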
