Giter Site home page Giter Site logo

preseqr's People

Contributors

andrewdavidsmith avatar terencewtli avatar timydaley avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

preseqr's Issues

Use of underscore characters in R?

I think this is strongly discouraged, because in some (all??) versions of R, the underscore character can be interpreted as assignment. The typical replacement is to use a dot ('.'). Do you agree @timydaley ?

Outputs of preseqR.rSAC and preseqR.rSAC.bootstrap

Maybe a stupid idea. Since preseqR.rSAC.bootstrap is bootstrap version of preseqR.rSAC, I expected the returned object to behave similarly. Now estimator returned by preseqR.rSAC is a function and that returned by preseqR.rSAC.bootstrap is a list, you need to use f inside to take the value of r

Use of os functions

When working with paths, for example joining paths, use the "os" stuff. Like this:

import os
fullPath = os.path.join(srcPath, cppPath)

and make sure not to use the stuff in shutil when more standard Python is easily used instead. Also, in Python it is typical to use camel-case, and to start variables with lowercase.

For replacing the "hpp" with "h" the best way might involve a simple regex replace.

NA/Inf replaced by maximum positive value

When using any functions associated with the Fisher estimator (fisher.alpha, fisher.rSAC), I get these warnings:

1: In uniroot(function(x) (exp(x) - 1)/x - N/S, c(0.001,  ... :
  NA/Inf replaced by maximum positive value
2: In uniroot(function(x) (exp(x) - 1)/x - N/S, c(0.001,  ... :
  NA/Inf replaced by maximum positive value
...```

Tutorial in README.md

I think it's might be good to add a short tutorial on how to run the example dataset in README so that user can easily try out the example dataset after installation.

Design seems completely wrong

It is not clear why so much existing code is duplicated in these files. It does not make sense that the only way to include other code from smithlab_cpp and from preseq is to actually copy it into new files.

results from bootstrap.complexity.curve are quite different from preseq bootstrap

The speed of preseqR.continued.fraction.estimate and bootstrap.complexity.curve should no longer be an issue. The correctness of preseqR.continued.fraction.estimate is checked by comparing its results with preseq.

Implementation of bootstrap.complexity.curve is to resample histogram, run preseqR.continued.fraction.estimate and summarize all results to make estimation. However, results from this function are quite different from results in preseq, which causes suspicious. @timydaley could you check the code of bootstrap.complexity.curve?

Repo organization

I think some of the files are needing removal or revision, and also there should be no nested preseqR inside preseqR.

functions in R package are very slow

Given the same histogram plus same parameters to estimate continued fraction, the c++ function costs 0.27 sys time, while the R function cost 4.461 sys time. The difference is more than 20 times. It becomes intolerant when I try to use bootstrap to estimate the complexity curve as the time consuming are multiplied by the times of resampling.

Release an updated version of publication's supplementary data

Would it be possible to add a tutorial (requested in Issue #21 as well) in the form of the original publication's supplementary data https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4885658/ ? Since there is no usage information and no vignettes, we are stuck with using version 1.2.1. Looking through the changelog it looks like lots of good things have happened since then, and I'd love to be able to use version 4.0 but am completely in the dark about which commands to use...

avoid stdout, stderr in c++

"Compiled code should not write to stdout or stderr and C++ and Fortran I/O should not be used. As with the previous item such calls may come from external software and may never be called, but package authors are often mistaken about that." -- From CRAN.

If we follow the instruction, I suggest to copy a continued_fraction.cpp from preseq, remove all I/O functions and use the file as a part of preseqR. But this will generate duplicate codes.

preseqR.rSAC does not give the same result as either the ds or ztnb algorithms

As I understand it, this function should pick one of ds or ztnb, whichever fits best. The problem is that if it selects ztnb, it doesn't return the same as ztnb. There is an obvious bug in the code that gives this effect, see below. Perhaps this is meant to be that way?

The functions differ because of a suspected bad variable assignment:

preseqR.rSAC <- function(n, r=1, mt=20, size=SIZE.INIT, mu=MU.INIT)
{
para <- preseqR.ztnb.em(n) ##this call is also different, all params are not sent in
shape <- para$size ##### HERE IT STARTS, THIS IS NOT ASSIGNED TO size
mu <- para$mu

the population is heterogeneous

because the coefficient of variation is large $1 / sqrt(shape)$

if (shape <= 1) {
f.rSAC <- ds.rSAC(n=n, r=r, mt=mt)
} else {
## the population is close to be homogeneous
## the ZTNB approach is applied

## the probability of a species observed in the initial sample
p <- 1 - dnbinom(0, size = size, mu = mu) ########HERE size is used
## L is the estimated number of species in total
L <- sum(as.numeric(n[, 2])) / p
## ZTNB estimator
f.rSAC <- function(t) {
  L * pnbinom(r - 1, size=size, mu=mu*t, lower.tail=FALSE) ########HERE size is used
}

}
return(f.rSAC)
}

extreme input for preseqR.sample.cov

I tried one extreme case I think nobody actually needs. When I use r=500, the probability I get for 10 is -0.8972191, it should not be negative, right?

"Recycling array of length 1 in vector-array arithmetic is deprecated" warnings

In cs.rSAC, I get the following warnings:

1: In f1 * t/f0 :
  Recycling array of length 1 in vector-array arithmetic is deprecated.
  Use c() or as.vector() instead.

2: In f0 * ppois(r - 1, f1 * t/f0) :
  Recycling array of length 1 in array-vector arithmetic is deprecated.
  Use c() or as.vector() instead.

3: In f0 * ppois(r - 1, f1 * t/f0) * exp(f1/f0) :
  Recycling array of length 1 in vector-array arithmetic is deprecated.
  Use c() or as.vector() instead.

4: In f0 + S - f0 * ppois(r - 1, f1 * t/f0) * exp(f1/f0) :
  Recycling array of length 1 in array-vector arithmetic is deprecated.
  Use c() or as.vector() instead.

Formatting code

Need to be fixed:

-- Lines longer than 80 chars must be fixed
-- Comment starts need to be 3 # for not indented (e.g. for function definitions), 2 # for indented, and 1 # for right justified.
-- Put a blank line between appropriate code blocks, and put 2 between functions. Any time a comment is put in the code above some source, please include a whitespace line above that comment.
-- Put space around " - " and " + " in mathematical operations, but not around "*" or "/". Use formatting that highlights order of evaluation.
-- Make sure there are no tabs, and every indent is 2 spaces. Also there should be no trailing whitespace.

Files and organization

I think some of the files are needing removal or revision, and also there should be no nested preseqR inside preseqR.

Use of "break" to exit a loop should be avoided

If at all possible, the conditions for loop exit should be collected into the statement of the loop invariant. This makes it much easier for a reader to know when and why a loop will either continue or stop. An example is the break statement in the loop of line 630 in miscount.R. The counter and upper.limit should be checked in the statement of the while condition along with the "bootstrap.times > 0"

preseqR not compatible with newest version of R

install.packages("/Downloads/preseqR_1.0.1.tar.gz")
Warning message:
package โ€˜
/Downloads/preseqR_1.0.1.tar.gzโ€™ is not available (for R version 3.1.1)

Your version of R is up to date

Appropriate way to do assignment in R

It seems to me like, even though R will now accept "=" as an assignment operator, it might be prudent and more consistent with most good R code, to use the original "<-" form of assignment. Does anyone have thoughts on this?

interpolation/extrapolation

I am not sure whether or not we should export the interpolation and extrapolation functions. All these functions predict the number of distinct items. It confuses people when they want to predict. The core function is preseqR.continued.fraction.estimate, which constructs a continued fraction and estimate the number of distinct items by combining results from interpolation and extrapolation. I suggest to hind interpolation and extrapolation from users and just export preseqR.continued.fraction.estimate.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.