lotze / compoissonreg

COMPoissonReg R package

License: GNU General Public License v2.0

R 68.83% C++ 22.09% C 5.71% TeX 3.37%

compoissonreg's People

noamross andrewraim hskksk animucki

compoissonreg's Issues

More digits of accuracy in Euler–Mascheroni constant

The C++ function cmp_allprobs needs more digits of accuracy in its Euler–Mascheroni constant:
0.57721566490153286060651209008240243104215933593992
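
As a quick cross-check of the digits quoted above, base R can recover the constant as -digamma(1). This is only a sketch: a double holds roughly 15 to 17 significant decimal digits, so the extra digits matter only if the C++ code stores the constant in a wider type, which is not stated in this issue.

```r
# Euler–Mascheroni constant from base R: gamma = -digamma(1)
euler_gamma <- -digamma(1)

# Leading digits of the value quoted in the issue (what a double can hold)
reference <- 0.57721566490153286

# Show the value to 17 significant digits and its distance from the reference
print(format(euler_gamma, digits = 17))
print(abs(euler_gamma - reference))
```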

Remove constantCMPfitsandresids

Remove constantCMPfitsandresids and other functions that may no longer be in use.

A bug in vcmp

An issue was reported with the newly-exposed vcmp function in version v0.7.1 of COMPoissonReg.

library(COMPoissonReg)

## parameter 1
lam = 0.6785115
nu = 0.007096562

# Compare vcmp to the empirical variance of simulated draws
vcmp(lam, nu)
var(rcmp(100000, lam, nu))

## parameter 2
lam = 1.199938
nu = 0.4124449
# Compare vcmp to the empirical variance of simulated draws
vcmp(lam, nu)
var(rcmp(100000, lam, nu))

The vcmp results for the two given cases are -985.2381 and 912.5801, which are far from the empirical variances (and a negative variance is clearly wrong). The underlying issue appears to be in the use of the grad.fwd function: it does not produce accurate second derivatives in these cases.
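
An independent check that needs neither vcmp nor simulation is to compute the variance directly from a truncated sum over the CMP probabilities. The sketch below uses only base R and assumes the usual unnormalized mass lambda^y / (y!)^nu. The helper cmp_var_trunc and its ymax default are hypothetical, not part of COMPoissonReg, and the truncation point must be large enough for the tail to be negligible.

```r
# Hypothetical helper: CMP variance by truncated summation on the log scale
cmp_var_trunc <- function(lambda, nu, ymax = 10000) {
  y <- 0:ymax
  # Log of the unnormalized probabilities lambda^y / (y!)^nu
  log_w <- y * log(lambda) - nu * lgamma(y + 1)
  # Subtract the maximum before exponentiating to avoid overflow
  w <- exp(log_w - max(log_w))
  p <- w / sum(w)
  m <- sum(y * p)
  sum((y - m)^2 * p)
}

# Sanity checks against known special cases
print(cmp_var_trunc(3, 1))    # nu = 1 is Poisson, so the variance equals lambda
print(cmp_var_trunc(0.5, 0))  # nu = 0, lambda < 1 is geometric: lambda / (1 - lambda)^2

# The two parameter settings from the report
print(cmp_var_trunc(0.6785115, 0.007096562))
print(cmp_var_trunc(1.199938, 0.4124449))
```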

Here is a workaround that uses the numDeriv package, until we can fix the underlying problem in vcmp.

vcmp_alt = function(lambda, nu) 
{
	# Only handle univariate inputs for now
	stopifnot(length(lambda) == 1)
	stopifnot(length(nu) == 1)

	# Make sure these options are set
	hybrid_tol = getOption("COMPoissonReg.hybrid_tol")
	truncate_tol = getOption("COMPoissonReg.truncate_tol")
	if (is.null(hybrid_tol)) {
		options(COMPoissonReg.hybrid_tol = 1e-2)
	}
	if (is.null(truncate_tol)) {
		options(COMPoissonReg.truncate_tol = 1e-6)
	}

	# Variance calculation using derivatives
	ev = ecmp(lambda, nu)
	dd = lambda^2 * numDeriv::hessian(ncmp, lambda, nu = nu, log = TRUE)
	as.numeric(dd + ev)
}
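
The last lines of vcmp_alt appear to rely on the standard identity below. It assumes that ncmp(lambda, nu, log = TRUE) returns log Z(lambda, nu) and that ecmp returns E[Y]; both are inferred from the code above rather than confirmed from the package source.

```latex
\begin{align*}
Z(\lambda,\nu) &= \sum_{j=0}^{\infty} \frac{\lambda^{j}}{(j!)^{\nu}}, \\
\mathrm{E}[Y] &= \lambda \, \frac{\partial}{\partial\lambda} \log Z(\lambda,\nu), \\
\mathrm{Var}[Y] &= \lambda \, \frac{\partial}{\partial\lambda} \mathrm{E}[Y]
  = \mathrm{E}[Y] + \lambda^{2} \, \frac{\partial^{2}}{\partial\lambda^{2}} \log Z(\lambda,\nu).
\end{align*}
```

In the code, ev plays the role of E[Y] and dd is the lambda^2 term, so an inaccurate second derivative of log Z feeds directly into the variance.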

Here is a quick test of vcmp_alt on the inputs above.

library(numDeriv)

lam = 0.6785115
nu = 0.007096562
vcmp_alt(lam, nu)
var(rcmp(100000, lam, nu))

lam = 1.199938
nu = 0.4124449
vcmp_alt(lam, nu)
var(rcmp(100000, lam, nu))

Check nu > 0

The C++ function cmp_allprobs needs to check that nu > 0.

Calling dcmp with nu < 0 (a silly mistake I made) soaked up all the CPU time on my machine, swapped for a long time, and eventually the process was killed.

I need to be protected from myself.
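
Until such a check exists in the C++ code, a guard on the R side can fail fast before any expensive computation starts. This is a minimal sketch: check_cmp_params is a hypothetical helper, not a COMPoissonReg function, and the conditions (finite, strictly positive) follow the nu > 0 requirement stated in this issue.

```r
# Hypothetical guard: stop early on invalid CMP parameters
check_cmp_params <- function(lambda, nu) {
  stopifnot(is.numeric(lambda), is.numeric(nu))
  if (any(!is.finite(lambda)) || any(lambda <= 0)) {
    stop("lambda must be finite and > 0")
  }
  if (any(!is.finite(nu)) || any(nu <= 0)) {
    stop("nu must be finite and > 0")
  }
  invisible(TRUE)
}

# A valid call passes silently; an invalid one raises an error immediately
check_cmp_params(lambda = 2, nu = 0.5)
bad <- tryCatch(check_cmp_params(lambda = 2, nu = -1), error = function(e) e)
print(conditionMessage(bad))
```

The guard would be called before dcmp or a similar function, so a negative nu produces an error instead of a runaway computation.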

formula.nu processing

There is an issue with formula processing. Here is a small example to illustrate.

y <- rcmp(250, lambda = 10, nu = 0.95)

# Doesn't work
out <- glm.cmp(y ~ 1)
out <- glm.cmp(y ~ 1, formula.nu = ~ 1)

# Workaround
out <- glm.cmp(y ~ 1, formula.nu = y ~ 1)

In the first two cases, the nu formula is missing the length of the data.

coef(mod, what="lambda") does not return desired value

The call coef(mod, what="lambda") does not return the desired value.

I am using the code exp(mod$beta), and would like to use the 'standard' approach via coef.

Use X:, S: and W: prefixes for covariate names everywhere

Currently we only do it when printing the fit.

z-function evaluating to infinity

Moved this out of a random TODO file. Need to see if any of this is still relevant.

We currently don't throw errors if the z-function evaluates to infinity. This causes some odd results: for example, with lambda = exp(5.25) and nu = 0.4, rcmp always draws the value 203.
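
The overflow can be seen directly by accumulating the z-function on the log scale. The sketch below uses base R only and assumes the usual series Z(lambda, nu) = sum over j of lambda^j / (j!)^nu. The helper log_z_partial is hypothetical, not a COMPoissonReg function. Every term is positive, so a partial sum is a lower bound on Z: if its logarithm already exceeds log(.Machine$double.xmax), the full sum cannot be held in a finite double.

```r
# Hypothetical helper: log of the partial sum of the z-function up to ymax,
# computed with the log-sum-exp trick so that it does not overflow itself
log_z_partial <- function(lambda, nu, ymax) {
  y <- 0:ymax
  log_terms <- y * log(lambda) - nu * lgamma(y + 1)
  m <- max(log_terms)
  m + log(sum(exp(log_terms - m)))
}

# Parameters from the report; 1000 terms are enough to exceed the double range
lz <- log_z_partial(lambda = exp(5.25), nu = 0.4, ymax = 1000)
overflows <- lz > log(.Machine$double.xmax)
print(lz)
print(overflows)
```

A check of this kind could be used to raise an error, or to switch to log-scale arithmetic, instead of silently carrying an infinite normalizing constant.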

Add ymax as an option (not global) to regression functions

Issue with offsets in ZICMP predict

The ZICMP predict function initially had a mistake when support for offsets was added. This mistake is included in version v0.7.0, but has subsequently been fixed in the master branch.

Here are steps to reproduce the issue.

data(couple)
zicmp.out = glm.cmp(UPB ~ EDUCATION + ANXIETY, formula.nu = ~ 1,
	formula.p = ~ EDUCATION + ANXIETY, data=couple)
print(zicmp.out)

new.data = data.frame(EDUCATION = round(1:20 / 20), ANXIETY = seq(-3,3,
	length.out = 20))
predict(zicmp.out, newdata=new.data)

This results in the output

Error in fitted.zicmp.internal(X, S, W, object$beta, object$gamma, object$zeta,  : 
  dims [product 20] do not match the length of object [387]
In addition: Warning message:
In X %*% beta + off.x :
  longer object length is not a multiple of shorter object length
Called from: fitted.zicmp.internal(X, S, W, object$beta, object$gamma, object$zeta, 
    object$off.x, object$off.s, object$off.w)

Until the corrected code gets released, here is a workaround for the typical case when the user has not opted to use offsets.

# Manually set the saved offsets in zicmp.out to be 0 to be compatible with the data in new.data
zicmp.out$off.x = 0
zicmp.out$off.s = 0
zicmp.out$off.w = 0
predict(zicmp.out, newdata = new.data)

If offsets are required, they cannot be specified to predict via newdata in v0.7.0. Code to work around this is more involved. If anyone urgently needs this, let me know.

Error in UseMethod("nu")

Hello, I was trying to fit underdispersed count data using the "nu" function but got the error "no applicable method for 'nu' applied to an object of class "c('glm', 'lm')"". Please suggest how to solve the issue. Thank you.

Simple fitting?

Can this package be used to do basic parameter estimation for a Conway-Maxwell-Poisson distribution?
I see lots of packages are doing something quite a bit fancier than this.
Something like the following function signature:

x = c(1, 2, 3)
y = c(0.9, 0.05, 0.05)
result = fit(x, y)

result["lambda"]
result["nu"]
...
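
For illustration, a basic maximum likelihood fit of this shape can be sketched in base R without the package. Everything here is an assumption made for the sketch: fit_cmp is a hypothetical function, not part of COMPoissonReg; the second argument is treated as a weight (frequency or probability) for each count; and the normalizing constant is truncated at ymax, which must be well beyond the bulk of the fitted distribution.

```r
# Hypothetical helper: weighted maximum likelihood for CMP(lambda, nu)
fit_cmp <- function(x, w = rep(1, length(x)), ymax = 1000) {
  y <- 0:ymax
  negloglik <- function(par) {
    # Optimize on the log scale so that lambda and nu stay positive
    log_lambda <- par[1]
    nu <- exp(par[2])
    # Log normalizing constant by truncated log-sum-exp
    log_terms <- y * log_lambda - nu * lgamma(y + 1)
    m <- max(log_terms)
    log_z <- m + log(sum(exp(log_terms - m)))
    -sum(w * (x * log_lambda - nu * lgamma(x + 1) - log_z))
  }
  opt <- optim(c(0, 0), negloglik, method = "BFGS")
  c(lambda = exp(opt$par[1]), nu = exp(opt$par[2]))
}

# Example: counts weighted by Poisson(3) probabilities; CMP with nu = 1 is Poisson
counts <- 0:30
probs <- dpois(counts, lambda = 3)
result <- fit_cmp(counts, probs)
print(result["lambda"])
print(result["nu"])
```

This sketch returns point estimates only; it gives no standard errors and does not check whether the optimizer converged.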

Numerical issues in couple example?

Are optimization + Hessian issues causing the SEs to be unstable, or is there a mistake in the code somewhere?

glm.cmp: numerical problems with large magnitude covariates

We noticed problems with large magnitude covariates when nu < 1 (for example, even when nu is about 0.5).

With L-BFGS-B, the optimizer fails after reporting infinite values.
With Nelder-Mead or BFGS, we get results with NaN standard errors.

Is this caused by very large changes in the normalizing constant when relatively small changes are made to beta? Could this be improved by changing to the mean-parameterization of CMP?

TODO: Set up a minimal working example using generated data.

Some functions fail if package is not loaded

The following issue was reported by a user when attempting to use rcmp and rzicmp before the package is loaded. It looks like a C++ vector is empty...

R> COMPoissonReg::rcmp(n=10, lambda=1, nu = 1.5)
Error in rcmp_cpp(n, prep$lambda, prep$nu, ymax = ymax) : 
  Expecting a single value: [extent=0].

R> COMPoissonReg::rzicmp(n=10, lambda=1, nu = 1.5, p = 0.5)
Error in rcmp_cpp(n, prep$lambda, prep$nu, ymax = ymax) : 
  Expecting a single value: [extent=0].