
jhhughes256 / optinterval

Raw R code for manuscript: Hughes JH, Upton RN, Reuter SE, Phelps MA, Foster DJR. Optimising time samples for determining AUC of pharmacokinetic data using non-compartmental analysis.

R 30.71% HTML 69.29%
area-under-curve genetic-algorithm noncompartmental-analysis optimisation pharmacokinetics

optinterval's People

Contributors: jhhughes256

optinterval's Issues

incorrect theta used in optim.interv()

Issue Description

By definition:

a <= theta <= b

However, with the current layout of optim.interv() situations can arise where this does not hold, because theta is determined from the initial estimates before the intervals are optimised.

It is possible that the final intervals use values that are either outside [a, b] or less than theta.

Reproducible Example

Use sumexp_functions.R from: 87fe9a5
Stay tuned...
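Until a fix lands, a minimal sanity check could at least flag violating output; the helper and argument names below are illustrative, not from the repo:

```r
# Hypothetical helper: warn when optimised interval times leave [a, b].
# times, a and b stand in for the corresponding quantities in optim.interv().
check.interv.bounds <- function(times, a, b) {
  ok <- all(times >= a & times <= b)
  if (!ok) warning("optimised times fall outside [a, b]")
  ok
}

check.interv.bounds(c(0, 0.5, 2.1, 24), a = 0, b = 24)   # TRUE
```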

optim.sumexp() trying multiple initial parameters

Issue Description

Currently optim.sumexp() is designed to take one set of initial parameters, which it determines either from a simple linear model (lm) or from the final parameters of more complex models.

Enhancement

The function could try multiple sets of initial parameters, allowing a more thorough search of the surface and avoiding selection of a local minimum.
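A minimal multi-start sketch; the objective and initial sets here are toy stand-ins for mle.sumexp() and the lm-derived estimates:

```r
# Run optim() from several initial parameter sets and keep the best fit.
multi.start.optim <- function(fn, init.list, ...) {
  fits <- lapply(init.list, function(p) optim(p, fn, ...))
  fits[[which.min(vapply(fits, `[[`, numeric(1), "value"))]]
}

# Toy objective with a single minimum at (1, 2)
fn <- function(p) sum((p - c(1, 2))^2)
best <- multi.start.optim(fn, list(c(0, 0), c(5, 5), c(-3, 4)))
```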

optim.interv output is not ordered

Issue Description

The current implementation of optim.interv (a.k.a. optim.interv.rep) orders the times used as initial parameters, but does not order the times returned by the optim function.

This means that unordered times could be given to the trapezoidal functions resulting in incorrect calculation of AUC.
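A one-line fix would be to sort the optimised times before they reach the trapezoidal functions; err.fn below is a toy stand-in for the repo's interval objective:

```r
err.fn <- function(t) sum((diff(sort(t)) - 3)^2)   # toy objective
fit <- optim(c(0, 7, 3, 12), err.fn)
fit$par <- sort(fit$par)   # enforce t1 <= t2 <= ... before computing AUC
```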

Determining standard error on parameter estimates using hessian matrix

Issue Description

Currently there are numerous outliers being produced by the algorithm. This is likely due to a failure in the algorithm at either of the optim functions to determine parameters that are desirable.

To avoid this the hessian matrix provided by optim could be used to examine the standard error on parameters and determine if the fit provided is good enough!
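A sketch of the idea, using a toy objective in place of mle.sumexp(): for a negative log-likelihood, the standard errors are the square roots of the diagonal of the inverse hessian that optim() returns with hessian = TRUE.

```r
nll <- function(p) sum((p - c(1.3, -0.7))^2)   # stand-in objective
fit <- optim(c(0, 0), nll, hessian = TRUE)
se <- sqrt(diag(solve(fit$hessian)))
# Non-finite or very large SEs would flag an unreliable fit
all(is.finite(se))
```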

optim.interv - Generate initial parameters using ga()

Issue Description

The current method of generating initial parameters uses repeated basic random sampling until arbitrary criteria are met. It has been shown in testing that final parameters for intervals are heavily dependent on initial parameters. When creating optim.sumexp it was found that issues with initial parameters could be overcome using ga() for initial parameter generation.

removal of dependency on argument nexp in optim.sumexp

Issue Description

The loop currently used for optim.sumexp runs until it reaches a fixed limit entered by the user. This may result in an unnecessary increase in run-time or the loss of a better fit. The latter is less of a concern, as the fixed limit is in place to prevent over-fitting.

Enhancement

By combining the optim.sumexp function with the chisq.sumexp function, the number of tested exponentials could be determined as:

repeat {
  fit <- optim.sumexp.forloop(nexp)    # fit model with nexp exponentials
  chisq <- chisq.sumexp(fit)
  if (chisq >= best.chisq) break       # no improvement: stop adding exponentials
  best.chisq <- chisq
  nexp <- nexp + 1
}

Can the precision of the algorithm be improved?

Issue Description

There is a large amount of variability in the precision of the algorithm; in testing, the old algorithm failed roughly 50% of the time.

Reproducible Example

Outlined in study07_broad_test.R.

optim() within optim.sumexp() using non-finite values

Issue Description

When running optim.sumexp() with some datasets, non-finite values are fed to optim() resulting in the following error:

Error in optim(init.par, mle.sumexp, method = "L-BFGS-B", lower = rep(-Inf, :
  L-BFGS-B needs finite values of 'fn'

Reproducible Example

Use sumexp_functions.R from: 87fe9a5

time.samp <- seq(0, 48, by = 1)
threedata <- data.frame(
  time = time.samp, 
  line1 = -1*time.samp + 7, 
  line2 = -0.2*time.samp + 5, 
  line3 = -0.01*time.samp + 3
)
threedata$sumexp <- exp(threedata$line1) + exp(threedata$line2) + exp(threedata$line3)
data <- data.frame(time = time.samp, conc = threedata$sumexp)
optim.sumexp(data, oral = F, nexp = 4)

GA error; if (object@run >= run) break : missing value where TRUE/FALSE needed

Issue Description

When running through particular sets of data, the GA function can produce NAs that result in an error.
This should be isolated using try() or other methods, with contingency functions put in place in case of an error occurrence.

Reproducible Example

So far seems to occur most commonly when running study03_broad.R for the d2a and d3a datasets. The error itself must be located within the optim.sumexp function as ga() is used nowhere else.
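One way to isolate the call; safe.ga and the fallback below are illustrative, not from the repo:

```r
# Wrap a ga() call in tryCatch() and fall back to e.g. plain random
# initial parameters when the genetic algorithm errors out.
safe.ga <- function(ga.call, fallback) {
  tryCatch(ga.call(), error = function(e) {
    warning("ga() failed: ", conditionMessage(e))
    fallback()
  })
}

# Simulated failure: the fallback supplies parameters instead
res <- safe.ga(function() stop("missing value where TRUE/FALSE needed"),
               function() runif(5))
```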

What about linear-up log-down?

Issue Description

The trapezoidal error determined here assumes the trapezoidal integration is linear-up and -down (lin), however the most predominant method for trapezoidal integration is linear-up log-down (linlog).

When using lin, more error is present in the latter half of the curve than with linlog. The final intervals would minimise the error for lin; under linlog there would be less error than accounted for in the latter half of the curve and the same at the beginning, giving the early points a relatively higher error.

Question

Is it possible to minimise trapezoidal error for the case of using linear-up log-down?
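In principle yes. A minimal sketch of linear-up log-down AUC (a hypothetical helper, not the repo's trapezoid functions) that the error term could be rebuilt around:

```r
# Linear trapezoids while concentration rises, log trapezoids while it
# falls (the log rule is exact for mono-exponential decline).
auc.linlog <- function(time, conc) {
  dt <- diff(time)
  c1 <- head(conc, -1); c2 <- tail(conc, -1)
  seg <- dt * (c1 + c2) / 2                           # linear-up
  down <- c2 < c1 & c1 > 0 & c2 > 0
  seg[down] <- (dt * (c1 - c2) / log(c1 / c2))[down]  # log-down
  sum(seg)
}

time <- 0:24
conc <- 10 * exp(-0.2 * time)
auc.linlog(time, conc)   # matches the analytic AUC, 10/0.2 * (1 - exp(-4.8))
```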

Outlier: prop > 3

Information

Metric: tmax
Data: d1a
Status: Finished - Poor sumexp fitting #10

Verbose Output Example

id data metric ref test type     ref.m1   ref.m2 ref.m3 ref.m4   ref.c1 ref.c2 ref.c3 test.nexp    test.m1   test.m2 test.m3 test.m4  test.c1 test.c2 test.c3 t2.1 t2.2 t2.3 t2.4 t2.5 t2.6 t2.7 t2.8 t2.9 t3.1 t3.2 t3.3 t3.4 t3.5 t3.6 t3.7 t3.8 t3.9     prop outlier  bioq
44  d1a   tmax 0.7    3  opt -0.9857393 -1.92927     NA     NA 3.445674     NA     NA         2 -0.9830524 -28.50164      NA      NA 3.374213      NA      NA    0  0.1  0.2  2.1  3.6  5.9 10.4 11.5   24    0  0.1  0.2  1.4  2.4  3.8  5.9  9.9   24 4.285714    TRUE FALSE

Reproducible Example

source(paste(git.dir, reponame, "outliers/outlier_functions.R", sep = "/"))
ref.par <- c(-0.9857393, -1.92927, 3.445674)
fit.par <- c(-0.9830524, -28.50164, 3.374213)
t2.int <- c(0, 0.1, 0.2, 2.1, 3.6, 5.9, 10.4, 11.5, 24)
t3.int <- c(0, 0.1, 0.2, 1.4, 2.4, 3.8, 5.9, 9.9, 24)

Basic Analysis

Check fitted sum of exponentials, they are quite bad

conc.data <- matrix(c(pred.d1a(time.samp, ref.par), pred.d1a(time.samp, fit.par)), ncol = 2)
plot.sumexp.overlay(conc.data, time.samp, 2, -1)

Check intervals, they match well to the fitted parameters, but not the reference.

plot.sumexp.facet(conc.data, time.samp, 2, t2.int)
plot.sumexp.facet(conc.data, time.samp, 2, t3.int)

Checking new optim.sumexp

source(paste(git.dir, reponame, "fn_diag/fix_functions.R", sep = "/"))
cdf <- data.frame(time.samp, conc.data[,1])[(1:9)*30-29,]
chisq.sumexp(optim.sumexp(cdf, oral = T))$par

Unsure why the popsize run didn't work, considering that running the script manually says it should have worked! EDIT (17/29/6): Error in broad_analysis not accounting for number of iterations

Fitted Exponentials Resulting in Negative Values

Issue Description

When using the function pred.sumexp() it is possible to achieve negative concentrations. This only occurs with absorption curves.
Found when observing the results of d1a_broad_verbose.rds.

Reproducible Example

time.samp <- seq(0, 48, by = 0.1)
pred.d2a <- function(x, p) {
  exp(p[1]*x + p[4]) + exp(p[2]*x + p[5]) - exp(p[3]*x + log(sum(exp(p[4]), exp(p[5]))))
}
d2a.p <- c(-0.8, -0.01, -0.4, log(exp(4)*0.8), log(exp(4)*0.2))
d2a <- data.frame(
  time = time.samp,
  conc = pred.d2a(time.samp, d2a.p)
)
with(d2a, plot(time, conc))

Source of Issue

This issue arises because there is no control over whether the rate constant of the negative exponential is the largest rate constant. A positive exponential cannot have the largest rate constant, as it would then no longer be the rate-limiting step.

Greater precision with optim.interv

Goal

To improve the precision with which optim.interv determines its parameters.

Description

As seen in #7, the algorithm lacks precision. This could be due to poorly chosen intervals from optim.interv.

Ideas

Greater precision might be achieved with the following:

  • Using the tmax to influence initial parameter generation
    • Investigated in #9
    • Problematic results, some intervals skipping the tmax entirely
    • Not included in algorithm
  • Use a repeat loop to test different initial parameters
    • Investigated in #9
    • Included in algorithm
  • Ordering final parameters received from optim.interv
    • Investigated in issue #16
    • Included in algorithm
  • Using the ga() function to generate initial parameters
    • Open issue #18
    • Has replaced the use of repeat loop
  • Incorporation of lin-log trapezoidal rule
    • Open issue #4
  • Use hessian matrix to determine standard error on parameter estimates
    • Investigated in #17
    • See .rmd provided by Richard Upton for explanation of what needs to be done
    • This could help in determining useful error messages
    • May be able to address some of the setbacks found in #18

Greater precision with optim.sumexp

Goal

To improve the precision with which optim.sumexp determines its parameters.

Description

As seen in #7, the algorithm lacks precision. This could be due to poorly fitted curves from optim.sumexp resulting in poor optimisation by optim.interv.

Ideas

Greater precision might be achieved with the following:

  • Use repeat on optimisation to reach a better objective value for final parameters
    • Investigated in #7
    • Resulted in great improvements in "successful optimisations".
    • Showed little benefit to outliers
  • Allow for disordered parameters to be used with optimisation
    • Investigated in #8
    • Removed possibility of negative concentrations, removed outliers
    • Incorporated into algorithm
  • Use popSize = x to replicate findings from using repeat on optimisation
    • Investigated in #14
    • Overall improvement in algorithm precision and reduction in outliers
    • Concerns around worse results in some cases
    • Incorporated into algorithm (addition and removal might need testing with new algorithm components)
  • Use hessian matrix to determine standard error on parameter estimates
    • Investigated in #17
    • See .rmd provided by Richard Upton for explanation of what needs to be done
    • This could help in determining useful error messages
    • Currently searching for threshold for standard error #19

Can a threshold on standard error be used to improve fits?

Issue Description

Standard errors derived from hessian matrices of optim function output could be used to assess confidence in that output.

Hessian matrices could be included in final output of both broad and narrow study designs. This would then allow assessment of standard errors for the best and worst fits.

Using delta-time in place of time for err.interv

Issue Description

Early in the conception of err.interv a decision was made between using time or delta-time as a parameter. There have been many issues with optim.interv that have led to testing for a variety of solutions (#9, #16, #18). It may be beneficial for delta-time to be trialed to see if this removes some of these problems.

Important details for implementation:

  • Minimum for the parameter should be 0.5
  • Maximum would be 23.5 - 0.5*n where n is the parameter number
    • this would be implemented in ga() but not optim()
  • L-BFGS-B and BFGS optim() methods should be trialled
    • the former may be unstable, the latter may provide values <0.5
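The parameterisation sketched below (names illustrative) recovers monotone times from the optimised gaps by construction, which would sidestep the ordering problems seen with raw times:

```r
# Delta-time parameterisation: optimise the gaps between samples and
# recover sample times with cumsum().
dt.to.times <- function(dt, t0 = 0) t0 + cumsum(c(0, dt))

dt <- c(0.5, 0.9, 1.7, 3.1, 5.3, 12.5)   # each gap bounded below by 0.5
times <- dt.to.times(dt)                 # strictly increasing by construction
```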

Outlier: prop < 0.5

Information

Metric: auc
Data: d1a
Status: Thorough analysis may be required...

Verbose Output Example

id data metric    ref      test type     ref.m1    ref.m2 ref.m3 ref.m4   ref.c1 ref.c2 ref.c3 test.nexp    test.m1   test.m2 test.m3 test.m4  test.c1 test.c2 test.c3 t2.1 t2.2 t2.3 t2.4 t2.5 t2.6 t2.7 t2.8 t2.9 t3.1 t3.2 t3.3 t3.4 t3.5 t3.6 t3.7 t3.8 t3.9      prop outlier  bioq
13  d1a    auc 1.6508 0.5844287  opt -0.9338578 -1.768985     NA     NA 1.183406     NA     NA         2 -0.9315886 -1.692559      NA      NA 1.161486      NA      NA    0  0.4  0.9  1.6  3.3  4.8    7   11   24    0  0.8  1.4  2.6  3.8  5.2  7.4 11.3   24 0.3540275    TRUE FALSE

Reproducible Example

source(paste(git.dir, reponame, "outliers/outlier_functions.R", sep = "/"))
ref.par <- c(-0.9338578, -1.768985 ,1.183406)
fit.par <- c(-0.9315886, -1.692559, 1.161486)
t2.int <- c(0, 0.4, 0.9, 1.6, 3.3, 4.8, 7, 11, 24)
t3.int <- c(0, 0.8, 1.4, 2.6, 3.8, 5.2, 7.4, 11.3, 24)

Basic Analysis

Check fitted sum of exponentials, they seem good.

conc.data <- matrix(c(pred.d1a(time.samp, ref.par), pred.d1a(time.samp, fit.par)), ncol = 2)
plot.sumexp.overlay(conc.data, time.samp, 2, -1)

Check intervals, they seem good too.

plot.sumexp.facet(conc.data, time.samp, 2, t2.int)
plot.sumexp.facet(conc.data, time.samp, 2, t3.int)

Unsure what's going wrong here...

Perfecting the boundaries on study_rdata

Goal

To highlight issues with study_rdata so that the broad study design works well.

Description

As seen in #7, the algorithm lacks precision. This could be due to the randomised curves produced by study_rdata being unrealistic.

This will require a case-by-case analysis of outliers to determine whether the random data or the algorithm is at fault.

Optim accepting disordered intervals as best solution

Issue Description

optim.interv is accepting disordered parameters as the best solution. This is very unusual, as trapezoidal error is likely to be larger for a disordered set of parameters.

Reproducible Example

t1 <- seq(0, 24, length.out = 9)
p <- c(-0.8243497, -0.8205869, -1.2830615, 3.6208785, 4.5177880)
optim.interv(t1, p)

Potential Source of Issue

This may be due to optim wanting to double dip on the lowest second derivative of the sum of exponentials.

t1 <- seq(0, 24, length.out = 9)
p <- c(-0.8243497, -0.8205869, -1.2830615, 3.6208785, 4.5177880)
time.samp <- seq(0, 24, by = 0.1)
dv <- pred.sumexp(p, time.samp)
plot(time.samp, dv)
abline(v = optim.interv(t1, p)$par)

Increased popSize may replicate repeat optimisations

Issue Description

In #7 a repeat function was used to improve the "successful" optimisations of optim.sumexp. It was hypothesised that this was due to a theoretical increase in mutation probability, however the maths showed this was not the case. Additionally the method did not appear to have an impact on outliers.

The increase in success may instead be replicated by simply increasing the popSize argument in the ga() function: repeating 5 genetic algorithms is equivalent to multiplying popSize by 5.

The potential reason for the change not impacting outliers may have been due to the presence of other more pressing issues (#8, #9).

Early Testing

Early testing shows a 99.7% chance of having final sumexp parameters with an arbitrarily low objective value. Running new broad study design with popSize = 1250 will see if there is impact on outliers.
