mobr's Issues

idiv2: to do list

  • smooth out the sSBR by averaging it over a minimum of 20 runs. Currently the sSBR looks very jagged, which largely reflects the particulars of the random walk it took in sampling. Averaging several runs gives a smoother and less biased estimate of the expected S, although it will be slower (see the sketch at the end of this list).
  • improve documentation for the function rarefaction so it is clear what the different methods accomplish
  • add a subset.mob_in function that aids in pulling out a reduced number of rows of the mob_in object - this can be really useful when wanting to work with only certain treatment combinations
  • acknowledge Marc Taylor for the plotStacked function (https://github.com/marchtaylor/sinkr/blob/master/R/plotStacked.R) and also thank Anne Chao and colleagues for various functions --> this also requires us to copy and paste their functions for our purposes rather than maintain the dependencies on their package.
  • stacked bar chart with pie charts
  • implement the spatial analysis when geographic coordinates are not available; this changes the interpretation from implicit and explicit spatial aggregation to focusing only on implicit spatial aggregation
  • get rid of plot_rarefy and move that code into plot.mob_out which currently calls that function.
  • implement the same log argument in plot.mob_out as in mob_out
  • remove the function that does the 3d plot (and breaks when the code is executed on a server)
  • change ScaleBy to ref_group
  • Anova type output based on a ref group
  • Agree on how to scale between # individuals and # plots
  • Regression type
    • Regression analysis
    • null tests for regression?
      • test 1 - complete individual randomization
      • test 2 -
      • test 3
      • sensitivity analysis
  • function at the beginning to check that rows (plots) between two data frames match
  • define object class mob_out (print, plot) and mob_in (print, t-test, summary -> t-test table)
    • summary.mob_in should provide p-values rather than quantiles
  • other functions
    • print
    • plot
      • rarefaction plots; these will pair with the delta S plots
        • rarefaction plot legend sometimes mislabels the curves (high priority)
    • modify plot.mob_out to either plot delta curves or raw curves or both
      • It appears that plot.mob_out sometimes log transforms the x axis depending on mob_out$log_scale and sometimes does not. This needs to be made consistent across the figures. (high priority)
      • improve code in plot.mob_out
      • add number of samples to axis see old axis labels
      • nick requests that for the individual based rarefaction we show the entire curve across all samples and not just for the smallest number of individuals common across the groups.
    • trim delta plots to ignore n < 5 individuals due to the artifact that the curve must intersect at 0.
    • box plots of univariate analyses
      • N vs trt
      • S rarefied vs trt
      • PIE rarefied vs trt
      • Beta PIE rarefied vs trt
      • S vs trt
    • summary
  • cleanup and combine categorical case (anova) with continuous case (regression)
    • break out null algos back into separate functions #30
  • test for composition? (*maybe later)
  • complex exp design with more than one factor? (*maybe later)
  • package up
  • figure out what's going on with null test for N
  • modify function rarefaction so it will accept a vector for sample based rarefaction
  • add extrapolated S to the boxplot
  • fix null test for aggregation
  • add output of number of permutations to mobr object output from get_delta_stats
  • occupancy data (low priority can only be applied to spatial rarefaction)
  • track down warning message from rarefaction for coffee and cattle datasets: "1: In rarefaction(comm, "indiv", inds) :" Also the ant 2014 dataset is producing the following error: "In deltaS_N(comm_perm[plot_levels == x, ], plot_dens_level, ... :
    Extrapolating the rarefaction curve because the number of rescaled individuals is smaller than the inds argument "effort" larger than total number of samples
  • document package
    • mobr.package documentation
  • note in the documentation of the function rarefaction that sample-based rarefaction is not actually used anywhere in the mob methods.
  • provide a 3d plot option for headless machines (low priority); currently implemented with rgl
  • repeat sensitivity analysis for the discrete case after fixing the warnings & also implement with larger number of iterations
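
A minimal sketch of the smoothing idea from the first item above, assuming that rarefaction() with the 'spat' method returns a single spatially constrained run per call (as in the bug report further down this page):

# average the spatially constrained sample-based rarefaction (sSBR) over
# many runs so the curve does not reflect the particulars of any single
# random walk through the plots
smooth_sSBR = function(comm, xy, n_runs = 20) {
    runs = replicate(n_runs, rarefaction(comm, 'spat', xy_coords = xy))
    rowMeans(runs)   # expected S at each number of plots sampled
}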

variable names in plot_attributes argument

Tiffany's empirical data uncovered several shortcomings of our function make_comm_obj, specifically in how the plot attributes are handled. The specific problems are:

  • if the environmental variable is a single vector then it needs to be treated as a data.frame
  • the variable names of the plot attributes get scrambled when splitting env_var back out
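
A minimal sketch of one way to guard against both problems inside make_comm_obj, reusing the plot_attr and spat_cols objects shown in the related column-names issue further down this page; the key is drop = FALSE:

# keep the data.frame class and the original column names even when only
# one environmental column is left after removing the spatial columns
out$env = plot_attr[ , -spat_cols, drop = FALSE]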

Legend - plot_9_panels()

Hi,

It would be nice for the user to have a legend in the richness graphs produced by plot_9_panels(), indicating which color is assigned to trt_group (e.g., red) and which to ref_group (e.g., blue).

Best,
Valentin

Felix to-do list

Hi @FelixMay - here you go, our wishlist :) Thanks!

  • Note: don't implement violin plots because they are not good with few replicates - maybe one day we'll have boxplot.mob_in and vioplot.mob_in functions, but that can come down the road
  • in the function mob_stats we would like two different matrices of stats to be output:
    • within plot statistics (i.e., for each plot report the following)
      • S
      • N
      • ENS_PIE, which is the effective number of species of the PIE, in other words Hill number q = 2
      • PIE
      • rarefied S at minimum n
      • asymptotic S for each sample (this is different from what we did before, but basically applying the Chao estimator to each sample individually)
    • between plot stats (Nick is working on these and we'll update you)
    • compute p-values with randomization tests rather than using the rank p-values currently reported. The randomizations are pretty simple: just shuffle treatment membership across the samples, compute anova(lm(S~trt))$F[1] as the test statistic for each randomization, and compute the p-value as p = (# of times F observed <= F permuted) / (# of permutations). See the sketch after this list.
  • add an option for lat-long coords to compute great circle distance rather than Euclidean distance. Here's a relevant link: https://www.r-bloggers.com/great-circle-distance-calculations-in-r/ We think that the Haversine formula should be sufficient. Let's home-brew this rather than depend on a package such as fields::rdist.earth (a sketch appears under the "great circle distance" issue below).
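
A minimal sketch of the randomization test described above, assuming S is the vector of per-plot richness values and trt the vector of treatment labels:

# permutation test for a treatment effect: shuffle treatment membership
# across the samples, recompute the F statistic, and count how often the
# observed F is matched or exceeded by the permuted values
rand_test_F = function(S, trt, n_perm = 999) {
    F_obs = anova(lm(S ~ trt))$F[1]
    F_perm = replicate(n_perm, anova(lm(S ~ sample(trt)))$F[1])
    sum(F_perm >= F_obs) / n_perm   # permutation p-value
}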

aggregating results on the gradient

Hey @rueuntal we discussed how dat_plot would be the input for a gradient method and would be identical to the pairwise case except that it will now have additional columns for the continuous variables. Along these continuous variables we have several options for aggregating the rarefaction information.

  1. use the group variable to identify unique "sites" along the gradient, compute rarefaction for each site, and then place it on the gradient according to that site's mean value along the gradient
  2. compute all the unique values of the gradient variable and then compute rarefaction results at each of these unique places along the gradient.
  3. bin the values of the gradient and then apply the method suggested in 2 (see the sketch after this list).
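
A minimal sketch of option 3, assuming env_gradient is the continuous plot-level variable, n_bins is chosen by the user, effort is a chosen rarefaction effort, and rarefaction() accepts a pooled abundance vector as in the examples elsewhere on this page:

# option 3: bin the continuous gradient, then treat each bin like a "site"
# and compute an individual-based rarefaction from the pooled abundances
bins = cut(env_gradient, breaks = n_bins)
bin_means = tapply(env_gradient, bins, mean)   # each bin's position on the gradient
S_rare = tapply(seq_along(bins), bins, function(rows)
    rarefaction(colSums(comm$comm[rows, , drop = FALSE]), 'indiv', effort))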

need to minimize class change declarations

In future versions of the code let's work hard to minimize the number of class change declarations that we make (e.g., calling as.character()); these seem to be generally unneeded and make the code very cumbersome to work through. A better approach is simply to convert all factors to strings at the beginning of the analysis, so that any downstream calls to group variables will use character strings, which generally avoid side effects. This isn't so much a single change to the code as a change to our coding behavior. I'm done complaining now, thank you ;)
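
A minimal sketch of the convert-once-up-front approach, assuming plot_attr is the table of plot attributes:

# convert every factor column to character once at the start of the
# analysis so downstream code never needs scattered as.character() calls
fac_cols = sapply(plot_attr, is.factor)
plot_attr[fac_cols] = lapply(plot_attr[fac_cols], as.character)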

effort > N in rarefaction

Right now it seems that rarefaction() throws a warning when effort > total abundance N and gives back S_obs. We've probably talked about this before - is there a reason why it doesn't return NA instead?

Implementing first pass on no w/in plot agg

Here is the R code I was able to get working:

comm_group_noagg = list()
for (i in seq_along(group_levels)) {
  # pull out the site-by-species matrix for one group
  # (assuming group_levels holds the unique values of env_data)
  comm_group = comm$comm[env_data == group_levels[i], ]
  sp_sums = colSums(comm_group)
  # for each species, scatter its total abundance randomly across the plots
  # of this group; appending 1:nrow(comm_group) and subtracting 1 keeps a
  # zero count for plots that received no individuals
  comm_group_noagg[[i]] = sapply(sp_sums, function(x)
                            table(c(sample(1:nrow(comm_group), x,
                                           replace = TRUE),
                                    1:nrow(comm_group))) - 1)
}

the object comm_group_noagg is a list with as many elements as groups. For each list element there is a community site-by-species matrix with identical species sums (i.e., colSums) to the original data but different rowSums. The abundances for each species are populated randomly across the plots within a treatment.

I think the easiest way to incorporate this into our existing code is to wrap all of get_delta_stats in a for loop that at the beginning either calls this code and redefines comm or does not, depending on whether you want to turn off within-plot aggregation signatures in the sample-based test. I don't think any internal get_delta_stats code would have to be changed; just specify that only the sample test is wanted. Some holder variables will also need to be defined and averaged across. Sorry this solution is not more complete.

group argument to get_delta_stats request (lower priority)

Allow an argument to get_delta_stats that lets the user supply their own grouping variable rather than relying on the default behavior of using each unique value of env_var. For example, if the explanatory variable is latitude and some groups of sites have the same latitude, it would be nice if the user could provide a group_id so that these same-latitude plots would not be mixed in the sample-based analyses.

plot.mobr() and plot_9_panels() - wrong data frame name in "discrete" list of "mobr" object

Hi!

I worked today with a new dataset from Jon and I encountered some plotting errors with the plot.mobr() and plot_9_panels() functions. I used the scripts from branch 4cur.
I got errors because some objects in the plotting functions could not be created: some object names changed in the structure of the mobr object, most probably when it is created with the function get_delta_stats().

I went line by line with some test objects and it seems that in plot.mobr() one should make the following change in line 90:

tests = c('indiv', 'N', 'agg') 

change to

tests = c('SAD', 'N', 'agg') 

The error was triggered when trying to access the mobr object, for example, like mobr[[type]][[tests[i]]]. And actually the mobr object, created previously with the get_delta_stats() function, has 3 data frames in the list discrete: SAD, N and agg, but no indiv. Therefore, I presumed I could replace indiv with SAD.

Then a similar error happened in the plot_9_panels() function. Here at lines 1200 and 1202 I replaced $ind with $SAD in order to avoid the error:

mobr$discrete$ind[, -1] = lapply(mobr$discrete$ind[, -1], function(x)
      as.numeric(as.character(x))) 
delta_Sind = mobr$discrete$ind[which(as.character(mobr$discrete$SAD$group) == as.character(trt_group)), ]

I changed to

mobr$discrete$SAD[, -1] = lapply(mobr$discrete$SAD[, -1], function(x)
      as.numeric(as.character(x))) 
delta_Sind = mobr$discrete$SAD[which(as.character(mobr$discrete$SAD$group) == as.character(trt_group)), ]

Please let me know if my suggestions make sense.
I didn't have time to check the function get_delta_stats(), but I presume that another option would be to make modifications there regarding the naming of the objects.

Best,
Valentin

SpadeR:::ChaoSpecies() error in boxplot function

This error was reported to me by @valentinitnelav; his original posting is here.

I simply copied and pasted the text below...


I tested the script with the new patch from yesterday and now I get a different error. The error happens in the function SpecAbunAce {SpadeR}.

One can reproduce the error with the following datasets and code:
Datasets – download rda objects OK.rda and OK.env.rda

Code

# load objects

load(file = "OK.rda")
load(file = "OK.env.rda")

# source mobr.R and mobr_boxplots.R from local machine

source("C:/MoB/mobr-master/mobr-master/R/mobr.R")
source("C:/MoB/mobr-master/mobr-master/R/mobr_boxplots.R")

# Make community object with mobr

OK_comm <- make_comm_obj(OK, OK.env)

# Run function for summary statistics - getting error in SpecAbunAce {SpadeR}

OK_stats <- mob_stats(OK_comm, "treatment")

Warning: In this case, it can't estimate the variance of Homogeneous estimation
Error in if ((sum(x == 1)/sum(x[x <= k])) == 1) { :
missing value where TRUE/FALSE needed
The error seems to happen for the first time at row 9 in the OK object. There is only one species with 21 individuals.
Inside the SpecAbunAce{SpadeR} function the error is given by a division by zero in if ((sum(x == 1)/sum(x[x <= k])) == 1)
Note that the error is triggered inside mob_stats() when calling ChaoSpecies{SpadeR}:
S = ChaoSpecies(comm$comm[i,], datatype = "abundance")

Best,
Valentin

plot_samples error

The following code generates an error with a fresh install of our package

library(mobr)
data(inv_comm)
data(inv_plot_attr)
inv_mob_in = make_mob_in(inv_comm, inv_plot_attr)
stats = mob_stats(inv_mob_in)
plot_samples(stats)
Error in eval(expr, envir, enclos) : object 'S' not found
In addition: Warning message:
In (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
  row names were found from a short variable and have been discarded

Jagged spatial accumulation curve

In some empirical datasets at larger sampling efforts the spatial accumulation curve is a bit unstable and appears jagged. In a simple simulated community of species neatly arranged on a gradient sampled by a transect this jaggedness is even more noticeable. The moving-window nested species-area relationship does not show this pattern of jaggedness. The question is why - is this a bug (I don't think so) or just poor performance of our algo (I think this could be the case). Here is an example of what I'm talking about:
[figure: example from a simulated transect comparing the jagged spatial accumulation curve to the moving-window nested SAR]

Note that the left panel shows the species (each a different color) and their positions along the gradient. I added a little noise into their presences and the jaggedness relaxed a bit, but it is still quite pronounced relative to the SAR. I think this may be because the SAR is an average over all possible accumulations of species, whereas our spatial rarefaction curve is an average of a much smaller set that is nested within the set considered by the SAR. Any other ideas @rueuntal and @ngotelli?

what are the best object names

hey @rueuntal now that the code is more or less stabilized let's spend a little bit of time thinking about what we have called things. The specific name I would like to discuss is comm. As I go through the code more I'll probably find other object names to add to this thread.

With respect to comm: we have a function make_comm_obj which takes as input an argument comm and another argument plot_attr. This is understandably confusing - why would a function that is supposed to make a comm take one as input? In #84 I changed the argument comm to x, which is very generic but follows the convention of the package vegan, which I think is a good authority for how to work with site-by-species matrices. This brings up a question though. Throughout the rest of our package we refer to comm-like objects, which are really just site-by-species matrices; should we change these names to x? Is there a better name than x that we can think of? Maybe the simplest solution is to change make_comm_obj to something else, like make_mob_data.

Let me know what you think.

mob output object attributes

If the output of get_delta_stats has an attribute called $type, which is either discrete or continuous, is there any value in nesting the data.frames for the SAD, N, and agg effects within the attribute $discrete? For example, here is the current structure of the output of get_delta_stats:

Only the first few rows of each matrix are printed

$type
[1] "discrete"

$log_scale
[1] TRUE

$indiv_rare
  sample  invaded uninvaded
1      1 1.000000  1.000000
2      2 1.881711  1.607740
3      3 2.670418  2.052937
4      4 3.385225  2.426138
5      5 4.040640  2.763523
6      6 4.647746  3.079981

$sample_rare
     group sample_plot    impl_S expl_S deltaS_agg
10 invaded           1  6.737519   9.42 2.68248095
19 invaded           2 10.378372  12.10 1.72162788
29 invaded           3 13.595478  14.44 0.84452217
39 invaded           4 16.333422  16.94 0.60657842
49 invaded           5 18.753198  19.14 0.38680190
58 invaded           6 20.729451  20.76 0.03054906

$discrete
$discrete$SAD
    group effort_ind    deltaS_emp deltaS_null_low deltaS_null_median deltaS_null_high
1 invaded          1 -4.363176e-14   -6.065703e-14      -5.245804e-14    -4.180267e-14
2 invaded          2  2.739708e-01   -6.257883e-02      -7.606259e-03     4.085399e-02
3 invaded          3  6.174812e-01   -1.298121e-01      -1.663329e-02     8.763914e-02
4 invaded          4  9.590875e-01   -1.891738e-01      -2.517006e-02     1.311027e-01
5 invaded          5  1.277117e+00   -2.404404e-01      -3.283497e-02     1.700484e-01
6 invaded          6  1.567765e+00   -2.857248e-01      -3.971128e-02     2.053122e-01

$discrete$N
    group effort_sample   ddeltaS_emp ddeltaS_null_low ddeltaS_null_median ddeltaS_null_high
1 invaded             1  4.363176e-14    -8.903989e-14       -8.715251e-15      6.558365e-14
2 invaded             2 -5.611687e-01    -2.432205e-01       -9.175297e-02      4.615233e-02
3 invaded             3 -1.190207e+00    -5.589478e-01       -2.540712e-01      2.024506e-02
4 invaded             4 -1.815606e+00    -8.762102e-01       -4.132492e-01     -7.071931e-03
5 invaded             5 -2.415627e+00    -1.172861e+00       -5.484117e-01     -1.436326e-02
6 invaded             6 -2.986400e+00    -1.444538e+00       -6.567581e-01      2.155285e-03

$discrete$agg
    group effort_sample ddeltaS_emp ddeltaS_null_low ddeltaS_null_median ddeltaS_null_high
1 invaded             1    7.291876       -1.0816237           0.1018763          1.709876
2 invaded             2    9.290809       -0.4931914           0.5208086          1.650809
3 invaded             3    9.900593       -0.8054069           0.3905931          1.382593
4 invaded             4    9.859997       -0.9420026           0.4599974          1.324997
5 invaded             5    9.756374       -1.0381263           0.3063737          1.199374
6 invaded             6    9.038851       -1.2701486           0.2588514          1.458851

I propose that we get rid of the nesting of the effect-specific data.frames within $discrete and condense them down to a single data.frame, because each effect data.frame has the same number of columns in the same order. Returning one data.frame is almost always preferable to a list of data.frames. We will just have to add a column for effect_type which will have the values SAD, N, or agg.
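
A minimal sketch of the proposed flattening, where mob_out stands in for the object returned by get_delta_stats and common_names is an illustrative set of shared column names (the per-effect data.frames currently differ slightly, e.g. effort_ind vs effort_sample and deltaS_* vs ddeltaS_*):

# collapse the three effect-specific data.frames into a single data.frame
# with an effect_type column taking the values 'SAD', 'N', or 'agg'
common_names = c('group', 'effort', 'deltaS_emp', 'deltaS_null_low',
                 'deltaS_null_median', 'deltaS_null_high')
flat = do.call(rbind, lapply(names(mob_out$discrete), function(e) {
    d = setNames(mob_out$discrete[[e]], common_names)
    cbind(effect_type = e, d)
}))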

rarefied richness analysis for boxplots can fail

The rarefied richness analysis that @FelixMay implemented can fail when a treatment has a strong influence on abundance such that all the plots of a given treatment fall below the 10 individual cutoff (this occurred in the glade ants 2011 analysis). The analysis will return an error like this:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

which results because any sample with fewer than 10 individuals was dropped, effectively making it a single-treatment analysis. Although this is likely pretty rare, we should add some code to catch this error and then remove the rarefaction analysis from the suite of tests; a sketch is below.
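
A minimal sketch of catching the failure, where rarefied_S_test() is a hypothetical stand-in for the model-fitting step that currently errors and dat for its input:

# if the rarefied richness model cannot be fit (e.g., only one treatment
# level survives the 10-individual cutoff), drop that test with a warning
# instead of aborting the whole suite of tests
rarefied_result = tryCatch(
    rarefied_S_test(dat),
    error = function(e) {
        warning("rarefied richness test dropped: ", conditionMessage(e))
        NULL
    })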

analyses on presence absence data

Early in development we discussed the option of allowing presence-absence data, and we built that into the generation of the input object, but we didn't pursue the idea much beyond that. Assuming it is still a priority, I thought here we could brainstorm the changes we need to make for an analysis of binary data.

Brainstorm / To Do List

  • boxplot functions should only target average richness and size of species pools
  • the effect of the SAD would become an analysis of the effect of the OFD (occupancy frequency distribution), the current approach to generating a treatment wide SAD cannot be generalized to the OFD because it relies on the Chao method of inferring the pool SAD - which I'm guessing would not apply to occupancy. The two curves that would be compared between the reference and treatment groups would be the random sample based rarefaction curves which contain within site autocorrelation. It is possible an analytical CI could be derived for the null model in this context (certainly we can do that for one curve - but maybe not for their differences?), but it seems like a randomization routine could be defined here as well.
  • null model for effect of N is not relevant in this case and can be ignored
  • the effect of between sample aggregation could be estimated using spatial sample rarefaction curves and comparing them to the null expected random sampling from both treatment and control.
  • change the output plot titles accordingly (e.g., SAD effect to OFD effect).

Warnings

Here are lists of exceptions and warnings for the analysis. @dmcglinn feel free to expand.

Warnings:

  • "Groups with less than min_plot plots will not be included in steps 2 & 3."
  • "Step 3 will not be performed without spatial information."
  • "Less than two groups has enough plots; steps 2 & 3 will not be performed."
  • "Reference group has less than min_plot plots. Only step 1 will be performed."

cleanup boxplot functions?

It would be nice if the three boxplot functions could be combined into one that allows the user to specify whether it's sample, group, or betaPIE.

rarefaction algos

Hey @rueuntal slogging through the code I see the following mapping for the rarefaction calculations when working with species richness (i.e., Hill number where q = 0):

rarefy_individual -> rareNMtests::rarefaction.individual -> vegan::rarefy
rarefy_sample_implicit -> rareNMtests::rarefaction.sample -> vegan::specaccum 

Do you think we want to maintain the option of using our framework on other Hill numbers besides richness? rareNMtests gives us the option of running our code on q=1 (Shannon) and q=2 (Simpson) indices as well. If you don't think we'll likely ever use these then my vote is to directly code the analytical solutions rather than depend on rareNMtests and vegan for these calculations. Their functions lose some speed by carrying out redundant checks and in one case don't allow computing the rarefaction curve for a subsample of all quadrats (I could request this change though).
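
For reference, a minimal sketch of the analytical individual-based solution for q = 0 (the classic hypergeometric rarefaction formula), in case we decide to drop the dependency for the richness case:

# analytical expected richness for individual-based rarefaction at effort n
# (q = 0 only); x is a vector of species abundances
rarefy_indiv_analytical = function(x, n) {
    x = x[x > 0]
    N = sum(x)
    # probability that each species is entirely missed in a sample of n
    # individuals, computed on the log scale for numerical stability
    p_miss = exp(lchoose(N - x, n) - lchoose(N, n))
    sum(1 - p_miss)
}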

Minimal # of plots per group

Should we specify a minimal # of plots for each group to carry out the plot-based (spatially-implicit & explicit) rarefactions? It probably doesn't make much sense to rarefy when there are only two or three plots.

y-axis labels in plot.mobr() function from branch 4cur

Hi @rueuntal !

I saw that some minor changes took place in the y-axis label names inside the plot.mobr() function.
Line 98 in mobr.R is now

ylabs = rep('delta-S', 3)

And previously it was

ylabs = c('delta-S', rep('delta-delta-S', 2))

Shouldn't the old line of code be kept? Otherwise it is also not consistent with the output from the plot_9_panels() function. But I'm not familiar with the entire concept, so please correct me if I'm wrong.

Best,
Valentin

Nonspatial curve and ind-based curve not exactly identical

@dmcglinn @ngotelli This is kind of a nuance but I'd love to hear what folks think. If we think about the nonspatial curve as being created by randomly shuffling the individuals across plots, then the two curves are only exactly identical when all plots within the treatment have exactly the same abundance (density). The more "uneven" they are in abundance, the more divergence we'd see in the two types of calculations.

Consider the following extreme example - a treatment with three species and two plots, c(100, 100, 100) and c(1, 0, 0). No matter how we reshuffle the individuals, plot 1 would always have three species while plot 2 would always have 1. So the expected S at plot = 1 for the nonspatial curve is 2:

mean(c(rarefaction(c(101, 100, 100), 'indiv', c(1, 300))))

However, calculated from the ind-based curve, the expected abundance for plot = 1 is 150 or 151, and

rarefaction(c(101, 100, 100), 'indiv', 150)

is 3.

min, max, or mean

Need to get a better feeling for what is the best way (or if it depends) to scale plots to number of individuals in test 2.

log gamma warnings

Using the analytical rescaling of the rarefaction as introduced in #123 sometimes results in warnings about the input of lgamma being out of range.

I looked at a plot of the lgamma function and it shows this bizarre pattern.

[figure: plot of the lgamma function showing bands where it is undefined for negative inputs]

So it is undefined for certain bands of negative numbers. Based on your derivation, would you ever expect lgamma to be computed on a negative quantity? If not, then I can set up an if statement so it is not evaluated for negative values at all; a sketch is below.
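
A minimal sketch of that guard, assuming q is the quantity currently passed to lgamma:

# only evaluate lgamma() where the argument is positive; negative inputs
# (where the out-of-range warnings arise) are returned as NA
safe_lgamma = function(q) {
    out = rep(NA_real_, length(q))
    ok = !is.na(q) & q > 0
    out[ok] = lgamma(q[ok])
    out
}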

Sample based random swapping algo is identical to individual rarefaction

The algorithm we devised to remove within- and between-plot spatial aggregation by shuffling individuals of each species randomly between plots of a given group is identical to an appropriately rescaled individual-based rarefaction for that same group. This simplifies our code in the function effect_of_N; I will try to submit a modified version of this function soon that relies only on rescaled individual-based rarefaction curves rather than any unnecessary sample-based curves.

How to scale axes

One of the arguments we allow users to specify when calling get_delta_stats is log_scale, which spaces the sampling of individuals evenly along a log scale. I propose that if the user sets this argument to TRUE, the downstream calls to our plotting routines use a log-transformed x-axis when the number of individuals (not the number of samples) is the variable being displayed.

Alternatively, and maybe more generally, another solution could be for users to just specify in the plotting routine whether they want log transforms on numbers of individuals, samples, or both. We could also provide this option for the S axis, but for our visual displays we may want to stick with arithmetic scaling of S because it adds emphasis to the fact that the patterns show strong scale dependence.

Error in mob_stats() when calls ChaoSpecies{SpadeR}

Hi @dmcglinn!

I tested the script with the new patch from yesterday and now I get a different error. The error happens in the function SpecAbunAce {SpadeR}.

One can reproduce the error with the following datasets and code:
Datasets – download rda objects OK.rda and OK.env.rda

Code

# load objects
load(file = "OK.rda")
load(file = "OK.env.rda")

# source mobr.R and mobr_boxplots.R from local machine
source("C:/MoB/mobr-master/mobr-master/R/mobr.R")
source("C:/MoB/mobr-master/mobr-master/R/mobr_boxplots.R")

# Make community object with mobr
OK_comm <- make_comm_obj(OK, OK.env)

# Run function for summary statistics - getting error in SpecAbunAce {SpadeR}
OK_stats <- mob_stats(OK_comm, "treatment")

Warning: In this case, it can't estimate the variance of Homogeneous estimation 
 Error in if ((sum(x == 1)/sum(x[x <= k])) == 1) { : 
  missing value where TRUE/FALSE needed 

The error seems to happen for the first time at row 9 in the OK object. There is only one species with 21 individuals.
Inside the SpecAbunAce{SpadeR} function the error is given by a division by zero in if ((sum(x == 1)/sum(x[x <= k])) == 1)
Note that the error is triggered inside mob_stats() when calling ChaoSpecies{SpadeR}:
S = ChaoSpecies(comm$comm[i,], datatype = "abundance")

Best,
Valentin

spatial rarefaction function miscalculating richness for n=1

It appears that our function rarefaction is miscalculating the richness when the sample size is 1 (possibly at other sizes as well; I'm not sure yet). Was this function ever compared to hand-worked examples as a test? We should get some explicit tests in place for this function. Here is code to demonstrate the bug:

# create a 3 species community distributed across 4 sites
comm = rbind(c(0, 0, 1), 
             c(0, 0, 0),
             c(0, 1, 0), 
             c(1, 0, 0))
replicate(4, rarefaction(comm, 'spat', xy_coords = 1:4))
# note that the replicates differ from one another; the correct value when 
# n = 1 is 0.75 (= mean(c(1, 0, 1, 1)), the mean of the site richness values)
# but sometimes the function returns 1. 

I think this bug likely has something to do with the reordering of the data prior to computing the accumulation curves to try to randomize the pattern by which ties are encountered.

Effect of N, or difference between ind-based and sample-based implicit curves

This discussion is related to @dmcglinn's pull request #18.

To summarize (for our future selves), our algorithm for the pair-wise comparisons consists of three steps:

  1. Compare the individual-based rarefaction curves between two groups, which indicates the effect of the shape of the SAD and regional S on local diversity.
  2a. Compare the implicit sample-based rarefaction curves, which indicates the effect of the shape of the SAD, regional S, and group-level density.
  2b. Subtracting 1 from 2a yields the effect of density.
  3a. Compare the spatially explicit sample-based curves, which indicates the effect of everything (SAD, regional S, density, and spatial aggregation).
  3b. Subtracting 2a from 3a gives the effect of aggregation.

1 and 3 are fairly self-explanatory, but there is a decision that we have to make in step 2. To subtract the ind-based curve from the sample-based curve, we have to rescale the latter from number of samples to number of individuals. Curves from both groups have to follow the same rescaling factor, otherwise the sample-based curves would no longer match (i.e., they may have the same number of samples but end up with different numbers of individuals after rescaling). Our (mostly Brian's and my) previous decision was to first ask the user to specify whether there is a group with "baseline" density, e.g., the control (the "scaleby" parameter). If left unspecified, the min density is used, so that the sample-based curves fall back to the same end point as the individual-based curves.

Another option would be the mean, which I think is what's being implemented in #18. My guess is that this decision can change the results, but I'm not sure if the null model would naturally take care of that. And it's not clear to me whether one decision is better than the others. @dmcglinn what do you think?

subset argument for 'get_delta_stats'

Many of R's default functions (e.g., plot and lm) have an argument called subset that allows the user to run the code over a subset of the rows; this would be a nice addition to our code because currently you have to define a new mob_in or go through a lot of hassle doing manual subsets. A minimal sketch is below.
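
A minimal sketch of the idea as a helper, assuming the mob_in object stores its site-by-species matrix in $comm and its plot attributes in $env as in snippets elsewhere on this page; the component names are illustrative, and any other plot-level components would need the same filtering:

# keep only the rows (plots) selected by 'rows', leaving the rest of the
# mob_in object unchanged
subset_mob_in = function(mob_in, rows) {
    mob_in$comm = mob_in$comm[rows, , drop = FALSE]
    mob_in$env  = mob_in$env[rows, , drop = FALSE]
    mob_in
}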

Place null algos in separate functions per #29

We should break the three null model algos out as separate functions rather than leaving them nestled within the get_delta_stats function. My reasons are 1) it will be easier to write a function for generating the null results for rarefied S only (and not delta S), which will need the same shuffling algos, and 2) it makes it easier for other scientists to jump right to the important part (the null algos) and see how they work in a simple, transparent way.

great circle distance

It would be trivial to add this to our function rarefaction:

   pair_dist = fields::rdist.earth(xy_coords)

when the user specifies that the xy_coords are longitude and latitude records. This function computes the great circle distance. Does anyone think this is worthwhile enough to warrant the additional dependence on the fields package?
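
For comparison, a minimal home-brewed Haversine version (per the Felix to-do list above), assuming xy_coords holds longitude in column 1 and latitude in column 2 in decimal degrees:

# great circle distance (km) between all pairs of points via the Haversine
# formula; returns an n x n distance matrix
haversine_dist = function(xy_coords, R = 6371) {
    rad = pi / 180
    lon = xy_coords[, 1] * rad
    lat = xy_coords[, 2] * rad
    a = outer(lat, lat, function(p1, p2) sin((p2 - p1) / 2)^2) +
        outer(lat, lat, function(p1, p2) cos(p1) * cos(p2)) *
        outer(lon, lon, function(l1, l2) sin((l2 - l1) / 2)^2)
    2 * R * asin(sqrt(pmin(a, 1)))   # pmin guards against floating point overshoot
}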

How to incorporate blocks or strata into the mob delta analysis

Currently in the delta analysis all of the plots in a given treatment are treated as part of the same "population", but many times in experimental designs there are blocks that the investigator would like to be able to control for when carrying out the analysis of the treatment effect. This same problem also occurs in analyses of time series, such as in the Portal dataset, in which there are many plots in the same treatment that each have been sampled through time. If one is only interested in temporal (rather than spatial) patterns of species accumulation then a method is needed to average across the individual quadrat results for both the observed and null deviations.

My feeling is that the way we want to approach this is to estimate the treatment-level curve by averaging the separate block-specific curves, but in practice this may be difficult to implement because some blocks may have small sample sizes (individuals or samples) and because of how our null models are conducted. Let's use this thread to brainstorm how to best implement these analyses.

Options for test of aggregation

It would probably be good for us to keep a record of the different algorithms that we have tried, and whether/how/why each has failed. Since the code has been substantially modified in structure (#75), I'll re-test some of our old ideas and summarize the results here. If folks are interested in testing out some new algorithms, the R code for the sensitivity analysis is in scripts/mobr_sensitivity.R (requires @FelixMay's MoBspatial package, and a change of directories to run).

Issues from Shane Blowes <[email protected]>

Hi Dan and Xiao

I’m continuing to use the mobr package as it develops…thanks for the great resource.

A couple of (simple) things have popped up in the new version that I want to bring to your attention:
- typo in stack_effects(). The ‘label_indv’ argument should be ‘label_indiv’
- something funky happens with the plot.mob_out function for my data that results in the individual-based rarefaction curves being mislabelled. Specifically, the column order of my groups in mob_out$indiv_rare is: ‘sample’, ‘fished’, ‘protected’. But when I set trt_group=‘protected’, the code as currently written mixes up the two curves. I suspect, but don’t know, that this is going to be a problem for any data sets where the alphabetical order of the grouping variable is not the same as the order of the treatment/reference groups?

Finally, my results sometimes have NAs in the mob_out$N part of the object returned by the get_delta_stats() function. I’m still trying to figure out why…

Cheers,
Shane

Column names in plot_attr argument in make_comm_obj() function

Hi!
I have noticed that the column name of the environmental variable passed to the plot_attr argument of the make_comm_obj() function gets lost somehow. This happens when the environmental variable is a vector, a factor, or a data.frame with only one column.
The issue happens at this line:

out$env = data.frame(plot_attr[ , -spat_cols])

Adding this line after the one above fixed the issue for me

colnames(out$env) <- colnames(plot_attr)[-spat_cols]

Best,
Valentin

Report overall significance level?

John requested, on behalf of practitioners, an overall significance level for each stage of the analysis. I'm a bit resistant to this because it de-emphasizes that the answer depends on the scale of the analysis, but I can still see the importance. There is a precedent for an overall significance test from spatial correlogram analyses: Legendre and Fortin (1989) suggest that overall significance can be assessed if at least one point is significant at the level alpha_prime = alpha / n, where alpha is typically 0.05 and n is the number of tests, which in our case will be the number of points along the rarefaction curves we examine. This is the Bonferroni method of correcting for multiple tests.
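
A minimal sketch of that check, assuming p_values holds the scale-specific p-values from one stage of the analysis:

# Bonferroni-style overall test: significant overall if at least one of
# the n scale-specific tests is significant at alpha / n
# (e.g., alpha_prime = 0.05 / 20 = 0.0025 for 20 points along the curve)
overall_significant = function(p_values, alpha = 0.05) {
    any(p_values <= alpha / length(p_values), na.rm = TRUE)
}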

spatially explicit rarefaction reference?

Hey @ngotelli, @rueuntal, and @FelixMay, do any of you know of a good reference to cite for our approach to spatially explicit rarefaction? The general concept is described in Chiarucci et al. (2009), but they use a different way of defining how to select the next sample (a k-nearest-neighbor approach). Our approach is a bit simpler than theirs, and I'm curious if there is a precedent for it in the literature.
