data-edu / tidyLPA
Easily carry out Latent Profile Analysis (LPA) using open-source or commercial software
Home Page: https://data-edu.github.io/tidyLPA/
License: Other
Just a suggestion: consider removing code from the "Overall" section of the Mplus model specification when the same code is specified in the "Class-specific" section.
plot_profiles() when to_return = "mclust"
Use tidyeval
plot_profiles() style
compare_solutions()
Add LMR, CAIC, and SABIC per request ([here](https://twitter.com/kdaljeet/status/925024958845140994))
These are some issues I found in the Introduction to tidyLPA document.
(1) p. 2 - the example output for iris_log_lik_2_profiles shows results for model number 5 (rather than model 6). Is this correct?
(2) p. 4, second paragraph. "When we run the following line, we see a number of statistics - LogLik for the
log-likelihood, a number of information criteria (AIC, CAIC, BIC, SABIC, and ICL), as well as the entropy." However, we see only one information criterion, the BIC.
Minor
(1) p. 2 - Delete last word of first sentence "to."
(2) p. 4 - First sentence, there is a missing end parenthesis.
plot_profiles() currently uses
scale_fill_brewer("", type = "qual", palette = 6)
to choose the colors for the profiles, which means it works as it should for up to nine different variables. To accommodate models with more variables, could this be changed to use something like
colourCount <- length(x) - 2
getPalette <- colorRampPalette(brewer.pal(9, "Set1"))
pal <- getPalette(colourCount)
and then
ggplot(aes_string(x = "profile", y = "mean", fill = "variable", ymin = "ymin", ymax = "ymax")) +
  geom_col(position = "dodge") +
  geom_errorbar(position = position_dodge()) +
  scale_fill_manual("", values = pal) +  # pal from getPalette() above; "variable" is the assumed fill column
  scale_x_discrete("") +
  theme_bw()
The syntax I'm suggesting is probably a bit silly, but I hope my request is understandable. I'm talking about the approach taken at https://www.r-bloggers.com/how-to-expand-color-palette-with-ggplot-and-rcolorbrewer/
When I load tidyLPA, I receive the following message (with the following warning):
> devtools::load_all(".")
Loading tidyLPA
tidyLPA has received a major update, with a much easier workflow and improved functionality. However, you might have to update old syntax to account for the new workflow. See vignette('introduction-to-major-changes') for details!
Mplus is not installed. Use only package = 'mclust' when calling estimate_profiles().
However, I have Mplus installed - and the functions that use Mplus work.
Following up on that question: if the only variables in the data frame are those included in the LPA, I get no warning message and the plots show the right variables. But if the data frame contains extra variables that are not part of the LPA, the ggplots return all variables in the data frame rather than only those included in the LPA.
This one is not related to the JOSS review... ;-)
It would be nice to have textual output for estimate_profiles() that reports the same results as plot_profiles(), i.e. printing the means and standard errors for each profile. So I would suggest adding a print() method and a class attribute to the data frame returned from estimate_profiles().
Introduced in #31; I think this would be a good (though computationally intensive) feature to add.
I wonder if there's a way to edit the legends or aesthetics of profile plots (size, color, alignment, etc.) within the plot_profiles() function, without editing the function itself.
Warning message:
attributes are not identical across measure variables;
they will be dropped
In the Intro to tidyLPA, models 1 through 6 are described as follows:
It looks to me as if the naming used in estimate_profiles_mplus is inconsistent with this description. In particular, in estimate_profiles_mplus:
model = 1 gives ... equal variances, and covariances [OK]
model = 2 gives ... varying variances, and covariances fixed to 0 [should be model 3]
model = 3 gives ... equal variances, and equal covariances [should be model 2]
model = 4 gives ... varying variances, and varying covariances [should be model 6]
Models 4 and 5 are not defined in estimate_profiles_mplus.
This one is related to openjournals/joss-reviews#978
I just installed the package and tried out some functions. For estimate_profiles(), the help file gives the return value as:
either a tibble or a ggplot2 plot of the BIC values for the explored models
But I don't know how to create a ggplot-object. Could you clarify this in your docs, or help me finding the relevant passages in the help?
It would be great to have the option to make line plots instead of bar plots of the estimated profile parameters. Since it's dependent data within the profiles, it's more intuitive for many researchers applying LPA.
This doesn't seem to work correctly: when one requests class probabilities from estimate_profiles_mplus by setting return_save_data = TRUE, information about model fit (such as the BIC) is currently not returned.
It would be useful if this information could be added to the tibble/data.frame with the class probabilities as an attribute, for instance by adding these lines of code to estimate_profiles_mplus.R, just before the tibble is returned (around line 335):
fit_stats <- c("LL", "BIC", "aBIC", "AIC", "Entropy")
fs <- rep(NA, length(fit_stats))
names(fs) <- fit_stats
available_fit_stats <- intersect(names(m$summaries), fit_stats)
for (s in available_fit_stats) {
    fs[s] <- m$summaries[, s]
}
attr(x, "fit_stats") <- fs
attr(x, "mplus_warnings") <- m$warnings
attr(x, "mplus_errors") <- m$errors
When I use estimate_profiles_mplus(d, V1, V2, V3, V4, V5, n_profiles = 2) %>% plot_profiles_mplus(), or plot_profiles_mplus() on its own, I get the following error:
Error in mutate_impl(.data, dots) : Evaluation error: Column C not found in .data.
Presently, only the complete cases and selected variables are returned.
First, thank you for an awesome package! I am trying to use the new version you just released (0.2.4) and am following your example code pretty much exactly, but it doesn't appear to work. Quite likely I am doing something wrong, but I can't figure out what. I am attaching the data I am using. Here is the code and output I am getting:
lpaData %>%
compare_solutions(results)
Error in if (G[1] == 1) { : missing value where TRUE/FALSE needed
Many thanks for any help with this! Cheers, Emily
lpaData.xlsx
Presently, the standard errors that are plotted are calculated (manually) based on the posterior class / profile - as noted in the vignette, plot_profiles()
now has the option to plot the bootstrapped standard errors from mclust. We should include this functionality for mplus output, too.
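For context, a minimal sketch of the mclust side, assuming the mclust package and its bundled faithful data; MclustBootstrap is what produces the bootstrapped standard errors referred to above:

```r
library(mclust)

# Fit a 2-component mixture, then bootstrap it nonparametrically.
fit <- Mclust(faithful, G = 2)
boot <- MclustBootstrap(fit, nboot = 100, type = "bs")

# what = "se" summarises the bootstrap standard errors of the mixture
# parameters (mixing proportions, means, variances).
summary(boot, what = "se")
```

The Mplus path would need to produce an equivalent set of standard errors for plot_profiles() to consume.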
Error: package or namespace load failed for ‘tidyLPA’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):
namespace ‘rlang’ 0.1.6 is already loaded, but >= 0.2.0 is required
In addition: Warning message:
package ‘tidyLPA’ was built under R version 3.4.4
This issue is related to openjournals/joss-reviews#978.
Just a minor thing, please change "your name goes here" in https://github.com/jrosen48/tidyLPA/blob/master/LICENSE.md.
Need to address this warning - not sure what it's caused by:
test-plot_profiles_mplus.R:7: warning: plot_profiles_mplus_works
attributes are not identical across measure variables;
they will be dropped
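I can offer a guess at the cause: this warning typically comes from tidyr::gather() (or reshape2::melt()) when the columns being stacked carry non-identical attributes, e.g. factors with different level sets. A minimal repro sketch (the column names are just for illustration):

```r
library(tidyr)

# Two factor columns with different level sets: gathering them into one
# value column forces tidyr to drop the (non-identical) attributes.
df <- data.frame(a = factor(c("x", "y")), b = factor(c("p", "q")))
gather(df, key, value)
```

If that is the source, converting the measure columns to plain numeric before reshaping should silence it.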
Hello,
I would like to use tidyLPA but get the following error message when I try to install it on my PC:
Warning in gzfile(file, "rb") :
cannot open compressed file 'C:/Users/cvaughan/Documents/desktop.m.rds', probable reason 'No such file or directory'
Error in gzfile(file, "rb") : cannot open the connection
Error : unable to load R code in package 'tidyLPA'
ERROR: lazy loading failed for package 'tidyLPA'
I tried downloading the package on my personal Mac and got the same error message, so I don't think there is a firewall issue with my PC as related posts on stackexchange about the "cannot open the connection" error message have suggested. If you have a remedy for this problem, I would be very interested in it. Thank you in advance!
I could not set the parameters for the variances and covariances in either compare_solutions or estimate_profiles.
compare_solutions(plan, init, exe, correct, models = list(c("equal", "zero"), c("varying", "zero"), c("equal","equal"), c("varying", "varying")))
Error: list(c("equal", "zero"), c("varying", "zero"), c("equal", "equal"), c("varying", "varying")) must evaluate to column positions or names, not a list
In addition: Warning message:
'glue::collapse' is deprecated.
Use 'glue_collapse' instead.
See help("Deprecated") and help("glue-deprecated").
estimate_profiles(plan, init, exe, correct, n_profiles=3, variances="zero")
Error: Unknown column `zero`
In addition: Warning message:
'glue::collapse' is deprecated.
Use 'glue_collapse' instead.
See help("Deprecated") and help("glue-deprecated").
Also, I'm wondering if there is any way to extract the entropy values?
line 132 in estimate_profiles_mplus.R is
ANALYSIS_line2 <- paste0("start = ", starts[1], " ", starts[2], ";")
but it should be
ANALYSIS_line2 <- paste0("starts = ", starts[1], " ", starts[2], ";")
("starts =" not "start =")
also, nice package!
Here are my required and suggested edits for the tidyLPA paper.
Required edits
(1) Add a DOI to the Harring et al. (2016) and Pastor et al. (2007) references.
Suggested edits
(1) It would be preferable to cite the original source for latent class analysis, which I believe is McCutcheon (1987), but please verify. McCutcheon, A. C. (1987). Latent class analysis. Beverly Hills, CA: Sage.
(2) Briefly describe Models 4 and 5.
Hi,
This isn't an issue as such, but it might be worth making explicit that the package works best with continuous variables. I have 6 variables that are ordinal at best: effectively a range of 1-12 in increments of 1, but with little variation (i.e., a lot of people score 1, 6, or 12).
Originally I was using LPA because I thought it'd accommodate the data, but it's not best suited to it. The package gives info like 'model couldn't be fitted', and it took me ages to work out why. Totally my fault.
they are negative - I think they may need to be positive (and possibly * 2)
This one is related to openjournals/joss-reviews#978
While there's a clear statement of need in the paper, the background section of the readme.md could elaborate a bit more on why tidyLPA is needed, also in the context of what already exists. It's just a minor issue, I would say adding a small sentence to the background should be OK.
There is a need to refactor code for functions that use mplus (i.e., https://github.com/jrosen48/tidyLPA/blob/master/R/estimate_profiles_mplus.R and https://github.com/jrosen48/tidyLPA/blob/master/R/compare_solutions_mplus.R).
These were written when it wasn't clear whether they would even work, and they are pretty delicate (i.e., the way errors and warnings are detected is very ad hoc) and will be difficult to update in the future because of the confusing for loops that generate Mplus syntax. Part of the solution is just to rewrite them, but I think part may be better understanding how to work with the Mplus output, with or without MplusAutomation.
Hello,
I get the following error when running the bootstrap_lrt function:
Error in txtProgressBar(min = 0, max = (max(G) - 1) * nboot, style = 3) :
must have 'max' > 'min'
I'm running the following code:
bootstrap_lrt(df, var1, var2, n_profiles = 5, model = "VVV")
I would appreciate any comments on this error. Thank you!
Error in `[<-`(`*tmp*`, "1", mdl, value = bic(modelName = mdl, loglik = out$loglik, :
subscript out of bounds
I'm trying to include the option to account for the nesting/clustering of observations using Mplus. I'm trying to do this by specifying a clustering variable under the VARIABLE command with Cluster = [name_of_var], in conjunction with Type is complex; under the ANALYSIS command.
I'm having some trouble: when I include the variable in the dataset (which is prepared in Mplus format), similar code is used to grab the names of the variables for the analysis, so in my first attempt or two I've ended up with profiles that include the clustering variable. That's not what I want.
I noticed that we also include an ID variable (h/t @datalorax). However, this variable seems to avoid the fate of the clustering variable (i.e., it's not included in the clustering). I've reviewed the code but am having trouble solving this problem so I thought I would reach out. I think it's something to do in this (excerpted) code from the estimate_profiles_mplus.R
file:
d <- select_ancillary_functions_mplus(df, ..., cluster_ID)
if (is.null(idvar)) {
id <- data_frame(id = as.numeric(rownames(df)))
idvar <- "rownum"
} else {
if (length(unique(df[[idvar]])) != length(df[[idvar]])) {
stop("ID variable must be unique")
}
if (is.character(df[[idvar]])) {
string_id <- df[[idvar]]
num_id <- seq_along(df[[idvar]])
id <- data_frame(id = num_id)
names(id) <- idvar
}
else {
id <- data_frame(id = df[[idvar]])
names(id) <- idvar
}
}
d <- bind_cols(id, d)
names(d) <- gsub("\\.", "_", names(d))
x <- write_mplus(d, data_filename)
var_list <- vector("list", ncol(d))
for (i in seq_along(names(d))) {
var_list[[i]] <- names(d)[i]
}
Any ideas/tips would be much appreciated.
The y axis of the graph produced by the plot_profiles command states "BIC (smaller is better)"; I believe this is incorrect. mclust calculates BIC as 2×ln(L(theta|x)) − k×ln(n), not as −2×ln(L(theta|x)) + k×ln(n). In the first case larger BICs are better, and in the second case smaller BICs are better. See the following link for more details:
https://stats.stackexchange.com/questions/237220/mclust-model-selection
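To check the sign convention concretely, a small sketch (assuming mclust and its bundled faithful data): the value mclust reports should match 2·logLik − df·log(n), i.e. the "larger is better" form:

```r
library(mclust)

fit <- Mclust(faithful, G = 2, modelNames = "VVV")

# mclust's reported BIC uses the 2*logLik - df*log(n) convention,
# so larger BIC values indicate better-fitting models.
manual_bic <- 2 * fit$loglik - fit$df * log(nrow(faithful))
all.equal(as.numeric(fit$bic), manual_bic)
```

If this holds, the axis label should read "BIC (larger is better)" or the sign should be flipped before plotting.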
This one is related to openjournals/joss-reviews#978
There appear no DOIs in the paper's references. I assume that at least the cited journal articles have a DOI, so please check and add DOIs where available.
tidyLPA is very useful - many thanks. In my results (I am using the version available on CRAN) there seems to be a discrepancy between the BIC scores obtained from compare_solutions when compared with the BIC obtained with estimate_profiles. Specifically, when looking at the graph produced with compare_solutions, the BIC for "varying variances and covariances fixed to zero" corresponds to the BIC for "equal variances and equal covariances" which is obtained with estimate_profiles, and vice versa. Is there a bug or am I doing something wrong? Many thanks