data-edu / tidyLPA
Easily carry out Latent Profile Analysis (LPA) using open-source or commercial software
Home Page: https://data-edu.github.io/tidyLPA/
License: Other
Just a suggestion: consider removing code from the "Overall" section of the Mplus model specification when the same code is specified in the "Class-specific" section.
plot_profiles() when to_return = "mclust"
Use tidyeval
plot_profiles() style
compare_solutions()
Add LMR, CAIC, and SABIC per request ([here](https://twitter.com/kdaljeet/status/925024958845140994))
These are some issues I found in the Introduction to tidyLPA document.
(1) p. 2 - the example output for iris_log_lik_2_profiles shows results for model number 5 (rather than model 6). Is this correct?
(2) p. 4, second paragraph. "When we run the following line, we see a number of statistics - LogLik for the
log-likelihood, a number of information criteria (AIC, CAIC, BIC, SABIC, and ICL), as well as the entropy." However, we see only one information criterion, the BIC.
Minor
(1) p. 2 - Delete last word of first sentence "to."
(2) p. 4 - First sentence, there is a missing end parenthesis.
plot_profiles() currently uses
scale_fill_brewer("", type = "qual", palette = 6)
to choose the colors for the profiles, which means it works as it should for up to nine different variables. To accommodate models with more variables, could this be changed to use something like
colourCount <- length(x) - 2
getPalette <- colorRampPalette(brewer.pal(9, "Set1"))
pal <- getPalette(colourCount)
and then
ggplot(aes_string(x = "profile", y = "mean", fill = "variable", ymin = "ymin", ymax = "ymax")) +
  geom_col(position = "dodge") +
  geom_errorbar(position = position_dodge()) +
  scale_fill_manual("", values = pal) +  # pal from getPalette() above; "variable" is the assumed fill column
  scale_x_discrete("") +
  theme_bw()
The syntax I'm suggesting is probably a bit silly, but I hope my request is understandable. I'm talking about the approach taken at https://www.r-bloggers.com/how-to-expand-color-palette-with-ggplot-and-rcolorbrewer/
When I load tidyLPA, I receive the following message (with the following warning):
> devtools::load_all(".")
Loading tidyLPA
tidyLPA has received a major update, with a much easier workflow and improved functionality. However, you might have to update old syntax to account for the new workflow. See vignette('introduction-to-major-changes') for details!
Mplus is not installed. Use only package = 'mclust' when calling estimate_profiles().
However, I have Mplus installed - and the functions that use Mplus work.
Following up on that question: if the only variables in the data frame are those included in the LPA, I get no warning message and the plots show the right variables. But if the data frame contains extra variables that are not part of the LPA, the ggplots return all variables in the data frame rather than only those included in the LPA.
This one is not related to the JOSS review... ;-)
It would be nice to have textual output for estimate_profiles() that reports the same results as plot_profiles(), i.e. printing the means and standard errors for each profile. So I would suggest adding a print() method and a class attribute to the data frame returned from estimate_profiles().
Introduced in #31; I think this would be a good (though computationally intensive) feature to add.
I wonder if there's a way to edit the legends or aesthetics of profile plots (size, color, alignment, etc.) within the plot_profiles() function, without editing the function itself.
Warning message:
attributes are not identical across measure variables;
they will be dropped
In the Intro to tidyLPA, models 1 through 6 are described as follows:
It looks to me as if the naming used in estimate_profiles_mplus is inconsistent with this description. In particular, in estimate_profiles_mplus:
model = 1 gives ... equal variances, and covariances [OK]
model = 2 gives ... varying variances, and covariances fixed to 0 [should be model 3]
model = 3 gives ... equal variances, and equal covariances [should be model 2]
model = 4 gives ... varying variances, and varying covariances [should be model 6]
Models 4 and 5 are not defined in estimate_profiles_mplus.
This one is related to openjournals/joss-reviews#978
I just installed the package and tried out some functions. For estimate_profiles(), the help file gives the return value as:
either a tibble or a ggplot2 plot of the BIC values for the explored models
But I don't know how to create a ggplot-object. Could you clarify this in your docs, or help me finding the relevant passages in the help?
It would be great to have the option to make line plots instead of bar plots of the estimated profile parameters. Since it's dependent data within the profiles, it's more intuitive for many researchers applying LPA.
This doesn't seem to work correctly: when one requests class probabilities from estimate_profiles_mplus by setting return_save_data = TRUE, information about model fit (such as the BIC) is currently not returned.
It would be useful if this information could be added to the tibble/data.frame with the class probabilities as an attribute, for instance by adding these lines of code to estimate_profiles_mplus.R, just before the tibble is returned (around line 335):
fit_stats <- c("LL", "BIC", "aBIC", "AIC", "Entropy")
fs <- rep(NA, length(fit_stats))
names(fs) <- fit_stats
available_fit_stats <- intersect(names(m$summaries), fit_stats)
for (s in available_fit_stats) {
    fs[s] <- m$summaries[, s]
}
attr(x, "fit_stats") <- fs
attr(x, "mplus_warnings") <- m$warnings
attr(x, "mplus_errors") <- m$errors
When I use estimate_profiles_mplus(d, V1, V2, V3, V4, V5, n_profiles = 2) %>% plot_profiles_mplus(), or plot_profiles_mplus() on its own, I get the following error:
Error in mutate_impl(.data, dots) : Evaluation error: Column C not found in .data.
Presently, only the complete cases and selected variables are returned.
First, thank you for an awesome package! I am trying to use the new version you just released (0.2.4) and am following your example code pretty much exactly, but it doesn't appear to work. Quite likely I am doing something wrong, but I can't figure out what. I am attaching the data I am using. Here is the code and output I am getting:
lpaData %>%
compare_solutions(results)
Error in if (G[1] == 1) { : missing value where TRUE/FALSE needed
Many thanks for any help with this! Cheers, Emily
lpaData.xlsx
Presently, the standard errors that are plotted are calculated (manually) based on the posterior class / profile - as noted in the vignette, plot_profiles()
now has the option to plot the bootstrapped standard errors from mclust. We should include this functionality for mplus output, too.
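For context, a minimal sketch of the mclust side, assuming the mclust package and its bundled faithful data; MclustBootstrap is what produces the bootstrapped standard errors referred to above:

```r
library(mclust)

# Fit a 2-component mixture, then bootstrap it nonparametrically.
fit <- Mclust(faithful, G = 2)
boot <- MclustBootstrap(fit, nboot = 100, type = "bs")

# what = "se" summarises the bootstrap standard errors of the mixture
# parameters (mixing proportions, means, variances).
summary(boot, what = "se")
```

The Mplus path would need to produce an equivalent set of standard errors for plot_profiles() to consume.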
Error: package or namespace load failed for ‘tidyLPA’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):
namespace ‘rlang’ 0.1.6 is already loaded, but >= 0.2.0 is required
In addition: Warning message:
package ‘tidyLPA’ was built under R version 3.4.4
This issue is related to openjournals/joss-reviews#978.
Just a minor thing, please change "your name goes here" in https://github.com/jrosen48/tidyLPA/blob/master/LICENSE.md.
Need to address this warning - not sure what it's caused by:
test-plot_profiles_mplus.R:7: warning: plot_profiles_mplus_works
attributes are not identical across measure variables;
they will be dropped
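I can offer a guess at the cause: this warning typically comes from tidyr::gather() (or reshape2::melt()) when the columns being stacked carry non-identical attributes, e.g. factors with different level sets. A minimal repro sketch (the column names are just for illustration):

```r
library(tidyr)

# Two factor columns with different level sets: gathering them into one
# value column forces tidyr to drop the (non-identical) attributes.
df <- data.frame(a = factor(c("x", "y")), b = factor(c("p", "q")))
gather(df, key, value)
```

If that is the source, converting the measure columns to plain numeric before reshaping should silence it.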
Hello,
I would like to use tidyLPA but get the following error message when I try to install it on my PC:
Warning in gzfile(file, "rb") :
cannot open compressed file 'C:/Users/cvaughan/Documents/desktop.m.rds', probable reason 'No such file or directory'
Error in gzfile(file, "rb") : cannot open the connection
Error : unable to load R code in package 'tidyLPA'
ERROR: lazy loading failed for package 'tidyLPA'
I tried downloading the package on my personal Mac and got the same error message, so I don't think there is a firewall issue with my PC as related posts on stackexchange about the "cannot open the connection" error message have suggested. If you have a remedy for this problem, I would be very interested in it. Thank you in advance!
I could not set the parameters for the variances and covariances in either compare_solutions or estimate_profiles.
compare_solutions(plan, init, exe, correct, models = list(c("equal", "zero"), c("varying", "zero"), c("equal","equal"), c("varying", "varying")))
Error: list(c("equal", "zero"), c("varying", "zero"), c("equal", "equal"), c("varying", "varying")) must evaluate to column positions or names, not a list
In addition: Warning message:
'glue::collapse' is deprecated.
Use 'glue_collapse' instead.
See help("Deprecated") and help("glue-deprecated").
estimate_profiles(plan, init, exe, correct, n_profiles=3, variances="zero")
Error: Unknown column `zero`
In addition: Warning message:
'glue::collapse' is deprecated.
Use 'glue_collapse' instead.
See help("Deprecated") and help("glue-deprecated").
Also, I'm wondering if there is any way to extract the entropy values?
line 132 in estimate_profiles_mplus.R is
ANALYSIS_line2 <- paste0("start = ", starts[1], " ", starts[2], ";")
but it should be
ANALYSIS_line2 <- paste0("starts = ", starts[1], " ", starts[2], ";")
("starts =" not "start =")
also, nice package!
Here are my required and suggested edits for the tidyLPA paper.
Required edits
(1) Add a DOI to the Harring et al. (2016) and Pastor et al. (2007) references.
Suggested edits
(1) It would be preferable to cite the original source for latent class analysis, which I believe is McCutcheon (1987), but please verify. McCutcheon, A. C. (1987). Latent class analysis. Beverly Hills, CA: Sage.
(2) Briefly describe Models 4 and 5.
Hi,
This isn't an issue as such, but it might be worth making explicit that the package works best with continuous variables. I have 6 variables that are ordinal at best: effectively a range of 1-12 in increments of 1, but with little variation (i.e., a lot of people score 1, 6, or 12).
Originally I was using LPA because I thought it'd accommodate the data, but it's not best suited to it. The package gives info like 'model couldn't be fitted', and it took me ages to work out why. Totally my fault.
they are negative - I think they may need to be positive (and possibly * 2)
This one is related to openjournals/joss-reviews#978
While there's a clear statement of need in the paper, the background section of the readme.md could elaborate a bit more on why tidyLPA is needed, also in the context of what already exists. It's just a minor issue, I would say adding a small sentence to the background should be OK.
There is a need to refactor code for functions that use mplus (i.e., https://github.com/jrosen48/tidyLPA/blob/master/R/estimate_profiles_mplus.R and https://github.com/jrosen48/tidyLPA/blob/master/R/compare_solutions_mplus.R).
These were written when it wasn't clear whether they would even work, and they are pretty delicate (i.e., the way errors and warnings are detected is very ad hoc) and will be difficult to update in the future because of the confusing for loops that generate Mplus syntax. Part of the solution is just to rewrite them, but I think part may be better understanding how to work with the Mplus output, with or without MplusAutomation.
Hello,
I get the following error when running the bootstrap_lrt function:
Error in txtProgressBar(min = 0, max = (max(G) - 1) * nboot, style = 3) :
must have 'max' > 'min'
I'm running the following code:
bootstrap_lrt(df, var1, var2, n_profiles = 5, model = "VVV")
I would appreciate any comments on this error. Thank you!
Error in `[<-`(`*tmp*`, "1", mdl, value = bic(modelName = mdl, loglik = out$loglik, :
subscript out of bounds
I'm trying to include the option to account for the nesting/clustering of observations using Mplus. I'm trying to do this by specifying a clustering variable under the VARIABLE command with Cluster = [name_of_var], in conjunction with Type is complex; under the ANALYSIS command.
I'm having some trouble: when I include the variable in the dataset (which is prepared in Mplus format), similar code is used to grab the names of the variables for the analysis, so in my first attempt or two I've ended up with profiles that include the clustering variable. That's not what I want.
I noticed that we also include an ID variable (h/t @datalorax). However, this variable seems to avoid the fate of the clustering variable (i.e., it's not included in the clustering). I've reviewed the code but am having trouble solving this problem so I thought I would reach out. I think it's something to do in this (excerpted) code from the estimate_profiles_mplus.R
file:
d <- select_ancillary_functions_mplus(df, ..., cluster_ID)
if (is.null(idvar)) {
id <- data_frame(id = as.numeric(rownames(df)))
idvar <- "rownum"
} else {
if (length(unique(df[[idvar]])) != length(df[[idvar]])) {
stop("ID variable must be unique")
}
if (is.character(df[[idvar]])) {
string_id <- df[[idvar]]
num_id <- seq_along(df[[idvar]])
id <- data_frame(id = num_id)
names(id) <- idvar
}
else {
id <- data_frame(id = df[[idvar]])
names(id) <- idvar
}
}
d <- bind_cols(id, d)
names(d) <- gsub("\\.", "_", names(d))
x <- write_mplus(d, data_filename)
var_list <- vector("list", ncol(d))
for (i in seq_along(names(d))) {
var_list[[i]] <- names(d)[i]
}
Any ideas/tips would be much appreciated.
The y axis of the graph produced by the plot_profiles command states "BIC (smaller is better)"; I believe this is incorrect. mclust calculates BIC as 2×ln(L(theta|x)) − k×ln(n), not as −2×ln(L(theta|x)) + k×ln(n). In the first case larger BICs are better, and in the second case smaller BICs are better. See the following link for more details:
https://stats.stackexchange.com/questions/237220/mclust-model-selection
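To check the sign convention concretely, a small sketch (assuming mclust and its bundled faithful data): the value mclust reports should match 2·logLik − df·log(n), i.e. the "larger is better" form:

```r
library(mclust)

fit <- Mclust(faithful, G = 2, modelNames = "VVV")

# mclust's reported BIC uses the 2*logLik - df*log(n) convention,
# so larger BIC values indicate better-fitting models.
manual_bic <- 2 * fit$loglik - fit$df * log(nrow(faithful))
all.equal(as.numeric(fit$bic), manual_bic)
```

If this holds, the axis label should read "BIC (larger is better)" or the sign should be flipped before plotting.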
This one is related to openjournals/joss-reviews#978
There appear no DOIs in the paper's references. I assume that at least the cited journal articles have a DOI, so please check and add DOIs where available.
tidyLPA is very useful - many thanks. In my results (I am using the version available on CRAN) there seems to be a discrepancy between the BIC scores obtained from compare_solutions when compared with the BIC obtained with estimate_profiles. Specifically, when looking at the graph produced with compare_solutions, the BIC for "varying variances and covariances fixed to zero" corresponds to the BIC for "equal variances and equal covariances" which is obtained with estimate_profiles, and vice versa. Is there a bug or am I doing something wrong? Many thanks