tidylpa's People

Contributors

gbiele, gootjes, joshuarosenberg, jrosen48

tidylpa's Issues

address style issues raised by DJA

plot_profiles()

  • update plot_profiles() when to_return = "mclust" to use tidyeval
  • check and update plot_profiles() style

compare_solutions()

  • add error checking for wrong spelling of arguments (i.e., for what statistic to plot for compare_solutions())

compare_solutions_mplus()

  • when a model cannot be fitted, add more information about why

estimate_profiles_mplus()

  • make all of the Mplus lines lowercase
  • add a helper function to create the model syntax for Mplus

Introduction to tidyLPA

These are some issues I found in the Introduction to tidyLPA document.

(1) p. 2 - the example output for iris_log_lik_2_profiles shows results for model number 5 (rather than model 6). Is this correct?
(2) p. 4, second paragraph. "When we run the following line, we see a number of statistics - LogLik for the log-likelihood, a number of information criteria (AIC, CAIC, BIC, SABIC, and ICL), as well as the entropy." However, we see only one information criterion, the BIC.

Minor
(1) p. 2 - Delete last word of first sentence "to."
(2) p. 4 - First sentence, there is a missing end parenthesis.

plot_profiles() with more than nine variables

plot_profiles() currently uses

scale_fill_brewer("", type = "qual", palette = 6)

to choose the colors for the profiles, which means it works as it should for up to nine different variables. To accommodate models with more variables, could this be changed to use something like

colourCount <- length(x-2)
getPalette <- colorRampPalette(brewer.pal(9, "Set1"))
pal <- getPalette(colourCount)

and then

ggplot(aes_string(x = "profile", y = "mean", fill = pal, ymin = "ymin", ymax = "ymax")) + 
                    geom_col(position = "dodge") + geom_errorbar(position = position_dodge()) + 
                    scale_x_discrete("") + theme_bw()

The syntax I'm suggesting is probably a bit silly, but I hope my request is understandable. I'm talking about the approach taken at https://www.r-bloggers.com/how-to-expand-color-palette-with-ggplot-and-rcolorbrewer/
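
A minimal sketch of the palette-expansion idea from the linked post, assuming the RColorBrewer package is installed. The name `n_vars` and the value 12 are illustrative and are not taken from the tidyLPA source.

```r
library(RColorBrewer)

# Illustrative number of variables to color; more than the nine colors in Set1
n_vars <- 12

# Interpolate between the nine Set1 colors to get as many colors as needed
get_palette <- grDevices::colorRampPalette(brewer.pal(9, "Set1"))
pal <- get_palette(n_vars)

length(pal)
```

A vector like `pal` would normally be supplied through `scale_fill_manual(values = pal)` rather than mapped inside `aes_string()`.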

issue with mplusAvailable

When I load tidyLPA, I receive the following message (with the following warning):

> devtools::load_all(".")
Loading tidyLPA
tidyLPA has received a major update, with a much easier workflow and improved functionality. However, you might have to update old syntax to account for the new workflow. See vignette('introduction-to-major-changes') for details!

Mplus is not installed. Use only package = 'mclust' when calling estimate_profiles().

However, I have Mplus installed - and the functions that use Mplus work.

Check that the extra variables in plot_profiles() do not cause issues

From this question: If the only variables in the data frame are those that will be included in the LPA, then I do not get a warning message and the plots plot the variables in the data frame. So, if I have extra variables in the data frame that I am not including in the LPA, the ggplots return all variables in the data frame rather than only those included in the LPA.

print-method()

This one is not related to the JOSS-review... ;-)

It would be nice to have a textual output for estimate_profiles() that shows the same "results" as the plot_profiles() method, i.e., printing means and s.e. for each profile. So I would suggest adding a print() method and a class attribute to the data frame returned from estimate_profiles().
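
A rough sketch of what such a print() method could look like. The class name `lpa_profiles`, the helper names, and the assumption that the returned data frame has a `profile` column are all hypothetical; none of this is taken from the tidyLPA source.

```r
# Hypothetical constructor: tag a data frame of profile assignments with an S3 class
new_lpa_profiles <- function(x) {
  stopifnot(is.data.frame(x), "profile" %in% names(x))
  class(x) <- c("lpa_profiles", class(x))
  x
}

# Standard error of the mean
se <- function(v) stats::sd(v) / sqrt(length(v))

# Means and standard errors of every numeric variable, by profile
profile_summary <- function(x) {
  df <- x
  class(df) <- "data.frame"  # drop the extra class before summarizing
  is_num <- vapply(df, is.numeric, logical(1))
  vars <- setdiff(names(df)[is_num], "profile")
  by_profile <- list(profile = df$profile)
  list(
    mean = stats::aggregate(df[vars], by = by_profile, FUN = mean),
    se = stats::aggregate(df[vars], by = by_profile, FUN = se)
  )
}

# S3 print method: textual counterpart to the profile plot
print.lpa_profiles <- function(x, ...) {
  s <- profile_summary(x)
  cat("Profile means:\n")
  print(s$mean, row.names = FALSE)
  cat("Standard errors:\n")
  print(s$se, row.names = FALSE)
  invisible(x)
}

# Made-up example data
d <- new_lpa_profiles(data.frame(profile = c(1, 1, 2, 2), v = c(1, 3, 10, 14)))
print(d)
```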

Editing plots

I wonder if there's a way to edit the legends or the aesthetics of profile plots (size, color, alignment, etc.) produced by plot_profiles() without editing the function itself.
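
If plot_profiles() returns a ggplot object (an assumption here, based on the help text quoted elsewhere on this page), further ggplot2 layers can be added to the result. The sketch below uses a made-up stand-in plot rather than real plot_profiles() output.

```r
library(ggplot2)

# Stand-in for the plot returned by plot_profiles(); the data are made up
d <- data.frame(
  profile = rep(c("Profile 1", "Profile 2"), each = 2),
  variable = rep(c("V1", "V2"), times = 2),
  mean = c(0.5, -0.2, -0.4, 0.8)
)
p <- ggplot(d, aes(x = profile, y = mean, fill = variable)) +
  geom_col(position = "dodge")

# Layers added afterwards change scales and theme without touching the plotting function
p2 <- p +
  scale_fill_manual("Variable", values = c(V1 = "grey30", V2 = "grey70")) +
  theme(legend.position = "bottom", text = element_text(size = 14))
```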

Consistency of auto-generated mplus models with description in intro to tidyLPA

In the Intro to tidyLPA the models 1 through 6 are described as follows:

  1. ... equal variances, and covariances fixed to 0 (model 1)
  2. ... equal variances, and equal covariances (model 2)
  3. ... varying variances, and covariances fixed to 0 (model 3)
  4. ... varying variances, and equal covariances (model 4)
  5. ... equal variances, and varying covariances (model 5)
  6. ... varying variances, and varying covariances (model 6)

It looks to me as if the naming used in estimate_profiles_mplus is inconsistent with this description. In particular, in estimate_profiles_mplus

model = 1 gives ... equal variances, and covariances fixed to 0 [OK]
model = 2 gives ... varying variances, and covariances fixed to 0 [should be model 3]
model = 3 gives ... equal variances, and equal covariances [should be model 2]
model = 4 gives ... varying variances, and varying covariances [should be model 6]

Models 4 and 5 are not defined in estimate_profiles_mplus.
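
One way to keep the numbering consistent is a single lookup table that both the documentation and the Mplus syntax generator read from. This is only a sketch of the six models as described in the Intro; `model_specs` and `describe_model()` are hypothetical names, not part of tidyLPA.

```r
# Model numbering as listed in the Introduction to tidyLPA
model_specs <- data.frame(
  model = 1:6,
  variances = c("equal", "equal", "varying", "varying", "equal", "varying"),
  covariances = c("zero", "equal", "zero", "equal", "varying", "varying")
)

# Look up the specification for one model number
describe_model <- function(model) {
  row <- model_specs[model_specs$model == model, ]
  if (nrow(row) != 1) stop("Unknown model number: ", model)
  paste0(row$variances, " variances, ", row$covariances, " covariances")
}

describe_model(3)
```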

[JOSS-review] clarify docs

This one is related to openjournals/joss-reviews#978

I just installed the package and tried out some functions. For estimate_profiles(), the help-file says, as return value

either a tibble or a ggplot2 plot of the BIC values for the explored models

But I don't know how to create a ggplot object. Could you clarify this in your docs, or help me find the relevant passages in the help?

[Feature request]: Profile line plots

It would be great to have the option to make line plots instead of bar plots of the estimated profile parameters. Since it's dependent data within the profiles, it's more intuitive for many researchers applying LPA.

add information to data.frame/tibble with class probabilities returned from estimate_profiles_mplus

When one requests class probabilities from estimate_profiles_mplus by setting return_save_data = TRUE, information about the model fit, such as the BIC, is currently not returned.

It would be useful if this information could be added to the tibble/data.frame with the class probabilities as an attribute, for instance by adding these lines of code to estimate_profiles_mplus.R, just before the tibble is returned (around line 335):

}
fit_stats <- c("LL", "BIC", "aBIC", "AIC", "Entropy")
fs <- rep(NA, length(fit_stats))
names(fs) <- fit_stats
available_fit_stats <- intersect(names(m$summaries), fit_stats)
for (s in available_fit_stats) {
    fs[s] <- m$summaries[, s]
}
attr(x, "fit_stats") <- fs
attr(x, "mplus_warnings") <- m$warnings
attr(x, "mplus_errors") <- m$errors

Error for plot_profiles_mplus()

When I use estimate_profiles_mplus(d, V1, V2, V3, V4, V5, n_profiles = 2) %>% plot_profiles_mplus() or plot_profiles_mplus(), I get the following error:
Error in mutate_impl(.data, dots) : Evaluation error: Column C not found in .data.

new version not working - what am I doing wrong?

First, thank you for an awesome package! I am trying to use the new version you just released (0.2.4) and am following your example code pretty much exactly, but it doesn't appear to work. Quite likely I am doing something wrong, but I can't figure out what. I am attaching the data I am using. Here is the code and output I am getting:

lpaData %>%
  select("inert1", "coord1", "coord0", "inert0") %>%
  estimate_profiles(n_profiles=1:3, models=1:2) ->
  results
Fit Equal variances and covariances fixed to 0 (model 1) model with 123 profiles.
LogLik is 52.91
BIC is 65.0376.81968.761
Entropy is 0.996
Warning message:
In if (test < n_profiles) warning("Some profiles are associated with no assignments. Interpret this solution with caution and consider other models.") :
  the condition has length > 1 and only the first element will be used

compare_solutions(results)
Error in if (G[1] == 1) { : missing value where TRUE/FALSE needed

Many thanks for any help with this! Cheers, Emily
lpaData.xlsx
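
The output above ("with 123 profiles", and three BIC values run together) suggests that the vector 1:3 reached code that expects a single number of profiles. A minimal sketch of fitting one solution per value instead is shown below. `fit_one()` is a hypothetical stand-in for a single-model fit and is not a tidyLPA function; k-means is used only as a placeholder for the real estimation.

```r
# Hypothetical single-solution fit that insists on a scalar n_profiles
fit_one <- function(data, n_profiles) {
  stopifnot(length(n_profiles) == 1)
  # Placeholder estimation step; a real LPA fit would go here
  km <- stats::kmeans(data, centers = n_profiles)
  list(n_profiles = n_profiles, tot_withinss = km$tot.withinss)
}

set.seed(1)
data <- iris[, 1:4]

# One fit per candidate number of profiles, collected in a list
results <- lapply(1:3, function(k) fit_one(data, k))
vapply(results, function(r) r$n_profiles, integer(1))
```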

update plot_profiles_mplus() to plot estimated standard errors

Presently, the standard errors that are plotted are calculated (manually) based on the posterior class / profile - as noted in the vignette, plot_profiles() now has the option to plot the bootstrapped standard errors from mclust. We should include this functionality for mplus output, too.

Error: package or namespace load failed for ‘tidyLPA’

Error: package or namespace load failed for ‘tidyLPA’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):
namespace ‘rlang’ 0.1.6 is already loaded, but >= 0.2.0 is required
In addition: Warning message:
package ‘tidyLPA’ was built under R version 3.4.4

unable to download tidyLPA package from GitHub

Hello,

I would like to use tidyLPA but get the following error message when I try to download it on my PC:

Warning in gzfile(file, "rb") :
cannot open compressed file 'C:/Users/cvaughan/Documents/desktop.m.rds', probable reason 'No such file or directory'
Error in gzfile(file, "rb") : cannot open the connection
Error : unable to load R code in package 'tidyLPA'
ERROR: lazy loading failed for package 'tidyLPA'

I tried downloading the package on my personal Mac and got the same error message, so I don't think there is a firewall issue with my PC as related posts on stackexchange about the "cannot open the connection" error message have suggested. If you have a remedy for this problem, I would be very interested in it. Thank you in advance!

parameters for variances and covariances don't seem to work

I could not set the parameters for variance and covariance in either compare_solutions or estimate_profiles.

compare_solutions(plan, init, exe, correct, models = list(c("equal", "zero"), c("varying", "zero"), c("equal","equal"), c("varying", "varying")))
Error: list(c("equal", "zero"), c("varying", "zero"), c("equal", "equal"), c("varying", "varying")) must evaluate to column positions or names, not a list
In addition: Warning message:
'glue::collapse' is deprecated.
Use 'glue_collapse' instead.
See help("Deprecated") and help("glue-deprecated").

estimate_profiles(plan, init, exe, correct, n_profiles=3, variances="zero")
Error: Unknown column zero
In addition: Warning message:
'glue::collapse' is deprecated.
Use 'glue_collapse' instead.
See help("Deprecated") and help("glue-deprecated").

Also, I'm wondering if there is any way to extract the entropy values?

typo in estimate_profiles_mplus.R

line 132 in estimate_profiles_mplus.R is
ANALYSIS_line2 <- paste0("start = ", starts[1], " ", starts[2], ";")
but it should be
ANALYSIS_line2 <- paste0("starts = ", starts[1], " ", starts[2], ";")
("starts =" not "start =")

also, nice package!

paper

Here are my required and suggested edits for the tidyLPA paper.

Required edits
(1) Add a DOI to the Harring et al. (2016) and Pastor et al. (2007) references.

Suggested edits
(1) It would be preferable to cite the original source for latent class analysis, which I believe is McCutcheon (1987), but please verify. McCutcheon, A. C. (1987). Latent class analysis. Beverly Hills, CA: Sage.

(2) Briefly describe Models 4 and 5.

Continuous variables

Hi,

This isn't an issue as such, but it might be worth making it explicit that the package works best for continuous variables. I have 6 variables that are ordinal at best: effectively a range of 1 - 12, in increments of 1, but with little variation (i.e., a lot of people score 1, 6, or 12).

Originally I was using LPA because I thought it'd accommodate the data, but it's not best suited to it. The package gives info like 'model couldn't be fitted', and it took me ages to work out why. Totally my fault.

[JOSS-review] statement of need in readme.md

This one is related to openjournals/joss-reviews#978

While there's a clear statement of need in the paper, the background section of the readme.md could elaborate a bit more on why tidyLPA is needed, also in the context of what already exists. It's just a minor issue, I would say adding a small sentence to the background should be OK.

refactor code for functions that use mplus

There is a need to refactor code for functions that use mplus (i.e., https://github.com/jrosen48/tidyLPA/blob/master/R/estimate_profiles_mplus.R and https://github.com/jrosen48/tidyLPA/blob/master/R/compare_solutions_mplus.R).

These were written when it wasn't clear whether they would even work, and they are pretty delicate (i.e., the way errors and warnings are detected is very ad hoc) and will be difficult to update in the future because of the confusing for loops that generate MPlus syntax. Part of the solution is just to re-write them, but I think part may be better understanding how to work with the MPlus output, using or not using MplusAutomation.

bootstrap_lrt error

Hello,

I get the following error when running the bootstrap_lrt function:
Error in txtProgressBar(min = 0, max = (max(G) - 1) * nboot, style = 3) :
must have 'max' > 'min'

I'm running the following code:
bootstrap_lrt (df, var1, var2, n_profiles=5, model="VVV")

I would appreciate any comments on this error. Thank you!

Including a variable in the dataset/VARIABLE command but not in the syntax for the model

I'm trying to include the option to account for the nesting/clustering of observations using MPlus. I'm trying to do this by specifying under the VARIABLE command a clustering variable with Cluster = [name_of_var] in conjunction with the ANALYSIS command Type is complex;.

I'm having some trouble because when I include the variable in the dataset (that is prepared in MPlus format), we use similar code to grab the names of the variables to be used for the analysis, and so in my first attempt or two I've ended up with profiles that include the clustering variable. That's not what I want.

I noticed that we also include an ID variable (h/t @datalorax). However, this variable seems to avoid the fate of the clustering variable (i.e., it's not included in the clustering). I've reviewed the code but am having trouble solving this problem so I thought I would reach out. I think it's something to do in this (excerpted) code from the estimate_profiles_mplus.R file:

d <- select_ancillary_functions_mplus(df, ..., cluster_ID)
 if (is.null(idvar)) {
   id <- data_frame(id = as.numeric(rownames(df)))
   idvar <- "rownum"
 } else {
   if (length(unique(df[[idvar]])) != length(df[[idvar]])) {
     stop("ID variable must be unique")
   }
   if (is.character(df[[idvar]])) {
     string_id <- df[[idvar]]
     num_id <- seq_along(df[[idvar]])
     id <- data_frame(id = num_id)
     names(id) <- idvar
   }
   else {
     id <- data_frame(id = df[[idvar]])
     names(id) <- idvar
   }
 }
 d <- bind_cols(id, d)

 names(d) <- gsub("\\.", "_", names(d))

 x <- write_mplus(d, data_filename)

 var_list <- vector("list", ncol(d))
 for (i in seq_along(names(d))) {
   var_list[[i]] <- names(d)[i]
 }

Any ideas/tips would be much appreciated.
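
One possible direction, sketched under assumptions: keep the ID and clustering variables in the data file, but leave them out of the list of variables used to define the profiles. The data, the variable names, and the way the VARIABLE lines are assembled are all illustrative and are not taken from estimate_profiles_mplus.R.

```r
# Made-up data: an ID, a clustering variable, and two indicators
d <- data.frame(
  rownum = 1:4,
  school = c(1, 1, 2, 2),
  x1 = c(0.1, 0.4, 1.2, 1.5),
  x2 = c(2.0, 2.2, 3.1, 2.9)
)

id_var <- "rownum"       # hypothetical ID column name
cluster_var <- "school"  # hypothetical clustering column name

# Only the remaining columns should enter the profile model
model_vars <- setdiff(names(d), c(id_var, cluster_var))

# Lines for the Mplus VARIABLE command: all columns are named, but only
# the indicators are listed under USEVARIABLES
variable_lines <- c(
  paste0("NAMES = ", paste(names(d), collapse = " "), ";"),
  paste0("USEVARIABLES = ", paste(model_vars, collapse = " "), ";"),
  paste0("IDVARIABLE = ", id_var, ";"),
  paste0("CLUSTER = ", cluster_var, ";")
)
cat(variable_lines, sep = "\n")
```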

BIC discrepancies

tidyLPA is very useful - many thanks. In my results (I am using the version available on CRAN) there seems to be a discrepancy between the BIC scores obtained from compare_solutions when compared with the BIC obtained with estimate_profiles. Specifically, when looking at the graph produced with compare_solutions, the BIC for "varying variances and covariances fixed to zero" corresponds to the BIC for "equal variances and equal covariances" which is obtained with estimate_profiles, and vice versa. Is there a bug or am I doing something wrong? Many thanks
