
egouldo / ManyEcoEvo


Software for analysing Many-Analysts' style data and generating the ManyEcoEvo project data

Home Page: https://egouldo.github.io/ManyEcoEvo/

License: GNU General Public License v3.0

Languages: R 7.70%, TeX 0.65%, HTML 91.66%
Topics: ecology, evolutionary-biology, meta-analysis, r

ManyEcoEvo's People

Contributors: egouldo, parkerth


ManyEcoEvo's Issues

roxygen example code error in fit_metafor_mv

  • checking examples ... ERROR
    Running examples in ‘ManyEcoEvo-Ex.R’ failed
    The error most likely occurred in:

Name: fit_metafor_mv

Title: Fit Multivariate Metaregression using metafoR

Aliases: fit_metafor_mv

** Examples

#TODO -- is this the best way of setting up this fun?? (i.e. to take numeric vectors)?

Example Usage:

library(tidyverse); library(targets); library(metafor) # NOT RUN, TODO: remove after create pkg
source("R/functions.R") # NOT RUN, TODO: remove after create pkg

tar_read(round_2_survey_meta_analysis) %>%
  filter(dataset == "eucalyptus") %>%
  filter(!is.na(Zr),
         !is.na(VZr),
         !is.infinite(Zr),
         !is.infinite(VZr)) %>%
  fit_metafor_mv(estimate = .$Zr, variance = .$VZr, estimate_type = "Zr", data = .)

── Fitting multivariate metaregression ──

Error in match.arg(method) : object 'VZr' not found
Calls: %>% ... fit_metafor_mv -> -> filter -> match.arg
Execution halted
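For reference, a minimal self-contained sketch of the kind of example that could eventually replace the tar_read()-based usage above, written directly against metafor::rma.mv, which fit_metafor_mv presumably wraps (the toy Zr/VZr values and the TeamIdentifier grouping below are assumptions for illustration, not package data):

library(metafor)

# Toy effect sizes (Fisher's z) and their sampling variances, one row per analysis
dat <- data.frame(
  TeamIdentifier = c("A", "B", "C", "D", "E"),
  Zr  = c(0.21, -0.05, 0.33, 0.10, -0.12),
  VZr = c(0.020, 0.031, 0.015, 0.026, 0.018)
)

# Multilevel meta-analysis with a random intercept per team
fit <- rma.mv(yi = Zr, V = VZr, random = ~ 1 | TeamIdentifier, data = dat)
summary(fit)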

Collinearity subset analysis does not subset correct list-column of df's

The effects_analysis list-column is not being subset; the data list-column is. The function is applied after the other pre-processing steps that build ManyEcoEvo::ManyEcoEvo_results, but downstream analyses use effects_analysis as the input list-column of data frames.

library(ManyEcoEvo)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)

pull_df <- function(x,y){
  x %>% 
    filter(dataset == "blue tit", 
           publishable_subset == "All", 
           expertise_subset == "All", 
           exclusion_set == "complete") %>% 
    pull({{y}})
}

ManyEcoEvo::ManyEcoEvo_results %>% pull_df(data) %>% map(dim)
#> $subset_complete
#> [1] 131  40
#> 
#> $subset_complete
#> [1] 119  40
ManyEcoEvo::ManyEcoEvo_results %>% pull_df(effects_analysis) %>% map(dim)
#> [[1]]
#> [1] 131  48
#> 
#> [[2]]
#> [1] 131  48

Created on 2024-06-14 with reprex v2.1.0

The offending call anti-joins against the data list-column rather than effects_analysis:

mutate(data = map(.x = data,
                  .f = dplyr::anti_join, collinearity_subset,
                  by = join_by(response_id, id_col, dataset))) %>%
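A minimal sketch of the presumed fix, switching the target of the anti-join from the data list-column to effects_analysis (this assumes the surrounding pipeline, the collinearity_subset object, and the join keys are exactly as in the snippet above):

mutate(effects_analysis = map(.x = effects_analysis,
                              .f = dplyr::anti_join, collinearity_subset,
                              by = join_by(response_id, id_col, dataset))) %>%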

functionalise manuscript code in SM3

  • fit_MA_mv
  • plot_forest
  • plot_forest_2 (and what's the difference?)
  • consider adding create_model_workflow; may wish to leave it in the manuscript
  • possibly_check_convergence_glm
  • plot_model_means_RE
  • walk_plot_effects_diversity
  • logged Euc analysis

Namespaces in Imports field not imported

See https://stackoverflow.com/questions/54039992/namespaces-in-imports-field-not-imported-from-all-declared-imports-should-be-us

In R CMD CHECK log:

* checking dependencies in R code ... WARNING
'::' or ':::' imports not declared from:
  ‘broom.mixed’ ‘cli’ ‘data.table’ ‘forcats’ ‘fs’ ‘ggeffects’
  ‘ggforestplot’ ‘glue’ ‘here’ ‘lme4’ ‘magick’ ‘metafor’ ‘metaviz’
  ‘naniar’ ‘parameters’ ‘parsnip’ ‘performance’ ‘pracma’ ‘progress’
  ‘readr’ ‘readxl’ ‘recipes’ ‘sae’ ‘stringr’ ‘timetk’ ‘workflows’
Namespace in Imports field not imported from: ‘tidyselect’
  All declared Imports should be used.
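One way to resolve the warning, sketched with usethis helpers (this assumes the packages flagged above genuinely belong in DESCRIPTION's Imports field rather than being removed from the code):

library(usethis)

# Declare each package that is used via :: but missing from Imports, e.g.:
use_package("broom.mixed")
use_package("cli")
use_package("metafor")
# ...and so on for every package listed in the WARNING above.

# For the reverse problem ('tidyselect' declared but apparently unused), either
# drop it from Imports or reference it explicitly somewhere in R/, e.g. via
# tidyselect::all_of(), so R CMD check can detect the usage.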

duplicated id_col

I found an incorrectly duplicated id_col in master_data.csv for two separate analyses from the same submission for the same team, one blue tit and one eucalyptus. One of them will need to be recoded in the response_id, submission_id, analysis_id, and split_id columns.

See details in reprex below:

library(tidyverse)
library(here)
#> here() starts at /Users/elliotgould/Documents/GitHub/ManyAnalysts
library(janitor)
#> 
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#> 
#>     chisq.test, fisher.test
library(ManyEcoEvo)

prepare_df_for_summarising <- function(data){
  data %>% mutate(across(.cols = c(num_fixed_variables,
                                   num_random_variables,
                                   sample_size,
                                   num_interactions,
                                   Bayesian, #NA's coming from CHECK values
                                   mixed_model,
                                   num_fixed_effects,
                                   num_random_effects), 
                         as.numeric),
                  lm = ifelse(linear_model == "linear", 1, 0),
                  glm = ifelse(linear_model == "generalised", 1, 0))
}

Master <- ManyEcoEvo %>% 
  select(data) %>% unnest(everything()) %>% 
  prepare_df_for_summarising() #NAs ok, caused by CHECK vals, not yet using THP's fixes
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `across(...)`.
#> Caused by warning:
#> ! NAs introduced by coercion

Note that we are getting an unexpected many-to-many relationship here, as per the warning above.

predictions <- read_csv(here::here("ms/predictions_Ids.csv")) %>% #TODO ask HF source
  distinct() %>% 
  left_join(Master, by = c("id_col")) %>% 
  prepare_df_for_summarising()
#> Rows: 258 Columns: 1
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): id_col
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

There are duplicate entries for one id_col; let's identify these analyses:

predictions %>% janitor::get_dupes("id_col") %>% 
  select(id_col, ends_with("_id"), TeamIdentifier) %>% 
  knitr::kable()
| id_col | response_id | submission_id | analysis_id | split_id | TeamIdentifier |
|---|---|---|---|---|---|
| Byrock-1-8-1 | R_3qfD5ZHHdBbTgk3 | 1 | 8 | 1 | Byrock |
| Byrock-1-8-1 | R_3HzSBqQTAmJJ9ye | 1 | 8 | 1 | Byrock |

It seems that there are two separate response_id entries for this team; however, they are both coded with the same id_col.
Let's see which columns have values that are duplicated:

duplicated_variables <- 
  predictions %>% select(-review_data) %>% 
  janitor::get_dupes("id_col") %>% 
  summarise(id_col = unique(id_col), across(-all_of("id_col"), 
                   ~ first(.x) == last(.x))) %>% 
  select(id_col, where(isFALSE))

predictions %>% 
  semi_join(duplicated_variables, by = join_by("id_col")) %>% 
  select(id_col, colnames(duplicated_variables)) %>% 
  knitr::kable()
| id_col | response_id | beta_estimate | adjusted_df | beta_SE | transformation | link_function_reported | dataset | mixed_model | response_variable_name | response_id_S2 | sample_size | linear_model | exclusions_all | Conclusion | lm | glm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Byrock-1-8-1 | R_3qfD5ZHHdBbTgk3 | -0.065490 | 458.3576 | 0.014100 | identity | identity | blue tit | 1 | day_14_weight | R_3qfD5ZHHdBbTgk3 | 3720 | linear | exclude | neg_c | 1 | 0 |
| Byrock-1-8-1 | R_3HzSBqQTAmJJ9ye | -0.028464 | 345.0000 | 0.025721 | log | log | eucalyptus | 0 | euc_sdlgs0_50cm | R_3HzSBqQTAmJJ9ye | 350 | generalised | retain | neg_q | 0 | 1 |

OK, there is one entry for Eucalyptus and one for blue tit, so the split_id is coded incorrectly, as these are clearly separate analyses.
I can see that this id_col is also assigned to different response_id's, i.e. from different submissions.
I note that in the file predictions_Ids.csv there are three duplicated entries for this id_col.
We should make sure that there isn't a third analysis somewhere that is also duplicated in id_col.
It would be helpful to know how Hannah created this dataset.
I also note that for response_id R_3HzSBqQTAmJJ9ye there are three entries in predictions_validations_worksheet.csv belonging to this response_id, which is why there are multiple entries in predictions_Ids.csv.
The submission, analysis and split ID columns in that data file are:

  • 1-8-1
  • 2-9-1
  • 3-10-1

The predictions object here is also created from the Master object (itself derived from ManyEcoEvo::ManyEcoEvo), which in turn comes from the master_data.csv file.
Let's look at that to see if it is the source of the problem:
Master %>%  
  filter(TeamIdentifier == "Byrock") %>% 
  select(id_col, dataset, all_of(ends_with("_id"))) %>% 
  distinct() %>% 
  janitor::get_dupes("id_col")
#> # A tibble: 2 × 7
#>   id_col       dupe_count dataset response_id submission_id analysis_id split_id
#>   <chr>             <int> <chr>   <chr>               <dbl>       <dbl>    <dbl>
#> 1 Byrock-1-8-1          2 blue t… R_3qfD5ZHH…             1           8        1
#> 2 Byrock-1-8-1          2 eucaly… R_3HzSBqQT…             1           8        1

Yes, there is a different response_id for the same id_col for analyses of different datasets.
Let's check the raw data file. Here's the reprex output I ran over at ManyEcoEvo:

*Local .Rprofile detected at /Users/elliotgould/Documents/GitHub/ManyEcoEvo/.Rprofile*

library(targets)
library(tidyverse)

There are no extra prediction file submissions for these analyses, so that’s not a problem.

tar_read(list_of_new_prediction_files) %>% 
  filter(response_id == "R_3qfD5ZHHdBbTgk3" | response_id == "R_3HzSBqQTAmJJ9ye") %>% 
  select(dataset, ends_with("_id"), csv_number) 
#> # A tibble: 0 × 6
#> # ℹ 6 variables: dataset <chr>, response_id <chr>, submission_id <dbl>,
#> #   analysis_id <dbl>, split_id <dbl>, csv_number <dbl>

Let’s check the underlying master_data:

readr::read_csv("data-raw/anonymised_data/master_data.csv") %>% 
  filter(TeamIdentifier == "Byrock") %>% 
  select(id_col, dataset, all_of(ends_with("_id"))) %>% 
  knitr::kable()
#> Rows: 302 Columns: 154
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (135): response_id, id_col, contrast, transformation, link_function_repo...
#> dbl  (16): submission_id, analysis_id, split_id, beta_estimate, adjusted_df,...
#> lgl   (3): Extra-pair_dad_ring, rear_Cs_out, rear_Cs_in
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
| id_col | dataset | response_id | submission_id | analysis_id | split_id | hatch_nest_breed_ID | rear_nest_breed_ID |
|---|---|---|---|---|---|---|---|
| Byrock-1-3-1 | eucalyptus | R_23UKvhBc7D608VO | 1 | 3 | 1 | NA | NA |
| Byrock-5-7-1 | eucalyptus | R_23UKvhBc7D608VO | 5 | 7 | 1 | NA | NA |
| Byrock-4-6-1 | eucalyptus | R_23UKvhBc7D608VO | 4 | 6 | 1 | NA | NA |
| Byrock-2-4-1 | eucalyptus | R_23UKvhBc7D608VO | 2 | 4 | 1 | NA | NA |
| Byrock-3-5-1 | eucalyptus | R_23UKvhBc7D608VO | 3 | 5 | 1 | NA | NA |
| Byrock-3-10-1 | eucalyptus | R_3HzSBqQTAmJJ9ye | 3 | 10 | 1 | NA | NA |
| Byrock-1-8-1 | eucalyptus | R_3HzSBqQTAmJJ9ye | 1 | 8 | 1 | NA | NA |
| Byrock-2-9-1 | eucalyptus | R_3HzSBqQTAmJJ9ye | 2 | 9 | 1 | NA | NA |
| Byrock-1-1-1 | blue tit | R_3iKJrflQwwxsps0 | 1 | 1 | 1 | NA | rear_nest_breed_ID |
| Byrock-2-2-1 | blue tit | R_3iKJrflQwwxsps0 | 2 | 2 | 1 | NA | rear_nest_breed_ID |
| Byrock-1-8-1 | blue tit | R_3qfD5ZHHdBbTgk3 | 1 | 8 | 1 | NA | rear_nest_breed_ID |

Yes, this must be the source of the issue. Two 1-8-1 entries.

Created on 2024-06-18 with reprex v2.1.0
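A minimal sketch of one possible repair, assuming the blue tit analysis keeps Byrock-1-8-1 and the eucalyptus analysis from R_3HzSBqQTAmJJ9ye is recoded to a non-clashing combination (the replacement submission_id and analysis_id below are placeholders that would need to be checked against the team's other submissions before use):

library(dplyr)

master_data <- readr::read_csv("data-raw/anonymised_data/master_data.csv")

master_data_fixed <- master_data %>%
  mutate(
    # Placeholder values: recode only the clashing eucalyptus row
    submission_id = if_else(response_id == "R_3HzSBqQTAmJJ9ye" & id_col == "Byrock-1-8-1",
                            4, submission_id),
    analysis_id   = if_else(response_id == "R_3HzSBqQTAmJJ9ye" & id_col == "Byrock-1-8-1",
                            11, analysis_id),
    # Rebuild id_col for that row so it no longer duplicates the blue tit analysis
    id_col        = if_else(response_id == "R_3HzSBqQTAmJJ9ye" & id_col == "Byrock-1-8-1",
                            paste(TeamIdentifier, submission_id, analysis_id, split_id, sep = "-"),
                            id_col)
  )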



Code analyses still coded as 'CHECK'

Hi Hannah or @parkerth,

Could you please check the master_data file (at "data-raw/anonymised_data/master_data.csv") and code the following analyses as necessary, or mark them as NA if they can't be resolved? There are a few analyses remaining with the value 'CHECK':

| response_id | submission_id | analysis_id | split_id | test_variable | Bayesian | linear_model | model_subclass | exclusions_effect_analysis | Conclusion | data_cleaning_preprocessing_tool | data_cleaning_preprocessing_version | data_analysis_tool | data_analysis_version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| R_11787O3NmejXKAH | 1 | 2 | 2 | net_rearing_manipulation | 0 | generalised | standard | exclude_all | CHECK | R | NA | R | NA |
| R_11787O3NmejXKAH | 1 | 2 | 3 | net_rearing_manipulation | 0 | generalised | standard | exclude_all | CHECK | R | NA | R | NA |
| R_11787O3NmejXKAH | 1 | 2 | 1 | net_rearing_manipulation | 0 | generalised | standard | exclude_all | CHECK | R | NA | R | NA |
| R_1eXlFKlQdiD2F59 | 2 | 2 | 1 | rear_Cs_at_start_of_rearing | 0 | linear | standard | retain | CHECK | CHECK | CHECK | CHECK | CHECK |
| R_1GJlffAgZv6SY4y | 1 | 1 | 1 | NA | CHECK | CHECK | standard | exclude_all | none_c | R | 4.0.0 | R | 4.0.0 |
| R_1M0cMZL2IPYWHoi | 1 | 1 | 1 | NA | CHECK | CHECK | CHECK | exclude_all | CHECK | CHECK | CHECK | CHECK | CHECK |
| R_1M0cMZL2IPYWHoi | 1 | 1 | 1 | NA | 1 | generalised | CHECK | exclude_all | NA | R | 3.6.3 | R | 3.6.3 |
| R_1QlnXdW5tKuUQIr | 1 | 1 | 1 | CHECK | CHECK | CHECK | CHECK | CHECK | neg_q | R | 3.6.1 | R | 3.6.1 |
| R_21gmMa0uclrNoTP | 2 | 1 | 1 | NA | CHECK | linear | CHECK | exclude_all | CHECK | CHECK | CHECK | CHECK | CHECK |
| R_2Pjoz1X4q5XRClO | 2 | 1 | 1 | NA | CHECK | CHECK | standard | exclude_all | CHECK | CHECK | CHECK | CHECK | CHECK |
| R_2zNKAmJcWbM4QtY | 1 | 1 | 1 | NA | 0 | CHECK | standard | retain | none_q | R | 3.6.1 | R | 3.6.1 |
| R_3EbbZxcQ3gctVZu | 1 | 1 | 1 | NA | 0 | CHECK | hurdle | exclude_all | neg_q | R | 3.6.1 | R | 3.6.1 |
| R_3Kvy0h01LXHWniT | 2 | 2 | 1 | NA | 0 | generalised | standard | retain | CHECK | R | NA | R | NA |
| R_3nBCE4hMLh5s3qt | 3 | 1 | 2 | NA | CHECK | CHECK | standard | NA | CHECK | CHECK | CHECK | CHECK | CHECK |
| R_3nBCE4hMLh5s3qt | 3 | 1 | 1 | NA | CHECK | CHECK | standard | NA | CHECK | CHECK | CHECK | CHECK | CHECK |
| R_AzL6RdNTHtPjxzX | 1 | 1 | 2 | NA | CHECK | CHECK | standard | exclude_all | CHECK | CHECK | CHECK | CHECK | CHECK |
| R_es2jrrN9CTGwl5D | 1 | 1 | 1 | NA | CHECK | CHECK | CHECK | exclude_all | neg_q | R | 4.0.0 | R | 4.0.0 |
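For reference, a quick sketch for locating any remaining 'CHECK' values (assumes the column names shown above exist in master_data.csv and that the code is run from the package root):

library(dplyr)

readr::read_csv("data-raw/anonymised_data/master_data.csv") %>%
  # keep rows where any character column still carries the 'CHECK' placeholder
  filter(if_any(where(is.character), ~ !is.na(.x) & .x == "CHECK")) %>%
  # show the identifying columns plus whichever columns contain 'CHECK'
  select(response_id, submission_id, analysis_id, split_id,
         where(~ any(.x == "CHECK", na.rm = TRUE)))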

Ensure proper use of targets within pkg

  • remove ManyEcoEvo from the packages vector
  • remove all namespacing in the targets code
  • may also need to remove any namespacing within R/ functions
  • declare ManyEcoEvo in the tar_option_set() imports argument (see the sketch after this list)
  • run tar_destroy() and rerun data-raw.R
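A minimal sketch of what the corresponding _targets.R configuration could look like, assuming ManyEcoEvo should be tracked through tar_option_set()'s imports argument rather than loaded via the packages vector (the other package names shown are illustrative):

library(targets)

tar_option_set(
  packages = c("dplyr", "purrr"),  # ManyEcoEvo removed from the packages vector
  imports  = "ManyEcoEvo"          # track ManyEcoEvo functions so edits invalidate targets
)

# Then rebuild from scratch:
# tar_destroy()
# source("data-raw.R")  # or however the data-raw pipeline is rerun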
