egouldo / manyecoevo

Software for analysing Many-Analysts' style data and generating the ManyEcoEvo project data

Home Page: https://egouldo.github.io/ManyEcoEvo/

License: GNU General Public License v3.0

R 7.70% TeX 0.65% HTML 91.66%

ecology evolutionary-biology meta-analysis r

manyecoevo's Issues

Fix pkgdown double logo

Update pkgdown. See issue here: r-lib/pkgdown#2184

use_data(upstream targets for manuscript and vignette)

Consider adding upstream or raw files to demonstrate pipeline / package functionality in package vignette and software manuscript. These should be added to tar_make.R.

We may want to switch some data objects to internal objects for the main manuscript, to be called with ManyEcoEvo:::, if we don't want to expose them to the user.

Checkout:

Convert descriptive summary table code to functions

Table 3.1

Table A1

Table A2

update citation once manuscript published

See https://stackoverflow.com/questions/72994827/having-two-preferred-citations-when-implementing-a-citation-cff-to-a-project for adding a preferred citation to the CFF file.

Note: we will need two citations, one for the main manuscript and one for the secondary software manuscript.

Move expert_subset creation out of targets into internal pkg data

Implement same approach as ManyEcoEvo:::collinearity_subset
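
A minimal sketch of what moving expert_subset into internal package data could look like. It assumes the usual convention that internal objects live together in R/sysdata.rda, which is what usethis::use_data(..., internal = TRUE) writes. The object contents, the temporary directory and the names pkg_root and sysdata_path are hypothetical placeholders, not the real subsets or paths.

```r
# Hypothetical placeholder objects; the real subsets are built elsewhere.
expert_subset <- data.frame(response_id = c("R_a", "R_b"))
collinearity_subset <- data.frame(
  response_id = "R_c",
  id_col = "Team-1-1-1",
  dataset = "blue tit"
)

# Stand-in for the package root, so the sketch runs anywhere.
pkg_root <- file.path(tempdir(), "ManyEcoEvo-sketch")
dir.create(file.path(pkg_root, "R"), recursive = TRUE, showWarnings = FALSE)
sysdata_path <- file.path(pkg_root, "R", "sysdata.rda")

# Base-R equivalent of writing internal data: all internal objects
# are saved together in a single R/sysdata.rda file.
save(expert_subset, collinearity_subset, file = sysdata_path)

# In the real package this would instead be run from a data-raw/ script:
# usethis::use_data(expert_subset, collinearity_subset,
#                   internal = TRUE, overwrite = TRUE)

# Check which objects the file contains.
internal_env <- new.env()
load(sysdata_path, envir = internal_env)
ls(internal_env)
```

Because the whole file is rewritten on each save, both subsets need to be passed in the same call, otherwise the existing collinearity_subset would be dropped.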

roxygen example code error fit_metafor_mv

checking examples ... ERROR
Running examples in ‘ManyEcoEvo-Ex.R’ failed
The error most likely occurred in:

Name: fit_metafor_mv

Title: Fit Multivariate Metaregression using metafoR

Aliases: fit_metafor_mv

** Examples

#TODO -- is this the best way of setting up this fun?? (i.e. to take numeric vectors)?

Example Usage:

library(tidyverse);library(targets);library(metafor) # NOT RUN, TODO: remove after create pkg

source("R/functions.R") #NOT RUN, TODO: remove after create pkg

tar_read(round_2_survey_meta_analysis) %>%
  filter(dataset == "eucalyptus") %>%
  filter(!is.na(Zr),
         !is.na(VZr),
         !is.infinite(Zr),
         !is.infinite(VZr)) %>%
  fit_metafor_mv(estimate = .$Zr, variance = .$VZr, estimate_type = "Zr", data = .)

── Fitting multivariate metaregression ──

Error in match.arg(method) : object 'VZr' not found
Calls: %>% ... fit_metafor_mv -> -> filter -> match.arg
Execution halted

Package publication: rename fn plot_effects_diversity

This function needs to be renamed, since we are actually using Sorensen's similarity index, not diversity. The function documentation also needs to be updated to match.

Collinearity subset analysis does not subset correct list-column of df's

The list-column effects_analysis is not being subset; only data is. The subsetting function is applied after the other pre-processing steps that create ManyEcoEvo::ManyEcoEvo_results. Downstream analyses, however, use effects_analysis as the input list-column of data frames.

library(ManyEcoEvo)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

library(purrr)

pull_df <- function(x,y){
  x %>% 
    filter(dataset == "blue tit", 
           publishable_subset == "All", 
           expertise_subset == "All", 
           exclusion_set == "complete") %>% 
    pull({{y}})
}

ManyEcoEvo::ManyEcoEvo_results %>% pull_df(data) %>% map(dim)
#> $subset_complete
#> [1] 131  40
#> 
#> $subset_complete
#> [1] 119  40

ManyEcoEvo::ManyEcoEvo_results %>% pull_df(effects_analysis) %>% map(dim)
#> [[1]]
#> [1] 131  48
#> 
#> [[2]]
#> [1] 131  48

Created on 2024-06-14 with reprex v2.1.0

ManyEcoEvo/R/generate_collinearity_subset.R

Lines 53 to 55 in 77c89f6

    
    mutate(data = map(.x = data,
                      .f = dplyr::anti_join, collinearity_subset,
                      by = join_by(response_id, id_col, dataset) )) %>%
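
One possible fix, sketched under assumptions: apply the same anti_join to both list-columns with across(), so that data and effects_analysis stay in step. The toy results tibble and its contents are hypothetical stand-ins for ManyEcoEvo::ManyEcoEvo_results, and a character vector is used for by in place of join_by() only to keep the sketch independent of the dplyr version.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical identifiers standing in for real analyses.
ids <- tibble(
  response_id = c("R_a", "R_b", "R_c"),
  id_col = c("TeamA-1-1-1", "TeamB-1-1-1", "TeamC-1-1-1"),
  dataset = "blue tit"
)

# Toy stand-in for the results object: two list-columns of data frames.
results <- tibble(
  exclusion_set = "complete",
  data = list(ids),
  effects_analysis = list(mutate(ids, Zr = c(0.1, 0.2, 0.3)))
)

# Hypothetical collinearity subset: the analysis to be removed.
collinearity_subset <- slice(ids, 2)

join_cols <- c("response_id", "id_col", "dataset")

# Remove the collinear analyses from BOTH list-columns.
subset_results <- results %>%
  mutate(across(
    .cols = c(data, effects_analysis),
    .fns = ~ map(.x, anti_join, collinearity_subset, by = join_cols)
  ))

map_int(subset_results$data, nrow)
map_int(subset_results$effects_analysis, nrow)
```

After this change both list-columns should report the same number of rows, which is the check the reprex above fails.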

functionalise manuscript code in SM3

fit_MA_mv
plot_forest
plot_forest_2 (and what's the difference?)
consider adding create_model_workflow, though we may wish to leave it in the manuscript
possibly_check_convergence_glm
plot_model_means_RE
walk_plot_effects_diversity
logged Euc analysis

NAMESPACEs in Imports field not imported

See https://stackoverflow.com/questions/54039992/namespaces-in-imports-field-not-imported-from-all-declared-imports-should-be-us

In R CMD CHECK log:

* checking dependencies in R code ... WARNING
'::' or ':::' imports not declared from:
  ‘broom.mixed’ ‘cli’ ‘data.table’ ‘forcats’ ‘fs’ ‘ggeffects’
  ‘ggforestplot’ ‘glue’ ‘here’ ‘lme4’ ‘magick’ ‘metafor’ ‘metaviz’
  ‘naniar’ ‘parameters’ ‘parsnip’ ‘performance’ ‘pracma’ ‘progress’
  ‘readr’ ‘readxl’ ‘recipes’ ‘sae’ ‘stringr’ ‘timetk’ ‘workflows’
Namespace in Imports field not imported from: ‘tidyselect’
  All declared Imports should be used.
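
A small sketch of how the first warning could be addressed, assuming the packages really are needed at run time: every package called with :: must be listed under Imports in DESCRIPTION. The vector below copies the package names from the check log; imports_field is a hypothetical helper name used only for illustration.

```r
# Packages flagged by R CMD check as used via '::' but not declared.
undeclared <- c(
  "broom.mixed", "cli", "data.table", "forcats", "fs", "ggeffects",
  "ggforestplot", "glue", "here", "lme4", "magick", "metafor", "metaviz",
  "naniar", "parameters", "parsnip", "performance", "pracma", "progress",
  "readr", "readxl", "recipes", "sae", "stringr", "timetk", "workflows"
)

# Preview the entries that would need adding to the Imports field.
imports_field <- paste0("    ", sort(undeclared), collapse = ",\n")
cat("Imports:\n", imports_field, "\n", sep = "")

# From the package root, usethis can add them one by one (not run here):
# for (pkg in undeclared) usethis::use_package(pkg, type = "Imports")
```

The second note is the reverse case: tidyselect is declared in Imports but never used, so it either needs to be called somewhere in R/ or removed from DESCRIPTION.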

pkgdown - GitHub actions how to load ManyEcoEvo ?

The r-dependencies action seems to get around this by accepting a local:: package source in the extra-packages: YAML parameter (https://pak.r-lib.org/reference/pak_package_sources.html?q=local#local-packages-local-), but I've switched to renv in the GitHub Actions workflow. What can we do to get around this?

Do we specify ManyEcoEvo in renv? Seems a bit circular.

Set up binder

https://the-turing-way.netlify.app/communication/binder/zero-to-binder.html

vignette: add small batch of files to inst/extdata/ to demo yi cleaning and validation

duplicated id_col

I found an incorrectly duplicated id_col in master_data.csv for two separate analyses from the same submission for the same team, one blue tit and one eucalyptus. One will need to be recoded in the response_id, submission_id, analysis_id and split_id columns.

See details in reprex below:

library(tidyverse)
library(here)
#> here() starts at /Users/elliotgould/Documents/GitHub/ManyAnalysts

library(janitor)
#> 
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#> 
#>     chisq.test, fisher.test

library(ManyEcoEvo)

prepare_df_for_summarising <- function(data){
  data %>% mutate(across(.cols = c(num_fixed_variables,
                                   num_random_variables,
                                   sample_size,
                                   num_interactions,
                                   Bayesian, #NA's coming from CHECK values
                                   mixed_model,
                                   num_fixed_effects,
                                   num_random_effects), 
                         as.numeric),
                  lm = ifelse(linear_model == "linear", 1, 0),
                  glm = ifelse(linear_model == "generalised", 1, 0))
}

Master <- ManyEcoEvo %>% 
  select(data) %>% unnest(everything()) %>% 
  prepare_df_for_summarising() #NAs ok, caused by CHECK vals, not yet using THP's fixes
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `across(...)`.
#> Caused by warning:
#> ! NAs introduced by coercion

Note that we are getting an unexpected many to many relationship here, as per the warning above.

predictions <- read_csv(here::here("ms/predictions_Ids.csv")) %>% #TODO ask HF source
  distinct() %>% 
  left_join(Master, by = c("id_col")) %>% 
  prepare_df_for_summarising()
#> Rows: 258 Columns: 1
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): id_col
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

There are duplicate entries for one id_col, let’s identify these analyses:

predictions %>% janitor::get_dupes("id_col") %>% 
  select(id_col, ends_with("_id"), TeamIdentifier) %>% 
  knitr::kable()

id_col	response_id	submission_id	analysis_id	split_id	TeamIdentifier
Byrock-1-8-1	R_3qfD5ZHHdBbTgk3	1	8	1	Byrock
Byrock-1-8-1	R_3HzSBqQTAmJJ9ye	1	8	1	Byrock

It seems that there are two separate response_id entries for this team; however, they are both coded with the same id_col.
Let’s see which columns have values that differ between the two duplicated entries:

duplicated_variables <- 
  predictions %>% select(-review_data) %>% 
  janitor::get_dupes("id_col") %>% 
  summarise(id_col = unique(id_col), across(-all_of("id_col"), 
                   ~ first(.x) == last(.x))) %>% 
  select(id_col, where(isFALSE))

predictions %>% 
  semi_join(duplicated_variables, by = join_by("id_col")) %>% 
  select(id_col, colnames(duplicated_variables)) %>% 
  knitr::kable()

id_col	response_id	beta_estimate	adjusted_df	beta_SE	transformation	link_function_reported	dataset	mixed_model	response_variable_name	response_id_S2	sample_size	linear_model	exclusions_all	Conclusion	lm	glm
Byrock-1-8-1	R_3qfD5ZHHdBbTgk3	-0.065490	458.3576	0.014100	identity	identity	blue tit	1	day_14_weight	R_3qfD5ZHHdBbTgk3	3720	linear	exclude	neg_c	1	0
Byrock-1-8-1	R_3HzSBqQTAmJJ9ye	-0.028464	345.0000	0.025721	log	log	eucalyptus	0	euc_sdlgs0_50cm	R_3HzSBqQTAmJJ9ye	350	generalised	retain	neg_q	0	1

OK, there is one analysis for eucalyptus and one for blue tit, so the split_id is coded incorrectly, as these are clearly separate analyses.
I can see that this id_col is also assigned to different response_ids, i.e. from different submissions.
I note that in the file predictions_Ids.csv there are three duplicated entries for this id_col.
We should make sure that there isn’t a third analysis somewhere that is also duplicated in id_col.
It would be helpful to know how Hannah created this dataset.
I also note that there are three entries in predictions_validations_worksheet.csv belonging to response_id R_3HzSBqQTAmJJ9ye, which is why there are multiple entries in predictions_Ids.csv.
The submission, analysis and split ID columns in that data file are:

1-8-1
2-9-1
3-10-1
The predictions object here is also created from the Master object (ManyEcoEvo::ManyEcoEvo), which comes from the master_data.csv file.
Let’s look at that to see if it is potentially the source of the problem:

Master %>%  
  filter(TeamIdentifier == "Byrock") %>% 
  select(id_col, dataset, all_of(ends_with("_id"))) %>% 
  distinct() %>% 
  janitor::get_dupes("id_col")
#> # A tibble: 2 × 7
#>   id_col       dupe_count dataset response_id submission_id analysis_id split_id
#>   <chr>             <int> <chr>   <chr>               <dbl>       <dbl>    <dbl>
#> 1 Byrock-1-8-1          2 blue t… R_3qfD5ZHH…             1           8        1
#> 2 Byrock-1-8-1          2 eucaly… R_3HzSBqQT…             1           8        1

Yes, there are different response_ids for the same id_col, for analyses of different datasets.
Let’s check the raw data file. Here’s the output of the reprex I ran over at ManyEcoEvo:
Local .Rprofile detected at /Users/elliotgould/Documents/GitHub/ManyEcoEvo/.Rprofile

library(targets)
library(tidyverse)

There are no extra prediction file submissions for these analyses, so that’s not a problem.

tar_read(list_of_new_prediction_files) %>% 
  filter(response_id == "R_3qfD5ZHHdBbTgk3" | response_id == "R_3HzSBqQTAmJJ9ye") %>% 
  select(dataset, ends_with("_id"), csv_number) 
#> # A tibble: 0 × 6
#> # ℹ 6 variables: dataset <chr>, response_id <chr>, submission_id <dbl>,
#> #   analysis_id <dbl>, split_id <dbl>, csv_number <dbl>

Let’s check the underlying master_data:

readr::read_csv("data-raw/anonymised_data/master_data.csv") %>% 
  filter(TeamIdentifier == "Byrock") %>% 
  select(id_col, dataset, all_of(ends_with("_id"))) %>% 
  knitr::kable()
#> Rows: 302 Columns: 154
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (135): response_id, id_col, contrast, transformation, link_function_repo...
#> dbl  (16): submission_id, analysis_id, split_id, beta_estimate, adjusted_df,...
#> lgl   (3): Extra-pair_dad_ring, rear_Cs_out, rear_Cs_in
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

id_col	dataset	response_id	submission_id	analysis_id	split_id	hatch_nest_breed_ID	rear_nest_breed_ID
Byrock-1-3-1	eucalyptus	R_23UKvhBc7D608VO	1	3	1	NA	NA
Byrock-5-7-1	eucalyptus	R_23UKvhBc7D608VO	5	7	1	NA	NA
Byrock-4-6-1	eucalyptus	R_23UKvhBc7D608VO	4	6	1	NA	NA
Byrock-2-4-1	eucalyptus	R_23UKvhBc7D608VO	2	4	1	NA	NA
Byrock-3-5-1	eucalyptus	R_23UKvhBc7D608VO	3	5	1	NA	NA
Byrock-3-10-1	eucalyptus	R_3HzSBqQTAmJJ9ye	3	10	1	NA	NA
Byrock-1-8-1	eucalyptus	R_3HzSBqQTAmJJ9ye	1	8	1	NA	NA
Byrock-2-9-1	eucalyptus	R_3HzSBqQTAmJJ9ye	2	9	1	NA	NA
Byrock-1-1-1	blue tit	R_3iKJrflQwwxsps0	1	1	1	NA	rear_nest_breed_ID
Byrock-2-2-1	blue tit	R_3iKJrflQwwxsps0	2	2	1	NA	rear_nest_breed_ID
Byrock-1-8-1	blue tit	R_3qfD5ZHHdBbTgk3	1	8	1	NA	rear_nest_breed_ID

Yes, this must be the source of the issue: there are two Byrock-1-8-1 entries, one for eucalyptus and one for blue tit.
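
To guard against this recurring, a check along the following lines could be run on master_data.csv before the pipeline. This is only a sketch: the three rows are copied from the table above, while master_toy and find_shared_ids() are hypothetical names, not existing package objects.

```r
library(dplyr)
library(tibble)

# Three of the Byrock rows shown above, enough to reproduce the clash.
master_toy <- tribble(
  ~id_col,        ~dataset,     ~response_id,
  "Byrock-1-8-1", "eucalyptus", "R_3HzSBqQTAmJJ9ye",
  "Byrock-2-9-1", "eucalyptus", "R_3HzSBqQTAmJJ9ye",
  "Byrock-1-8-1", "blue tit",   "R_3qfD5ZHHdBbTgk3"
)

# Hypothetical helper: list every id_col that appears more than once,
# with the number of distinct responses and datasets it spans.
find_shared_ids <- function(data) {
  data %>%
    group_by(id_col) %>%
    summarise(
      n_rows = n(),
      n_response_ids = n_distinct(response_id),
      n_datasets = n_distinct(dataset),
      .groups = "drop"
    ) %>%
    filter(n_rows > 1)
}

clashes <- find_shared_ids(master_toy)
clashes
```

An id_col that spans more than one response_id or dataset, as Byrock-1-8-1 does here, indicates a coding error rather than a genuine duplicate row.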

Created on 2024-06-18 with reprex v2.1.0

Incorporate feedback into draft manuscript

Reshuffle content order
Add new content where suggested (brief)

Fix incorrectly signed BT betas

Fixed by @parkerth
@egouldo to rerun pipeline
and then update https://github.com/egouldo/ManyAnalysts

Code analyses still coded as 'CHECK'

Hi Hannah or @parkerth,

Could you please check the master_data file (at "data-raw/anonymised_data/master_data.csv") and code the values as necessary, or mark them as NA if they can't be resolved? There are a few analyses remaining with the value 'CHECK':

response_id	submission_id	analysis_id	split_id	test_variable	Bayesian	linear_model	model_subclass	exclusions_effect_analysis	Conclusion	data_cleaning_preprocessing_tool	data_cleaning_preprocessing_version	data_analysis_tool	data_analysis_version
R_11787O3NmejXKAH	1	2	2	net_rearing_manipulation	0	generalised	standard	exclude_all	CHECK	R	NA	R	NA
R_11787O3NmejXKAH	1	2	3	net_rearing_manipulation	0	generalised	standard	exclude_all	CHECK	R	NA	R	NA
R_11787O3NmejXKAH	1	2	1	net_rearing_manipulation	0	generalised	standard	exclude_all	CHECK	R	NA	R	NA
R_1eXlFKlQdiD2F59	2	2	1	rear_Cs_at_start_of_rearing	0	linear	standard	retain	CHECK	CHECK	CHECK	CHECK	CHECK
R_1GJlffAgZv6SY4y	1	1	1	NA	CHECK	CHECK	standard	exclude_all	none_c	R	4.0.0	R	4.0.0
R_1M0cMZL2IPYWHoi	1	1	1	NA	CHECK	CHECK	CHECK	exclude_all	CHECK	CHECK	CHECK	CHECK	CHECK
R_1M0cMZL2IPYWHoi	1	1	1	NA	1	generalised	CHECK	exclude_all	NA	R	3.6.3	R	3.6.3
R_1QlnXdW5tKuUQIr	1	1	1	CHECK	CHECK	CHECK	CHECK	CHECK	neg_q	R	3.6.1	R	3.6.1
R_21gmMa0uclrNoTP	2	1	1	NA	CHECK	linear	CHECK	exclude_all	CHECK	CHECK	CHECK	CHECK	CHECK
R_2Pjoz1X4q5XRClO	2	1	1	NA	CHECK	CHECK	standard	exclude_all	CHECK	CHECK	CHECK	CHECK	CHECK
R_2zNKAmJcWbM4QtY	1	1	1	NA	0	CHECK	standard	retain	none_q	R	3.6.1	R	3.6.1
R_3EbbZxcQ3gctVZu	1	1	1	NA	0	CHECK	hurdle	exclude_all	neg_q	R	3.6.1	R	3.6.1
R_3Kvy0h01LXHWniT	2	2	1	NA	0	generalised	standard	retain	CHECK	R	NA	R	NA
R_3nBCE4hMLh5s3qt	3	1	2	NA	CHECK	CHECK	standard	NA	CHECK	CHECK	CHECK	CHECK	CHECK
R_3nBCE4hMLh5s3qt	3	1	1	NA	CHECK	CHECK	standard	NA	CHECK	CHECK	CHECK	CHECK	CHECK
R_AzL6RdNTHtPjxzX	1	1	2	NA	CHECK	CHECK	standard	exclude_all	CHECK	CHECK	CHECK	CHECK	CHECK
R_es2jrrN9CTGwl5D	1	1	1	NA	CHECK	CHECK	CHECK	exclude_all	neg_q	R	4.0.0	R	4.0.0
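
A sketch of how the remaining 'CHECK' values could be located programmatically. The first three rows reuse values from the table above for a subset of columns; the fourth row and the name master_toy are hypothetical, added only to show a fully coded analysis being skipped.

```r
library(dplyr)
library(tibble)

master_toy <- tribble(
  ~response_id,        ~Bayesian, ~linear_model, ~Conclusion,
  "R_1GJlffAgZv6SY4y", "CHECK",   "CHECK",       "none_c",
  "R_2zNKAmJcWbM4QtY", "0",       "CHECK",       "none_q",
  "R_3Kvy0h01LXHWniT", "0",       "generalised", "CHECK",
  "R_hypothetical",    "0",       "linear",      "neg_q"
)

# Rows with at least one 'CHECK' value; %in% keeps NA cells from
# dropping a row by accident.
needs_checking <- master_toy %>%
  filter(if_any(everything(), ~ .x %in% "CHECK"))

# Number of 'CHECK' values remaining per column.
check_counts <- master_toy %>%
  summarise(across(-response_id, ~ sum(.x %in% "CHECK")))

needs_checking
check_counts
```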

See: https://github.com/r-lib/devtools/blob/main/vignettes/dependencies.Rmd

remove ManyEcoEvo from pkgs vector
Remove all namespacing in targets code
May also need to remove any namespacing within R/ funs
declare ManyEcoEvo in tar_option_set() imports arg
run tar_destroy() and rerun data-raw.R
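
The first and fourth steps above might look roughly like this in the targets script. It is a sketch only: the contents of pkgs are hypothetical, and it assumes the imports argument of tar_option_set() is what makes targets track the functions and objects in the ManyEcoEvo namespace.

```r
library(targets)

# Hypothetical package vector for the pipeline, with ManyEcoEvo removed.
pkgs <- c("dplyr", "purrr", "metafor")

# Load the helper packages for each target, and ask targets to track
# ManyEcoEvo's namespace so that changes to its functions invalidate
# the targets that depend on them.
tar_option_set(
  packages = pkgs,
  imports = "ManyEcoEvo"
)

tar_option_get("imports")
```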
