bergelsonlab / blabr
License: Other
This part of devtools::check() output is really useful to flag unqualified function calls. However, as it currently outputs a list of 100+ names, it isn't practical to use it if you only care about the new code. It would be great if this list were cut down considerably.
Run codetools::checkUsagePackage('blabr') to list the usages with the file paths and line numbers.
The .data "pronoun" should do it.
There are two steps in all get_* functions: finding the data file and reading it with readr and column specifications.
The two steps don't need to be coupled, and having them apart will allow more flexibility. For example, data files cloned or created in a non-standard location could still be read.
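A minimal sketch of what the decoupling could look like. The function names (find_cdi_path, read_cdi, get_cdi), the file name cdi.csv, and the column names are hypothetical and not taken from blabr; only the general idea of separating "find the file" from "read it with readr and a column specification" comes from the issue.

```r
# Hypothetical step 1: work out where the file lives in the standard location.
find_cdi_path <- function(data_root = file.path("~", "BLAB_DATA")) {
  file.path(data_root, "cdi_spreadsheet", "cdi.csv")
}

# Hypothetical step 2: read any file, wherever it is, with a fixed
# column specification (the columns here are made up for illustration).
read_cdi <- function(path) {
  readr::read_csv(
    path,
    col_types = readr::cols(
      subj = readr::col_character(),
      month = readr::col_integer()
    )
  )
}

# The convenience wrapper is then just the composition of the two steps.
get_cdi <- function() read_cdi(find_cdi_path())
```

With this split, a file cloned to a non-standard location could be read directly with read_cdi("/some/other/path/cdi.csv").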
Move R/test-helpers.R to testthat/helpers.R so that packages only needed for testing aren't referred to in the package's code. I put it under R/ because testthat at some point said that using testthat/helpers*.R was no longer recommended (indirect evidence here). Apparently, referring to functions from packages that aren't your package's "Imports" dependencies isn't a problem; that's what "Suggests" is for. Nevertheless, to me, it makes much more sense to have testing code separate from the package's code.
Remove library calls from all test files and switch to qualified function calls. I don't know why, but at some point I decided it was OK to put library() calls inside tests. As a result, tests will pass even if there is an unqualified call to, say, mutate, in the package's main code that is being tested. I don't know why I expected it to work otherwise, but I am not alone: see this SO question.
The following pattern is used too many times not to be factored out into a function:
library(digest)
hashes_list <- speaker_stats %>%
  summarise(across(everything(), digest)) %>%
  as.list
expected_hashes_list <-
  list(
    interval_start = "75ff43e40a186ae138dc9b709b691a45",
    interval_end = "5e39906727aa950a55bff1f80d4226bb",
    spkr = "8b19ab3ad09943f2c807002c40ebe943",
    adult_word_count = "a3dd76d9042133d4ee0a6ccbc654ba48",
    utterance_count = "d36f09f3bbada305d0925623a9ffb990",
    segment_duration = "178fa344206b188de05bea4f07fe2b50"
  )
expect_equal(hashes_list, expected_hashes_list)
I think a good place for this function is R/test-helpers.R, but check r-pkgs first.
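A possible shape for the factored-out helper, as a sketch only. The name expect_column_hashes is hypothetical. It uses lapply() over the columns instead of summarise(across(...)), which should hash the same column vectors, but that equivalence is an assumption worth checking against the existing expected hashes.

```r
# Hypothetical helper: compare per-column hashes of a data frame
# against a named list of expected hashes.
expect_column_hashes <- function(df, expected_hashes) {
  # digest::digest() hashes each column as a whole (md5 by default);
  # lapply() over a data frame returns a list named after the columns.
  actual_hashes <- lapply(df, digest::digest)
  testthat::expect_equal(actual_hashes, expected_hashes)
}
```

A test would then shrink to a single call such as expect_column_hashes(speaker_stats, expected_hashes_list), with qualified calls and no library() in the test file.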
The test for add_lena_stats referred to a GIN file on my computer, which is not the best idea. I left comments in the test's code with instructions on what to do.
Multiple sources state that library() calls should be avoided. It is OK to have them in the test files, though.
From R Packages:
You should never use require() or library() in a package: instead, use the Depends or Imports fields in the DESCRIPTION.
See also this SO question.
blabr used to export every variable defined in its code. For various reasons, this is not a good idea, so I switched to explicitly exporting functions/variables that users might use. If you encounter a function that is not exported, please add it in a comment below.
In the meantime, use blabr:::<function> to access a function that used to be available after library(blabr) and now isn't.
Right now the code is written in that function such that it's looking for the target onset to be present in the dataframe. For the studies that don't involve incorporating the message report, this isn't in there. The current work-around is to just set it as a value in the global environment, but that's fairly delicate and should be replaced with an argument inside the function that can be optionally set to a single value.
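A sketch of the optional-argument approach. The function name set_target_onset and the column name TargetOnset are hypothetical; the actual function and column in blabr may be named differently.

```r
# Hypothetical sketch: `target_onset` can be supplied as a single value;
# otherwise a `TargetOnset` column must already be present in the data frame.
set_target_onset <- function(df, target_onset = NULL) {
  if (!is.null(target_onset)) {
    stopifnot(is.numeric(target_onset), length(target_onset) == 1)
    df$TargetOnset <- target_onset
  } else if (!"TargetOnset" %in% names(df)) {
    stop("No `TargetOnset` column found; pass `target_onset` explicitly.")
  }
  df
}

# Usage for a study without a message report: one onset for every trial.
trials <- data.frame(trial = 1:3)
trials_with_onset <- set_target_onset(trials, target_onset = 3000)
```

This keeps the value out of the global environment and makes a missing onset fail loudly instead of silently.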
Probably this is enough:
writeLines(capture.output(sessionInfo()), "session-info_{username}.txt")
Code above (except for the filename) from here
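Note that "{username}" in the filename above is a literal placeholder, not something R fills in. A minimal sketch of one way to fill it, assuming the OS user name from Sys.info() is an acceptable stand-in and writing to a temporary directory for illustration:

```r
# Assumption: the OS user name is what {username} is meant to be.
username <- Sys.info()[["user"]]

# Build the file name and write the session info to it.
out_path <- file.path(tempdir(), sprintf("session-info_%s.txt", username))
writeLines(capture.output(sessionInfo()), out_path)
```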
Right now, everything that starts with a letter gets exported + magrittr's pipe. Maybe that is the right way to go, I do not know enough about r packages to tell. But it does seem strange to not be explicit about what names we expose.
With roxygen, functions get exported if they have the @export tag before them.
Do not forget to expose magrittr's pipe if we decide to switch.
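For reference, a sketch of what explicit exports look like with roxygen. The functions count_rows and internal_helper are hypothetical; the last three lines are the usual roxygen idiom for re-exporting magrittr's pipe, shown here under the assumption that magrittr is listed in Imports.

```r
#' Count rows of a data frame
#'
#' @param df A data frame.
#' @return The number of rows in `df`.
#' @export
count_rows <- function(df) {
  nrow(df)
}

# No @export tag, so this stays internal (reachable only via `:::`).
internal_helper <- function() NULL

#' @importFrom magrittr %>%
#' @export
magrittr::`%>%`
```

After running document(), only count_rows and %>% would appear in NAMESPACE.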
Maybe ignore those rows. Anything but the current behavior of selecting the NA rows.
The specific join (interval-on-interval) requires a BioConductor package IRanges which requires the user to install from source on M1s. This is annoying and unnecessary: with the size of the table being joined cross-join+filter will be just fine.
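A sketch of the cross-join+filter idea in base R, with made-up data. The column names and the definition of "overlap" (strict inequalities, so touching endpoints don't count) are assumptions, not what lena.r does.

```r
# Made-up intervals and segments, times in seconds.
intervals <- data.frame(
  interval_start = c(0, 300, 600),
  interval_end = c(300, 600, 900)
)
segments <- data.frame(
  seg_start = c(10, 290, 650),
  seg_end = c(20, 310, 700),
  spkr = c("FAN", "MAN", "CHN")
)

# Cross join: merge() with by = NULL returns the Cartesian product.
pairs <- merge(intervals, segments, by = NULL)

# Keep only the pairs where the segment overlaps the interval.
overlapping <- pairs[
  pairs$seg_start < pairs$interval_end &
    pairs$seg_end > pairs$interval_start,
]
```

The cross join has nrow(intervals) * nrow(segments) rows, so this only makes sense while both tables stay small, which is the premise of the issue.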
Remove from DESCRIPTION:
Remove from lena.r:
The %+replace% is supposed to clear all elements from the previous theme, making calling theme_bw(...) have no effect.
theme_bw settings are left in the themes,
Here is an error I want to see less of. One context in which this error comes up often is running test-get-vihi-annotations while connected to VPN, so this might be a way to test. Another option is to set the timeout to some ridiculously small amount of time.
Error in run_git_command(repo, "fetch --tags --prune --prune-tags") : Error executing git command:fetch --tags --prune --prune-tags
Error message:
fatal: unable to access 'https://github.com/bergelsonlab/vihi_annotations.git/': Failed to connect to github.com port 443 after 21057 ms: Couldn't connect to server
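The issue doesn't say how to make this error less frequent. One option, sketched below, is to retry the failing call a few times before giving up. The helper with_retries is hypothetical and not part of blabr; it is a generic wrapper, not a description of how run_git_command works.

```r
# Hypothetical helper: call `f()` up to `max_attempts` times, waiting
# `wait_seconds` between attempts, and re-throw the last error if all fail.
with_retries <- function(f, max_attempts = 3, wait_seconds = 1) {
  stopifnot(max_attempts >= 1)
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(
      list(value = f(), error = NULL),
      error = function(e) list(value = NULL, error = e)
    )
    if (is.null(result$error)) {
      return(result$value)
    }
    if (attempt < max_attempts) {
      Sys.sleep(wait_seconds)
    }
  }
  # All attempts failed: re-signal the last error unchanged.
  stop(result$error)
}

# Possible usage inside the package (not run here):
# with_retries(function() run_git_command(repo, "fetch --tags --prune --prune-tags"))
```

To test it without a VPN, pass a function that fails a fixed number of times before succeeding.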
Minimally, combine seedlings.R and rttmR into io.R.
Right now, the function emulates the way LENA creates intervals for lena5min.csv files. The first and last intervals start and end outside of the recording: 15:42-15:52 -> (15:40-15:45, 15:45-15:50, 15:50-15:55). They get trimmed afterward, but they are still there, while for ACLEW we don't want those short intervals at all. Keeping those intervals should be an option. Also, for ACLEW we'll need to introduce buffers to accommodate context regions.
See vihi-sampling code where intervals are created for the seedlings corpus.
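A sketch of interval creation with the short edge intervals made optional. The function make_intervals and its keep_partial argument are hypothetical, times are plain seconds since midnight, and buffers for context regions are not handled.

```r
# Hypothetical sketch: split a recording into clock-aligned intervals.
make_intervals <- function(start, end, width = 300, keep_partial = TRUE) {
  # Align the first and last break to multiples of `width`.
  first_break <- floor(start / width) * width
  last_break <- ceiling(end / width) * width
  breaks <- seq(first_break, last_break, by = width)
  intervals <- data.frame(
    interval_start = head(breaks, -1),
    interval_end = tail(breaks, -1)
  )
  if (keep_partial) {
    # Trim the first and last intervals to the recording boundaries.
    intervals$interval_start <- pmax(intervals$interval_start, start)
    intervals$interval_end <- pmin(intervals$interval_end, end)
  } else {
    # Drop intervals that are not fully inside the recording.
    inside <- intervals$interval_start >= start &
      intervals$interval_end <= end
    intervals <- intervals[inside, , drop = FALSE]
  }
  intervals
}

# Hours and minutes to seconds since midnight.
hm <- function(h, m) h * 3600 + m * 60

# The 15:42-15:52 recording from the example above.
lena_like <- make_intervals(hm(15, 42), hm(15, 52))
full_only <- make_intervals(hm(15, 42), hm(15, 52), keep_partial = FALSE)
```

With keep_partial = TRUE the result has the three trimmed intervals; with keep_partial = FALSE only 15:45-15:50 remains.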
round(age/30.435)
Note: fixations_to_timepoints isn't yet implemented at all.
# Build the series of bin start times spanning all fixations
t_series <- fixations %>%
  summarise(t_min = min(current_fix_start),
            t_max = max(current_fix_end)) %>%
  mutate(across(c(t_min, t_max),
                ~ floor(. / bin_size) * bin_size)) %>%
  mutate(t = list(seq(t_min, t_max, by = bin_size))) %>%
  select(t) %>%
  unnest(cols = t)
# Match each time bin to the fixations whose binned span contains it
t_series %>%
  inner_join(
    fixations %>%
      mutate(across(c(current_fix_start, current_fix_end),
                    ~ floor(. / bin_size) * bin_size)),
    by = join_by(between(t, current_fix_start, current_fix_end))
  )
The main part (speeding up by switching to a join) was done in a632aa0.
If someone has a BLAB_DATA repository set to use ssh and their key is no longer valid, they'll get a very cryptic error:
The access error is also there, but it doesn't cause an actual R error and instead causes a weird strsplit error.
How it should work:
blabr looks for, e.g., ~/BLAB_DATA/cdi_spreadsheet.
Due to peculiarities of its files (or my code, not sure), there are extra NA.
See aclew.R in ACLEW_correllations in vihi_sample in one_time_scripts to see what actually happens there.
add_its_stats(add_stats_function = get_lena_speaker_stats) %>%
  # Sometimes some numeric columns in its files (as read by rlena) are
  # inexplicably NA instead of zero ¯\_(ツ)_/¯. We'd rather have zeros there
  # unless it is a fully NA row signifying an interval with no LENA segments.
  mutate(across(c(adult_word_count, utterance_count, segment_duration),
                ~ if_else(is.na(spkr), true = .x,
                          false = replace_na(.x, 0)))) %>%
In calculate_lena_like_stats, first create a tibble with the intervals and then apply add_lena_stats to it. The interval creation should output both the wav and wall time.
The test uses 01_12_audio_sparse_code.csv, which has changed: update the test.
Often we run eye-tracking studies examining infant word comprehension. It would be useful to cross-check our collected data against parent report with a function that works as follows:
For each item in an eye-tracking study, this function would provide each child's CDI value (understands, produces, neither, not on CDI) for each item.
proposed output:
subj | target | CDI |
---|---|---|
s01 | apple | produces |
s01 | shang | NA |
s02 | banana | produces |
s02 | train | understands |
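A sketch of how such a function's core could work, as a left join of the study's items onto the CDI responses. Both input tables are made up to mirror the proposed output, and the column names are assumptions; real CDI data would first need reshaping to one row per child and word.

```r
# Hypothetical input: one row per child-item pair in the eye-tracking study.
study_items <- data.frame(
  subj = c("s01", "s01", "s02", "s02"),
  target = c("apple", "shang", "banana", "train")
)

# Hypothetical input: one row per child-word pair from the CDI.
cdi_responses <- data.frame(
  subj = c("s01", "s02", "s02"),
  target = c("apple", "banana", "train"),
  CDI = c("produces", "produces", "understands")
)

# Left join: items without a CDI entry get NA in the CDI column.
cdi_check <- merge(
  study_items, cdi_responses,
  by = c("subj", "target"),
  all.x = TRUE
)
```

Items that are not on the CDI at all (like the nonce word "shang") come out as NA here; distinguishing "neither" from "not on CDI" would need the full CDI word list as an extra input.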
Alternatively, remove the @examples keyword.
From document():
Warning: [/Users/ek221/blab/blabr/blabr/R/get_data.R:53] @examples requires a value
Warning: [/Users/ek221/blab/blabr/blabr/R/read_bl.R:38] @examples requires a value
Warning: [/Users/ek221/blab/blabr/blabr/R/read_bl.R:193] @examples requires a value
Installing blabr without first installing tidyverse results in an error. Tidyverse is a meta-package, so this could probably be resolved by loading individual packages instead of loading the whole tidyverse. Otherwise, we could just add tidyverse as a dependency.
Right now, if you request a dev version, it will look for the most current version in the dev repository only. It should look in both repos and print either only the public version if it is the most current overall, or both.
Add authors and contributors, and assign yourself as "creator" which actually means "current maintainer".
Here is the relevant output of check():
> checking for missing documentation entries ... WARNING
Undocumented code objects:
‘FindFrozenTrials’ ‘FindLowData’ ‘RemoveFrozenTrials’ ‘RemoveLowData’
‘add_chi_noun_onset’ ‘all_basiclevel’ ‘all_errors’ ‘anonymous’
‘audio_cnames’ ‘binifyFixations’ ‘bl_types’ ‘blab_data’
‘cdi_get_words’ ‘cdi_words’ ‘characters_to_factors’
‘check_annot_codes’ ‘checkout_branch’ ‘checkout_commit’
‘chi_noun_onset’ ‘count_chi’ ‘count_chi_types’ ‘count_device_and_toy’
‘count_mot_fat’ ‘count_object_present’ ‘count_utterance’
‘expandFixList’ ‘fixations_report’ ‘get_df_file’
‘get_late_target_onset’ ‘get_mesrep’ ‘get_pairs’ ‘get_vocab_score’
‘get_windows’ ‘git_bin’ ‘home_dir’ ‘keypress_issues’
‘keypress_retrieved’ ‘late_target_retrieved’ ‘load_tsv’
‘malformed_speaker_codes’ ‘obj_pres’ ‘object2string’ ‘on_cdi’
‘outlier’ ‘reliability’ ‘rename_audio_header’ ‘rename_video_header’
‘string2object’ ‘subj_mos’ ‘subj_nums’ ‘subjectList’ ‘sync_repo’
‘sync_to_upstream’ ‘theme_AMERICA’ ‘theme_blab’ ‘theme_spooky’
‘utt_type’ ‘video_cnames’
All user-level objects in a package should have documentation entries.
See chapter ‘Writing R documentation files’ in the ‘Writing R
Extensions’ manual.
We should either remove this dependency or figure out if the archived version works.
Threw a bunch of warnings about invalid factor levels; we should probably convert things to character somewhere higher up (and then back to factor if needed). Haven't checked all the NA cases.
Actual error message below:
Adding missing grouping variables: audio_video
Adding missing grouping variables: SubjectNumber
Joining, by = c("subj", "month", "audio_video")
Joining, by = c("subj", "month", "audio_video")
Joining, by = c("subj", "month", "audio_video")
Joining, by = c("subj", "month", "audio_video")
Joining, by = c("subj", "month", "audio_video")
Joining, by = c("subj", "month", "audio_video")
Joining, by = c("subj", "month", "audio_video")
Joining, by = c("subj", "month", "audio_video")
mutate_each() is deprecated.
Use mutate_all(), mutate_at() or mutate_if() instead.
To map funs over all variables, use mutate_all()
Error in mutate_impl(.data, dots) :
Evaluation error: object 'TVS' not found.
assertthat::not_empty(tbl) on its own is incorrect! It only returns a boolean result; no assertion is made. The correct version is assertthat::assert_that(assertthat::not_empty(tbl)).
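A small illustration of the difference, using an empty data frame as a stand-in for tbl:

```r
empty_tbl <- data.frame()

# Returns FALSE but does not stop execution, so the check is silently ignored.
is_non_empty <- assertthat::not_empty(empty_tbl)

# Wrapping it in assert_that() turns the FALSE into an error.
assertion <- tryCatch(
  assertthat::assert_that(assertthat::not_empty(empty_tbl)),
  error = function(e) conditionMessage(e)
)
```

Here is_non_empty is FALSE and assertion holds the error message that assert_that() raised.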
When you do devtools::install_github('BergelsonLab/blabr'), it would be nice if it checked which packages were already installed instead of reinstalling, e.g., tidyverse.
From check():
> checking for code/documentation mismatches ... WARNING
Codoc mismatches from documentation object 'big_aggregate':
big_aggregate
Code: function(x, exclude = NULL, output = NULL, exclude_chi = FALSE)
Docs: function(x, exclude = NULL, output = NULL)
Argument names in code not in docs:
exclude_chi
> checking Rd \usage sections ... WARNING
Undocumented arguments in documentation object 'join_full_audio_video'
‘output_name’ ‘keep_na’ ‘keep_comments’
Documented arguments not in \usage in documentation object 'join_full_audio_video':
‘output’
Functions with \usage entries need to have the appropriate \alias
entries, and all their arguments documented.
The \usage entries must correspond to syntactically valid R code.
See chapter ‘Writing R documentation files’ in the ‘Writing R
Extensions’ manual.
I can't even get the input md5sum to match. Not sure what this is about.
They need version-specific col_types. The most current version won't load at all because we force col_types to match the data now. Once done, uncomment test_that("same results after loading from csv and feather") in test-get_data.R.