kwb-r / fakin.doc

Best Practices in Research Data Management

Home Page: https://kwb-r.github.io/fakin.doc

TeX 33.54% CSS 4.80% R 57.20% Shell 2.27% HTML 2.18%

r rstats publication project-fakin research-data-management r-bookdown best-practices

fakin.doc's Introduction

fakin.doc

Documents to be used in our FAKIN project (in German)

fakin.doc's Issues

ORCID: R package "kwb.orcid" for checking use at KWB

https://github.com/KWB-R/kwb.orcid

Supports: #22

Metadata Requirements in R scripts

Minimum metadata requirements for R scripts:

Author
Created
Purpose
Input
Output
[Dependencies (sessionInfo())]
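
A minimal sketch of how such a header could be generated, assuming the fields are written as plain comment lines at the top of a script. The function name script_header() and all example values (author, purpose, paths) are hypothetical and not part of any KWB package.

```r
# Build the metadata header of an R script as comment lines.
# Field names follow the list above; values are supplied by the caller.
script_header <- function(author, purpose, input, output,
                          created = Sys.Date()) {
  fields <- c(
    Author = author,
    Created = format(created),
    Purpose = purpose,
    Input = input,
    Output = output
  )
  paste0("# ", names(fields), ": ", fields)
}

header <- script_header(
  author = "Jane Doe",                   # hypothetical author
  purpose = "Aggregate rainfall data",   # hypothetical purpose
  input = "rawdata/rain.csv",            # hypothetical input path
  output = "processing/rain_daily.csv"   # hypothetical output path
)
writeLines(header)

# Optional dependency record: store the session information next to the output
session_file <- file.path(tempdir(), "sessionInfo.txt")
writeLines(capture.output(sessionInfo()), session_file)
```

The header lines can be pasted at the top of a new script; the sessionInfo() output documents the R version and the attached packages at run time.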

Adapt Hadley's R package checklist to KWB needs (packages, workflows, and so on)?

Template from r-lib/usethis#338

Too advanced for us, as its focus is on CRAN releases (as for ggplot2 3.0, tidyverse/ggplot2#2568), but we can use it as a starting point. For R packages, Andi's "kwb.resilience" could be our first use case.

How to create a Github styled TO DO list in rmarkdown (for details see issue #23)

Check: ropensci developer guide (https://ropensci.github.io/dev_guide/)

https://ropensci.github.io/dev_guide/

VCS: GitLab "Gold" (until 2019-10-30, contact sales 30 days before expiry for renewal)

https://gitlab.com/kwb-r

Gitlab Gold Features:
https://about.gitlab.com/pricing/gitlab-com/feature-comparison/

Conditions:
https://gitlab.com/gitlab-com/gitlab-oss

pandoc does not convert to Word (since 7d97f9b)

pandoc: Cannot decode byte '\xd8': Data.Text.Internal.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream
Error: pandoc document conversion failed with error 1

More info: https://travis-ci.org/KWB-R/fakin.doc/builds/391017660

Since: 7d97f9b
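
One way to locate the offending bytes is to scan the source files for lines that are not valid UTF-8. The following is a sketch in base R using validUTF8(); the function name find_invalid_utf8() and the demo file are hypothetical, and which file actually caused the error above is not known.

```r
# Return a data frame of (file, line) pairs whose content is not valid UTF-8,
# or NULL if all lines in all files are valid.
find_invalid_utf8 <- function(paths) {
  result <- lapply(paths, function(path) {
    lines <- readLines(path, warn = FALSE)
    bad <- which(!validUTF8(lines))
    if (length(bad) == 0) {
      return(NULL)
    }
    data.frame(file = path, line = bad, stringsAsFactors = FALSE)
  })
  do.call(rbind, result)
}

# Demo with a temporary file: line 1 is plain ASCII, line 2 contains the
# byte 0xd8 followed by a non-continuation byte, which is invalid in UTF-8
demo_file <- tempfile(fileext = ".Rmd")
writeBin(as.raw(c(0x6f, 0x6b, 0x0a, 0x61, 0xd8, 0x62, 0x0a)), demo_file)

invalid <- find_invalid_utf8(demo_file)
print(invalid)
```

In a bookdown project the function could be applied to list.files(pattern = "\\.(Rmd|bib)$"), assuming the invalid byte sits in one of those files.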

Add more References

Reminder to myself to add the following references in the relevant chapters of the report, e.g.:

Why to share Code?

Baker, M. Why scientists must share their research code. Nature News http://dx.doi.org/10.1038/nature.2016.20504 (2016).
Barnes, N. Publish your computer code: it is good enough. Nature 467, 753 (2010). http://dx.doi.org/10.1038/467753a

Uncategorized

McKiernan, E. C. et al. How open science helps researchers succeed. eLife 5, e16800 (2016). https://elifesciences.org/articles/16800#
Baker, M. Scientific computing: Code alert. Nature 541, 563–565 (2017). https://dx.doi.org/10.1038/nj7638-563a
Broman, K. Initial steps toward reproducible research. http://kbroman.org/steps2rr/ (2016).
Martinez, C. et al. Reproducibility in Science: A Guide to Enhancing Reproducibility in Scientific Results and Writing https://ropensci.github.io/reproducibility-guide/ (See also ropensci-archive/reproducibility-guide#86)
Michener, W. K. Ten simple rules for creating a good data management plan. PLoS Comput. Biol. 11, e1004525 (2015). https://doi.org/10.1371/journal.pcbi.1004525
Goodman, A. et al. Ten simple rules for the care and feeding of scientific data. PLoS Comput. Biol. 10, e1003542 (2014). https://doi.org/10.1371/journal.pcbi.1003542

Best practices scientific computing

Sandve, G. K., Nekrutenko, A., Taylor, J. & Hovig, E. Ten simple rules for reproducible computational research. PLoS Comput. Biol. 9, e1003285 (2013). https://doi.org/10.1371/journal.pcbi.1003285
Wilson, G. et al. Best practices for scientific computing. PLoS Biol. 12, e1001745 (2014). https://doi.org/10.1371/journal.pbio.1001745

Why GitHub?

Perkel, J. Democratic databases: Science on GitHub. Nature 538, 127–128 (2016).
https://www.nature.com/news/democratic-databases-science-on-github-1.20719

All above references found in:
Lowndes, Julia S Stewart, Benjamin D Best, Courtney Scarborough, Jamie C Afflerbach, Melanie R Frazier, Casey C O’Hara, Ning Jiang, and Benjamin S Halpern. 2017. “Our Path to Better Science in Less Time Using Open Data Science Tools.” Nature Ecology & Evolution 1 (6). Nature Publishing Group
https://www.nature.com/articles/s41559-017-0160

In addition, add the new R Markdown book (https://bookdown.org/yihui/rmarkdown/) and a link to the R package rticles, which provides templates for writing journal articles in R Markdown.

Practical Data Science for Stats - a PeerJ Collection 2018 https://peerj.com/collections/50-practicaldatascistats/
Broman KW, Woo KH. (2017) Data organization in spreadsheets. PeerJ Preprints 5:e3183v1 https://doi.org/10.7287/peerj.preprints.3183v1
Nüst et al 2018 Reproducible research and GIScience: an evaluation using AGILE conference papers https://peerj.com/articles/5072/
Reproducible research: Strategies, tools, and workflows http://www.helsinki.fi/varieng/series/volumes/19/flanagan/#sect1
Leek and Peng 2015: Opinion: Reproducible research can still be wrong: Adopting a prevention approach https://doi.org/10.1073/pnas.1421412111

Sackler Colloquium on Improving the Reproducibility of Scientific Research (Free Online) http://www.pnas.org/content/115/11 , e.g.:

- An empirical analysis of journal policy effectiveness for computational reproducibility https://doi.org/10.1073/pnas.1708290115
- Scientific progress despite irreproducibility: A seeming paradox https://doi.org/10.1073/pnas.1711786114

Add link to web page https://regex101.com/

This page lets you test regular expressions. Do we already have a chapter about the importance of regular expressions? We should.
I found this link in the DataCamp Course "Intermediate R - Practice"
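
As an illustration of what such a chapter could show, here is a short sketch in base R that checks file names against a naming convention and extracts their parts. The convention (date_parameter_site.csv) and the file names are made up for this example.

```r
# Hypothetical file names; the second one violates the assumed convention
files <- c(
  "2018-06-11_rain_berlin.csv",
  "rain berlin final.csv",
  "2018-07-02_flow_berlin.csv"
)

# Assumed convention: <yyyy-mm-dd>_<parameter>_<site>.csv
pattern <- "^([0-9]{4}-[0-9]{2}-[0-9]{2})_([a-z]+)_([a-z]+)\\.csv$"

# Which names follow the convention?
is_valid <- grepl(pattern, files)

# Extract the captured groups from the valid names
dates <- as.Date(sub(pattern, "\\1", files[is_valid]))
parameters <- sub(pattern, "\\2", files[is_valid])

print(data.frame(file = files[is_valid], date = dates, parameter = parameters))
```

The same pattern can be pasted into regex101.com for testing; note that the doubled backslash in the R string must be written as a single backslash there.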

R packages: ropensci task view "hydrology"

https://github.com/ropensci/hydrology, also lists kwb.hantush :-)

Clean Chapters "Naming" and "Metadata"

How to create a Github styled TO DO list in rmarkdown

According to pandoc tricks, you may first download the task-list.lua file and save it in $DATADIR/pandoc/filters/, so it will be visible to pandoc system-wide, then run pandoc --lua-filter=task-list.lua -o filename.html filename.md

Source: https://stackoverflow.com/questions/28628903/to-do-list-in-rmarkdown

This is linked to issue #18
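
The task-list syntax itself can also be generated from R. The following sketch only produces the GitHub-style Markdown lines ("- [ ]" and "- [x]"); the function name todo_list() and the example tasks are hypothetical, and whether the lines are rendered as checkboxes depends on the output format (for HTML via pandoc, the Lua filter mentioned above may still be needed).

```r
# Create GitHub-style task-list lines from a vector of task descriptions.
# 'done' marks which tasks are already completed.
todo_list <- function(items, done = rep(FALSE, length(items))) {
  stopifnot(length(items) == length(done))
  paste0("- [", ifelse(done, "x", " "), "] ", items)
}

tasks <- todo_list(
  items = c("Clean chapter Naming", "Clean chapter Metadata"),
  done = c(TRUE, FALSE)
)

# In an R Markdown chunk with results = "asis" this emits raw Markdown
cat(tasks, sep = "\n")
```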

Roadmap: case study "LCA"

Improved folder structure analysis and documentation using function kwb.fakin::plot_path_tree() to be added in kwb.geosalz documentation
Add feature in R package kwb.umberto for process check KWB-R/kwb.umberto#2
Workflow documentation
Add short documentation in fakin.doc for progress check https://kwb-r.github.io/fakin.doc/case-studies.html#lca-modelling

Add "mybinder" for selected KWB packages (so they can be run in a web browser?)

Use case with code: https://github.com/kwb-r/kwb.qmra/binder

Run it:

Note:
My-binder support could be integrated as function in kwb.pkgbuild

Merge chapters "data-workflow" and "folder-structures"

Both chapters are very similar (the first copied from the "Best-practices" workshop and the second developed by @hsonne) and should be merged into one:
https://github.com/KWB-R/fakin.doc/blob/master/02_data_storage.Rmd#data-workflow

https://github.com/KWB-R/fakin.doc/blob/master/02_data_storage.Rmd#folder-structures-folder-structures

Acronyms for "roles": Use Code List of Relators (e.g. "dtm" -> "data manager", "rev" -> "reviewer")

see: http://www.loc.gov/marc/relators/relaterm.html

E.g. in R packages (http://r-pkgs.had.co.nz/description.html#author), but also in general at KWB
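
A small sketch of how relator codes could be translated into readable role names in R. The lookup table contains only a handful of entries from the MARC relator list; the function name role_label() is hypothetical.

```r
# Excerpt of the MARC Code List for Relators (code -> term)
relators <- c(
  aut = "Author",
  cre = "Creator",
  ctb = "Contributor",
  dtm = "Data manager",
  rev = "Reviewer"
)

# Translate relator codes into terms; fail loudly on unknown codes
role_label <- function(codes) {
  unknown <- setdiff(codes, names(relators))
  if (length(unknown) > 0) {
    stop("Unknown relator code(s): ", paste(unknown, collapse = ", "))
  }
  unname(relators[codes])
}

labels <- role_label(c("dtm", "rev"))
print(labels)
```

Keeping the codes in one table makes it possible to use the same abbreviations in package DESCRIPTION files and in project documentation.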

Internal KWB Endnote guideline v2 out of date....

needs to be updated with correct server paths!!!!

Add a new chapter "project-phases" (i.e. Projektbausteine)

Fix for local file links working in Firefox (>= v57)

Links to files on the intranet are not opened in Firefox due to Firefox's security policies.

See here for a description of the problem: http://kb.mozillazine.org/Links_to_local_pages_do_not_work

Workaround with the Firefox add-on LocalFileSystemLinks

Solved with: https://addons.mozilla.org/de/firefox/addon/local-filesystem-links

Could be placed in the FAQ Part of the document

R training: five more Datacamp seats for KWB

To be discussed by department leaders in next management meetings, due to high demand by students and also scientists who are eager to improve their R skills.

ORCID: mandatory for KWB scientists

All researchers at KWB should have an ORCID and add at least their work published at KWB.

On the official KWB website this link should be added for each researcher for finding more info.

If R packages are published, use of an ORCID is mandatory and is included in the KWB style package template (see https://kwb-r.github.io/kwb.pkgbuild/articles/tutorial.HTML)
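
As a sketch of how an ORCID can be attached to an author entry of an R package, the base R function utils::person() accepts it as a named comment. Whether the kwb.pkgbuild template does it exactly this way is an assumption; the person shown is the fictitious researcher used in ORCID's own documentation, not a KWB employee.

```r
# Author entry with ORCID, as it could appear in the Authors@R field
# of a DESCRIPTION file (name and ORCID are a fictitious example)
author <- utils::person(
  given = "Josiah",
  family = "Carberry",
  role = c("aut", "cre"),
  comment = c(ORCID = "0000-0002-1825-0097")
)

# Show the formatted entry including roles and the ORCID comment
print(format(author, include = c("given", "family", "role", "comment")))
```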

In fakin.doc: add info table with people at KWB who already have one

Add documentation for test projects (geogenic salination, LCA, AQUANES)

https://github.com/KWB-R/kwb.umberto
https://github.com/KWB-R/aquanes.report
https://github.com/KWB-R/GeoSalz

Integrate "Best-practices" from workshop

Link to "Research Compendium" website

A research compendium accompanies, enhances, or is a scientific publication providing data, code, and documentation for reproducing a scientific workflow. It can be published on different platforms using the label (or tag, community, ...) research-compendium (applied on GitHub, Zenodo, OSF) or, as a fallback, the term "research compendium" in the description (used on GitLab). The Zenodo community even has a curation policy for the accepted records.

https://research-compendium.science/

Adapt table of content to Christoph`s suggestions

Tools: R packages for "Metadata"

codemetar (https://ropensci.github.io/codemetar)

"We recommend you to use the codemetar (https://ropensci.github.io/codemetar) package for creating and updating a JSON CodeMeta metadata file (https://codemeta.github.io/) for your package via codemetar::write_codemeta(). It will automatically include all useful information, including GitHub topics. CodeMeta uses schema.org terms so as it gains popularity the JSON metadata of your package might be used by third-party services, maybe even search engines. " (https://ropensci.github.io/dev_guide/building.html#creating-metadata-for-your-package)

dataspice (https://github.com/ropenscilabs/dataspice)

The goal of dataspice is to make it easier for researchers to create basic, lightweight and concise metadata files for their datasets. These basic files can then be used to:

make useful information available during analysis.
create a helpful dataset README webpage.
produce more complex metadata formats to aid dataset discovery.

Metadata fields are based on schema.org and other metadata standards.

R package: dirdf - Extracts Metadata from Directory and File Names https://github.com/ropenscilabs/dirdf
Create tidy data frames of file metadata from directory and file names.

QMS: define a company-wide strategy for publishing code

After talking with @daniel-wicke today about publicly publishing two R packages used in the project-ogre (see KWB-R/kwb.ogre#2 and KWB-R/kwb.ogre.model#1), it became obvious that we currently lack a company-wide strategy for publishing code.

For this a workflow should be developed within FAKIN and implemented in the QMS. This for sure requires that the KWB management and the department leaders

I would propose the following:

100% publicly sponsored projects (e.g. BMBF, EU, and so on): source code will always be published on https://github.com/kwb-r as a public repository (i.e. accessible to everyone), provided that the code does not contain security-critical paths (e.g. to our company server) or confidential data. Code should ideally be developed in such a way that it contains neither. Making the code openly available will reduce our effort for installing it (e.g. not every student needs an "access" token to install private repositories, as required for "contract" projects, see below).
Contract projects (BWB, Veolia): code will be published as private repositories by default on https://github.com/kwb-r, unless the funder pre-defines a specific workflow.

Could this topic also be addressed within one of the next management meetings @chsprenger ?

Merge content in "rawdata" chapter

The content here comes from two sources: one part was copied from the "Best-practices" workshop and the second was developed by @hsonne.

This should be wrapped up into one!

UFOPLAN BaSaR: data workflow/structure recommendation

Within the new KWB project UFOPLAN BaSaR, R scripts already used in OgRe should be reused.

As the project is just beginning, with regular sampling starting next week, it is a good time to optimise the data workflow, according to Daniel.

He hopes to get some recommendations on how to improve the current folder structure so that it can easily be integrated into the workflow proposed by FAKIN later on.

However, it must be ensured that the R scripts also work on field laptops without a connection to the KWB intranet (i.e. folder paths can be adapted with minimal effort).
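
One possible approach, sketched in base R: define the root folder once at the top of each script and fall back to a local copy when the server is not reachable. The function name resolve_root() and both paths are hypothetical; the real server path is not given here.

```r
# Use the server folder if it is reachable, otherwise a local copy
resolve_root <- function(server_root, local_root) {
  if (dir.exists(server_root)) server_root else local_root
}

# Hypothetical locations: a share on the intranet and a copy on the laptop
server_root <- "//server/projekte$/basar"
local_root <- file.path(tempdir(), "basar")
dir.create(local_root, showWarnings = FALSE)

root <- resolve_root(server_root, local_root)

# All further paths are built relative to the root, so nothing else changes
rawdata_dir <- file.path(root, "rawdata")
print(rawdata_dir)
```

Because every path is derived from root, switching between office and field use requires changing at most one line.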

Check R package "checkers" (automated checking of best practices for research compendia)

https://github.com/ropenscilabs/checkers

Add documentation for monitoring projects (MIA-CSO, Ogre, OPTIWELLS)

kwb.logger (OPTIWELLS, MIA-CSO, Ogre)
kwb.monitoring (Ogre, Flusshygiene)

Code Ocean for running code (alternative to mybinder)

www.codeocean.com

Hi Michael, we've upgraded your quota for the time being to 10 hours/50 GB and we'll reset the official plan designation later today. But you have all of the privileges of the pro account as of now and anyone else who signs up via the www.kompetenz-wasser.de domain should be automatically upgraded as well.

Best,

Seth Green
Developer Advocate

From FAQ: https://codeocean.com/plans

What is included in the Researcher plan?
The researcher plan includes everything you need to get started, explore and run code, download code and data, unlimited compute capsule publishing, privately modify published code, collaborate with peers, embed code onto your personal site. Everyone is allotted 5GB of storage and 1 hour of compute time per month. Use your academic email to get 20GB of storage and 10 hours of compute time per month.

Another example is to enable comments or discussions on your HTML pages. There are several possibilities, such as Disqus (https://disqus.com) or Hypothesis (https://hypothes.is). These services can be easily embedded in your HTML book via the includes option (see Section 5.5 for details).
--- Source: bookdown.org

Good example for integration can be found here:
https://benmarwick.github.io/bookdown-ort/mods.html

Batch Script For File Tracking

Develop a batch script that tracks all file/folder changes on the KWB servers

/projekte$
/processing
/rawdata

To be used for the Brownbag and later as a general tool to identify when (un)intended changes in the folder structure occurred.
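
The idea can be prototyped in R before writing an actual batch script: take a snapshot of all files below a folder and compare it with an earlier snapshot. This is a sketch under that assumption, not the batch script itself; the function names snapshot_files() and compare_snapshots() are hypothetical, and the demo runs on a temporary folder instead of the KWB servers.

```r
# Record relative path, size and modification time of all files below 'root'
snapshot_files <- function(root) {
  paths <- list.files(root, recursive = TRUE)
  info <- file.info(file.path(root, paths))
  data.frame(
    path = paths,
    size = info$size,
    mtime = info$mtime,
    stringsAsFactors = FALSE
  )
}

# Compare two snapshots: which files were added, removed or modified?
compare_snapshots <- function(old, new) {
  common <- intersect(old$path, new$path)
  old_common <- old[match(common, old$path), ]
  new_common <- new[match(common, new$path), ]
  is_modified <- old_common$size != new_common$size |
    old_common$mtime != new_common$mtime
  list(
    added = setdiff(new$path, old$path),
    removed = setdiff(old$path, new$path),
    modified = common[is_modified]
  )
}

# Demo on a temporary folder
root <- tempfile("tracked_")
dir.create(file.path(root, "rawdata"), recursive = TRUE)
writeLines("a", file.path(root, "rawdata", "a.csv"))
writeLines("b", file.path(root, "rawdata", "b.csv"))
before <- snapshot_files(root)

file.remove(file.path(root, "rawdata", "b.csv"))
writeLines("c", file.path(root, "rawdata", "c.csv"))
writeLines(c("a", "more"), file.path(root, "rawdata", "a.csv"))
after <- snapshot_files(root)

changes <- compare_snapshots(before, after)
print(changes)
```

Only files are tracked in this sketch; empty folders do not appear in the snapshots. Saving each snapshot with a date (e.g. via saveRDS()) would make it possible to determine when a change occurred.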

no "index.html" is created

I don't know why, but "index.html" is currently no longer created, so http://kwb-r.github.io/fakin.doc/ does not display the document by default.

However, this works for a similar setting, e.g. here:
https://github.com/rstudio/bookdown/tree/master/inst/examples

Roadmap: case study "geosalz"

Improved folder structure analysis and documentation using function kwb.fakin::plot_path_tree() (to be added in kwb.geosalz documentation)
Workflow documentation for process check https://kwb-r.github.io/kwb.geosalz/dev/articles/workfow.html
Add short documentation in fakin.doc for progress check https://kwb-r.github.io/fakin.doc/case-studies.html#geogenic-salination