codecheckers / discussion Goto Github PK

0.0 0.0 0.0 1 KB

General discussions and questions

discussion's Introduction

Codecheckers Team

This repo collects all information about people conducting CODECHECKs as part of the CODECHECK community.

Find a codechecker

You can take a look at the codecheckers table, codecheckers.csv to find a suitable codechecker. GitHub even provides a nice search function for the file. Consider skills earlier in the list to be more advanced, later ones to be less strong. If you have a good candidate, please check the codechecker is currently not busy with too many CODECHECKs already (see assigned issues in the CODECHECK register).

Alternatively, you can @-mention the codecheckers team with @codecheckers/codecheckers in the issue for managing the codecheck and ask around for interested codecheckers by adding @codecheckers/codecheckers to an issue comment.

Finally, you can ask the author for recommendations, start an open call for codecheckers on Twitter, et cetera.

Sign up

If you want to get involved as a codechecker, we need to run through the following steps:

Codechecker (you!) opens an issue using this link (with an issue template)
Community manager makes sure all required information is there
Community manager invites the new codechecker to the Codecheckers Team (Note: the team page is not public)
Communtiy manager welcomes the new codechecker
Communtiy manager saves information in the "database" and closes the issue with the commit

About CODECHECK

discussion's People

Contributors

Watchers

discussion's Issues

Bot finding CODECHECK candidates in preprints [incl. automatic detection of auditability]

It would be great to be notified of a preprint that has the potential to be CODECHECKed. Skimming manuscript titles and abstracts, like I regularly do for EarthArXiv, does not really give me that information. A bot/script could download the PDF though and search for keywords around software/code/data. There are a couple of papers who do automatic detection of open data (can dig up) but none that do automatic detection of open methods or auditability as required by CODECHECK. That would be a cool thesis project :-)

If we can easily discover candidates in preprints, we could approach the authors and try to get in the CODECHECK into the review process, growing the number of journals/editors aware of the idea and principles.

Some more ideas:

bot can create issues in a candidates repository, ideally posting DOI and a short excerpt
the detection could have a percentage value for certainty

Is CODECHECK providing safety or security?

See https://www.quora.com/What-is-the-difference-between-safety-and-security

I'd say "safety".

This might be a useful analogy to expand upon in the future.

Short update for all codecheckers and volunteers

👋 Hi @codecheckers/codecheckers !

I just wanted to leave a brief note here, because the CODECHECK project has been rather silent in the last months. I on my end focused on wrapping up my thesis, and since a few months ago I'm on full parental leave and have my hands full day and night 👶 😄.

Nevertheless, @sje30 and I are still excited to move this initiative forward. There are conversations with publishers/journals happening from time to time, @sje30 recently participated in a panel discussion, and we have a growing community of volunteers, so we could really carry quite a workload by now. That is fantastic 🎉

In the meantime, feel free to share the CODECHECK ideas yourself, give talks at your institution, bombard editors at your favorite journal with suggestions to introduce codechecking, give talks about computational reproducibility, or chime in on one of the issues in this discussion forum.

And please do keep the community posted!

Cheers,
Daniel

RaaS as a tool to improve codechecking experiences

Reproducibility as a Service (RaaS) by @jwons could be useful for codecheckers to use, but also for authors to run pre-checks before they submit?

https://doi.org/10.1002/spe.3202

Clarifying Zenodo upload / codecheck directory / codecheck repo

The CODECHECK bundle is described as:

The CODECHECK bundle includes all files that the codechecker used to conduct the CODECHECK. This may include a copy of the author’s files, and any additional files that the codechecker created to assisst them in their codecheck.

I understand that this is not formally specified intentionally, but from the above description I'd expect that the CODECHECK bundle actually refers to the full repository clonded unter the codecheckrs org, since the code and figures provided by the authors are certainly part of the "files that the codechecker used to conduct the CODECHECK". On the other hand, the community guide rather seems to imply that the CODECHECK bundle only refers to things in the codecheck directory. Some Zenodo uploads only contain the codecheck.pdf file, some additionally have some contents of the codecheck directory, but I did not see any that actually contained the output directory let alone all the code/figures provided by the original authors.

I feel that it would make sense to have more consistency in this. Also, choosing some files by hand also in principle makes it possible to upload different versions that are inconsistent with the ones in the repository. In my opinion, it would be best to either only upload the codecheck.pdf file to Zenodo (similar to the ReScience C approach), or use the github Zenodo connection and make a release of the full repository that then gets automatically uploaded to Zenodo (not sure whether this works with a reserved DOI, though).

Badge metadata

This recent NISO report about badges gives some recommendations about what badge metadata should be made available in a structured form. We should revisit if all the information is available via the codecheck.yml, and think about how badge and YAML file could be connected.

https://doi.org/10.3789/niso-rp-31-2021

Also, the idea to add a crpytographic hash in the image header for authentication of the badge source is interesting, but would require an own badge server.

It's a pity that they have ROR-R (reproduced) only as an extension of ROR (reviewed), whereas CODECHECK does reproduction without too much code review.

CODECHECKs of preprints

The community process for codechecking could work very well for preprints.

Papers/preprints citing CODECHECK(s)

Model uncertainty and decision making: Predicting the Impact of COVID-19 Using the CovidSim Epidemiological Code

https://assets.researchsquare.com/files/rs-82122/v1_stamped.pdf

Reproducibility policy Ecological Society of America

Maybe an interesting collaboration partner for CODECHECK:

The Ecological Society of America (9000 members, 6 journals with its publishing partner, Wiley) has just today announced an open research policy for its publications. Here is is the key part of the policy:

"As a condition for publication in ESA journals, all underlying data and statistical code pertinent to the results presented in the publication must be made available in a permanent, publicly accessible data archive or repository, with rare exceptions (see “Details” for more information). Archived data and statistical code should be sufficiently complete to allow replication of tables, graphs, and statistical analyses reported in the original publication, and perform new or meta-analyses. As such, the desire of authors to control additional research with these data and/or code shall not be grounds for withholding material."

Blog post of the announcement: https://www.esa.org/esablog/2021/01/28/esa-data-policy-ensuring-an-openness-to-science/

Details of the policy and resources to assist authors: https://www.esa.org/publications/data-policy/

via @benmarwick 🙇‍♂️

Journal for Reproducibility in Neuroscience

https://twitter.com/jrepneurosci

https://jrn.epistemehealth.com/

Also use OJS > CODECHECK plugin candidate? (see #28)

Report community codechecks to ORCID with a bot

Can we complete rule 3 for community codechecks?

Peer review information can only be provided by "trusted organisations":

That should be achievable and is more a technical problem, AFAICT. This could be the first task or our bot (see #2).

JOSS reports reviews to ORCID, but I didn't find any code that would streamline this in https://github.com/openjournals/joss/search?p=3&q=orcid&unscoped_q=orcid or in https://github.com/openjournals/whedon/search?p=2&q=orcid&unscoped_q=orcid ... maybe I'm looking in the wrong spot.

Develop a Code of Conduct for community

https://ropensci.org/blog/2021/01/07/conduct2021/
anonymous forms for reporting
https://devguide.ropensci.org/policies.html#code-of-conduct
is somethink like https://devguide.ropensci.org/policies.html#ethics-data-privacy-and-human-subjects-research useful?
post first draft here then tag all codecheckers for feedback, ask who would be open to serve as contact points

Extra checks to conduct during a CODECHECK

A CODECHECK is a rare point in the process of academic research when the author is clearly incentivised to do things that are not for their own benefit alone, especially if a CODECHECK is a prerequisite for publication of an article.

What are important tasks that authors would usually avoid/skip/have no time for/might not be aware of? (the latter probably being the most common problem!) Any of these will of course have to be weighted against them being an extra burden to check for the codechecker.

Provide software metadata, cf. https://danielskatzblog.wordpress.com/2020/09/29/software-and-software-metadata/
Consider publishing a software paper (e.g., point them to JOSS)
Proper data citation (e.g., point them to FAIR), though this should be part of an increasing number of editorial processes
... (please add more)

OJS Plugin

Many independent journals use OJS: https://pkp.sfu.ca/ojs/

An OJS plugin for CODECHECK could

set up a section for CODECHECK certificates to be published within OJS
setup up an extra phase for the review/publication/submission process (multiple options?)
set up role of CODECHECKER
explore providing anonymous communication between author and codechecker?
...

Bot to notify about missing Zenodo metadata and subsequent publications

If we codecheck preprints, wouldn't it be great if the CODECHECK bot would subsequently flag the missing related identifiers when the actual paper is reviewed and published?

The same feature could also check that all related identifiers (repository, preprint DOI) are in the Zenodo metadata and automatically contact the Zenodo deposit author if something is missing.

Put technical reproducibility burden completely on authors

It is of course mainly the author's responsibility to come up with a workflow, but the current community process implies that the reviewer could/should invest additional work to make things reproducible, e.g. by making the repository binder-ready, creating a Makefile (see issue #19), etc.
I think it would be better to clearly separate the responsibility: the author provides the workflow (binder, Rmarkdown or jupyter notebook, Makefile, shell script, README file with step by step instructions, ...) and all the CODECHECKER does is following/executing this workflow, verifying that it works and documenting the exact environment that was used. The last part would be something like pip freeze, conda list or R's sessionInfo(), i.e. not a format that allows to reproduce things exactly at the press of a button (you'd need the same platform, etc.), but could give valuable hints if future replications fail or give different results.

Author/editor responsibility for Zenodo upload + codecheck.yml update

This is possibly biased by my ReScience C experience, but wouldn't it make sense to have the final steps that formalize the CODECHECK, i.e. the upload to Zenodo and the update of the codecheck.yml to be rather an editor and not a reviewer task? If I am not mistaken, so far @sje30 and @nuest have been both editors and reviewers at the same time, so this question did not really arise yet. In my opinion, having these final "validation" tasks been done by editors could avoid incorrect (in the sense of missing files, metadata, etc.) code checks being uploaded to Zenodo and in general ensure consistency. I could also imagine that in the future you might decide to e.g. have additional metadata in the Zenodo deposit – it would be easier if you could add them yourself instead of asking reviewers to do it. Of course, ideally you'd have a bot like JOSS's whedon do these tasks for you :)

Downstream applications of the CODECHECK register

a Zotero plugin could show if a CODECHECK for an article in the database exists
publish a link to CODECHECK certificate on SciGen.Report, see codecheckers/register#32
...

CODECHECK overlay journal

https://codecheck.org.uk/journal

Effectively a list (newest on top) of the certificates that looks less tabular than the register with a few more features

search (use some free infrastructure for that?)
volumes (one per year)
pagination
about page

Inspiration: https://rescience.github.io/

Scope of CODECHECK Makefile

The CODECHECK guidelines state:

Write a Makefile to re-run the workflow based on provided documentation, i.e., recreate all files listed in the manifest by runnign the command make codecheck. This target should run the whole or most suitable subset of the workflow and create the report.

But looking at previous CODECHECK examples, this usually does not seem to be the case, the Makefile only recreates the report based on the stored output files (as a side note: make clean still cleans the output files). For example in the last two codechecks: https://github.com/codecheckers/LICD_article and https://github.com/codecheckers/driftage .

As a more general question: what is the advantage of actually using a Makefile instead of e.g. a simple shell script or just a line stating what to do in the README file? Not to be misunderstood: I'm definitely a fan of Makefiles in general, since they are great for keeping track of changed dependencies, etc., but is this actually a CODECHECK use case? This touches on questions of the reproduciblity of the code check report itself. For example, if I run make in the codecheck directory in one of the above mentioned repositories, I will get a new codecheck.pdf which is actually wrong in a sense, because it will dynamically include my R session info, even though this is not the environment that has been used to generate the output files included in the document. I understand that the PDF is the "frozen" document that states everything in a non-ambiguous way, but the whole Makefile + dynamic Rmarkdown setup to me gives the wrong impression that the codecheck itself could be reproduced. This would only be the case if either the Makefile runs the code separately from the Rmarkdown file (which seems to be the case in at least one repository: https://github.com/codecheckers/Hopfield-1982/), or if the Rmarkdown included the actual computations (which is not feasible in general, I guess).

Codecheck progress update

Dear @codecheckers/codecheckers

as you may have seen on Twitter, we have now published a preprint on
F1000R summarising our progress to date with CODECHECK.

https://f1000research.com/articles/10-253/v1

We thank you again for showing an interest in our project. COVID
affected our plans to energise the community, and instead we focused on
checking papers that we (Daniel and I) could mostly do.

Now that our paper is out, the question is - what next for CODECHECK?

We have made a pre-application for funding to the Wellcome Trust, to
help develop the community. If this is not successful, we might try for
other funding sources.

A few journals are beginning to show interest, so if you would like to
still be involved in CODECHECK, and would like to volunteer to check an
article, just let us know by commenting below. We also welcome other
ideas for how we can grow this initiative.

Best wishes,

Daniel @nuest and Stephen @sje30

Dynamic SVG badges for CODECHECK reports

During the review of our codecheck paper, a reviewer commented

The badge that is delivered would need some time information since
the check is valid at one point in time (with a given software stack) and
does not guarantee future runs.

So, what we were thinking was extending the codecheck logo by another line to include either a timestamp, or even just the certificate number (which starts with the year).

I was thinking of having the URL split over two lines, e.g.

https://codecheck.org.uk/
cert/2022-001

(and if so, then we'd need a bit of extra website work to make the URL [https://codecheck.org.uk/cert/2022-001]

@nuest did you make the previous version of the logo? If we added an extra line, could we then use sed to generate new ones?

Alternatively, I see there are now several implementations of 'plain' SVG badges,
https://forthebadge.com/generator/
https://metacpan.org/pod/Badge::Simple

but this style looks a bit too long to me.

Updating reference once a preprint is published?

The covid-uk study (2010-008) has now been published; how do we cite to the paper, as a follow up?

https://doi.org/10.1016/S2468-2667(20)30133-X

Levels of success for CODECHECK certificates

I took a more detailed look at SciGen.Report today together with its founder @amorim-cjs. SciGen.Report has different levels of reproducibility, which might be interesting to capture in the codecheck.yml as proper metadata. The CODECHECK certificates do mentions this kind of information (and the summary in the codecheck.yml might too), but a structured proper field with a set of predefined values could be valuable to intergration and visulisation of search results, for example.

Here is a screenshot of the SciGen.Report options when I want to submit a review and the variants that are relevant to CODECHECK (considering that "negative" CODECHECK certificates are not something that will ever be published in the same way):

Yes, within margin, all of it as written
Yes, within margin, but extra information was needed (please clarify)
Yes, but only partially (please specify)

Can I CODECHECK an article if I am doing a repro study?

Hi all,
I am currently getting involved in a study assessing the computational reproducibility of several studies in psych science. Since the project fits very well with what's been done with CODECHECK (different focus, but most procedures are probably overlapping), shall I try to start my first CODECHECK with some of the articles I am working on already? Or do you think there might be an issue with it (since it wasn't strictly required by the authors)?
Cheers!

How does CODECHECK relate to PubPeer

Should checks be posted to PubPeer automatically?

https://pubpeer.com/static/about

Bot can check for common errors before a codechecker starts

See code published as part of this study: https://www.nature.com/articles/s41597-022-01143-6

Also for R: https://github.com/ropensci-review-tools/pkgcheck

CODECHECK bundles

This is probably most relevant to the "default implementation" of the principles:

How do we define a CODE CHECK bundle? Is there a need for a formal specification, or can we go the REED-way and say: a CODE CHECK bundle is a set of files that is preproducible.
If we define a bundle
- add content to the website how to manually inspect it
- add documentation about opening it with Binder

CODECHECK infrastructure

The assistant is a first step in streamlining code checks. A further step would be online infrastructure that codecheckers can use. Let's note ideas here what this infrastructure could do, what benefits are, what limitations exists, etc.

CODECHECK configuration file - codecheck.yml

Some ideas for future elements

command: "..." # for rare cases where user needs to define this, but normally should be auto-generated from format

checks: # there may be multiple checks?
  - ...
 
codechecker:
  check_runner: "mybinder.org" # who executed the check
  check_signature: "..." # something crpytographic?!

Can CODECHECK be a starting point for curation of reproducible research?

Arguillas, Florio, Christian, Thu-Mai, Gooch, Mandy, Honeyman, Tom, Peer, Limor, & CURE-FAIR WG. (2022). 10 Things for Curating Reproducible and FAIR Research (1.1). https://doi.org/10.15497/RDA00074

10 Things for Curating Reproducible and FAIR Research

Thing 1: Completeness
Thing 2: Organization
Thing 3: Economy
Thing 4: Transparency
Thing 5: Documentation
Thing 6: Access
Thing 7: Provenance
Thing 8: Metadata
Thing 9: Automation
Thing 10: Review

How can CODECHECK procedures, with not too much overhead, positively impact things that help curation of reproducible research?