datalad / datalad-paper-joss
Repository for JOSS paper on DataLad
License: MIT License
The structure is mandated:
https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain
I think we should follow this (in particular the order, which is presently not the case).
I would also propose the following:
Not mandated, but ": " is a common title pattern in the journal.
Clarification: 👎 is the refinement of 🎉, and votes for 🎉 will be added to 👎 (unless double-voted).
I encourage those who voted for 🎉 to revote for 👎 if they agree. If you don't agree, please comment to support your choice of 🎉 over 👎.
ATM it states the license, where to find 3rd-party terms, and that there is a CONTRIBUTION file. The first two aspects are only vaguely related to "contributions", the latter is merely a reference.
Shouldn't we rather say:
The present state is approaching 2x the suggested maximum length (250-1000 words).
Is this a concern?
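A quick way to check the length against the 250-1000 word range quoted above is sketched below. It is only a rough approximation: it assumes the paper is a Markdown file with an optional YAML front matter block delimited by `---` lines, and it counts every whitespace-separated token (including heading markers and citation keys) as a word. The function names `count_words` and `within_limits` are hypothetical helpers, not part of any JOSS tooling.

```python
# Rough word count for a Markdown paper; names here are illustrative only.
MIN_WORDS = 250    # lower bound quoted in the thread
MAX_WORDS = 1000   # upper bound quoted in the thread


def count_words(markdown_text: str) -> int:
    """Count whitespace-separated tokens, skipping a leading YAML front matter."""
    lines = markdown_text.splitlines()
    if lines and lines[0].strip() == "---":
        # Drop everything up to and including the closing '---' line, if any.
        for index in range(1, len(lines)):
            if lines[index].strip() == "---":
                lines = lines[index + 1:]
                break
    return len(" ".join(lines).split())


def within_limits(word_count: int) -> bool:
    """Return True if the count falls inside the suggested range."""
    return MIN_WORDS <= word_count <= MAX_WORDS


if __name__ == "__main__":
    sample = "---\ntitle: x\n---\n# Summary\nDataLad is a tool.\n"
    n_words = count_words(sample)
    print(n_words, within_limits(n_words))
```

To apply it to the actual manuscript, one would read `paper.md` from the repository and pass its contents to `count_words`.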
Development started 8 years ago
commit 8a6d1c60e7fdf41943360f5ae0c8df0ce682c677
Author: Yaroslav Halchenko <[email protected]>
Date: Tue May 21 16:50:21 2013 -0400
original pieces for gitweb
We had ~13500 commits in master since. 2500 finished PRs. 700 open and 2300 closed issues.
At the moment there is just one:
datalad search haxby
While it is iconic, it is not connected to the presented added value provided by DataLad (neither in their current form, nor those proposed in #64).
If we keep search as the demo, I propose to switch to another search term that is self-explanatory. I tested a few cases. I would be fine with any of them:
movie fmri
one-back task
diffusion mri
openneuro
emotion face
However, I think a better use case would be datalad run -- as it is (or could be) immediately connected with the presented key features.
I don't know the details of financing/accounting, so I'm not sure whether they've directly funded development, but I think I've been to events/hackathons/etc. where development on DataLad was funded by:
Though, to be honest, the boundaries between what counts as funding for DataLad or git-annex, what is just developers funded from somewhere else attending a hackathon, and what's significant enough to appear in the funding statement are a little blurry to me.
sloccount on master
Totals grouped by language (dominant language first):
python: 63823 (70.02%)
javascript: 25500 (27.98%)
sh: 1826 (2.00%)
Total Physical Source Lines of Code (SLOC) = 91,149
Development Effort Estimate, Person-Years (Person-Months) = 22.84 (274.13)
(Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months) = 1.76 (21.10)
(Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule) = 12.99
Total Estimated Cost to Develop = $ 3,085,895
(average salary = $56,286/year, overhead = 2.40).
We should ignore the non-Python lines, which leaves ~64k lines.
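The Basic COCOMO formulas printed in the sloccount output above can be re-evaluated for any line count, for example to see what the estimate looks like for the Python-only portion. The following is a minimal sketch, not sloccount itself: the function name `cocomo_basic` is hypothetical, the coefficients are the ones shown in the output, and the cost formula (person-years x salary x overhead) is an assumption inferred from the reported figures.

```python
# Sketch of the Basic COCOMO arithmetic shown in the sloccount output.
# Coefficients and defaults are taken from that output; the cost formula
# is an assumption that reproduces the reported total.

def cocomo_basic(sloc: int, salary: float = 56286.0, overhead: float = 2.40) -> dict:
    """Return effort, schedule, team size, and cost estimates for `sloc` lines."""
    ksloc = sloc / 1000.0
    person_months = 2.4 * ksloc ** 1.05              # effort
    schedule_months = 2.5 * person_months ** 0.38    # calendar time
    person_years = person_months / 12.0
    return {
        "person_months": person_months,
        "person_years": person_years,
        "schedule_months": schedule_months,
        "developers": person_months / schedule_months,
        "cost": person_years * salary * overhead,
    }


all_languages = cocomo_basic(91_149)   # total SLOC reported above
python_only = cocomo_basic(63_823)     # Python SLOC reported above

if __name__ == "__main__":
    for label, estimate in (("all", all_languages), ("python", python_only)):
        print(label, {key: round(value, 2) for key, value in estimate.items()})
```

Because the effort exponent is greater than one, dropping the non-Python lines reduces the estimated effort more than proportionally.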
This section is arguably the key section of "Statement of need" and in turn the entire paper. Currently it puts forth 5 reasons:
I would propose to trim the list, and to straighten the argument:
A. Seamless nesting of independent modular units (with emphasis on "seamless", which is what DataLad adds to Git's submodules)
B. Reproducible execution (or capture of actionable provenance)
C. Interoperability adapters and interfaces (more of a collection of the former, rather than a definition of the latter)
I think 1-5 are outcomes that can be achieved with A-C, rather than the technological contribution.
The current text seems to be easily sortable under A, B, and C to illustrate more or less intuitive use cases, why one would want such features.
The description of B could be extended to reach beyond provenance capture and hint at a wider metadata support.
In order to submit to JOSS:
Your paper (paper.md and BibTeX files, plus any figures) must be hosted in a Git-based repository together with your software (although they may be in a short-lived branch which is never merged with the default).
Dear Contributors to DataLad:
We have tried to email but failed for one reason or another.
Please see and follow the instructions we emailed previously, reproduced below. We currently aim to submit next Tuesday (April 20th):
Thank you for your previous contribution to DataLad (https://github.com/datalad/datalad), by code, issues, or feedback.
We are working on a manuscript to be submitted to the Journal of Open Source Software (https://joss.theoj.org) to describe DataLad, and would like to acknowledge your contribution(s). We are inviting you to co-author the paper, or, alternatively, give us permission to thank you in the Acknowledgements section of the paper.
If you would like to co-author the paper, please review the authorship criteria of JOSS at https://joss.readthedocs.io/en/latest/submitting.html#authorship and pay particular attention to potential implications of the "co-authors agree to be accountable for all aspects of the work" rule. If you personally consider a co-authorship appropriate under these conditions, please
If you would like to just be acknowledged, please either reply to this email stating that, or submit a PR with your name appropriately listed in the Acknowledgements section of https://github.com/datalad/datalad-paper-joss/blob/master/paper.md and remove the pre-created record with your name from the header.
If you would like to neither be listed among co-authors, nor acknowledged, we would appreciate if you reply and let us know about that.
We are planning to submit the manuscript next week (on/after April 12), and would appreciate it if you acted on this invitation by the end of this week.
Co-author records that remain commented out will be removed before submission.
Thank you again for your contribution to DataLad!
Sincerely,
DataLad Team
2011 cannot be.
Will make an attempt
Google survey done on 20210319, stopped on page 6.
The rules are (from https://joss.readthedocs.io/en/latest/submitting.html#authorship):
Purely financial (such as being named on an award) and organizational (such as general supervision of a research group) contributions are not considered sufficient for co-authorship of JOSS submissions, but active project direction and other forms of non-code contributions are. The authors themselves assume responsibility for deciding who should be credited with co-authorship, and co-authors must always agree to be listed. In addition, co-authors agree to be accountable for all aspects of the work, and to notify JOSS if any retraction or correction of mistakes are needed after publication.
If we agree on the scope of the paper being datalad-core (#1), this makes the 34 contributors to the code on its GitHub repo obvious co-author candidates. I can only think of one person with "directional" influence who is not on this list. My proposal would be to approach them, asking whether they would want to participate in the drafting of the manuscript, and thereby become co-authors under the terms quoted above.
I particularly do not mind a long author list. And I do not see the need for a "minimum code contribution" or anything like that. All these people either are or were active contributors or early adopters that registered that fact with a contribution of some kind. I also do not mind extending that list -- I just thought it would be a good starting point.
edit by @yarikoptic : a list of contributors prepared / acted on in a separate repo (should have just kept everything here) -- https://github.com/datalad/datalad-git-bug-dumps (json files with emails are under annex and not shared ATM)
In technical talks I tend to include the following list of design principles for DataLad:
I believe that in their simplicity they can be instrumental in communicating the underlying mindset. Some aspects are already included in the text, but it still makes sense to me to simply present them in this refined form -- possibly right at the start of Overview of DataLad.
Before this issue can be closed, all eventual authors must have signed off here.
I would prefer to structure the acknowledgements with the following order:
I still think that we should bring in a 4th point that touches on metadata support in DataLad:
WDYT?
According to https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements:
Your paper (paper.md and BibTeX files, plus any figures) must be hosted in a Git-based repository together with your software (although they may be in a short-lived branch which is never merged with the default).
that means we do not have to suffer from a complex, heavy image file in the main repo. Hence I propose to not go for a figure that is not minimized for size. Moreover, it should also not be imaging specific, but still sciency. I propose this one as a starting point:
The present abstract states:
Born from the idea to provide a unified data distribution for neuroscience
While I believe this is not meant to be a scope limit, it nevertheless gives me that impression.
Q: Do we agree that there should not be the notion of "datalad is RDM for neuroscience"?
A testament of this is datasets.datalad.org, created as the project’s initial goal to provide a data distribution with unified access to already available public data archives in neuroscience, such as crcns.org and openfmri.org. It is curated by the DataLad team, and provides, at the time of publication, streamlined access to over 250 TBs of data across a wide range of projects and archives in a fully modularized way.
The paper has the above, which is critical and the key evidence that the beast works. However, rather than "250TB" (where we claim that git-annex handles any size already on its own), we should add the number of datasets, and number of dataset sources/portals as indicators of how much versatility is captured by DataLad (not just git-annex) in this single collection.
Intuitively, I'd say we limit the scope to datalad-core. Leaving space for other focused publications. The notion of extensions is already in the manuscript. Just want to make this explicit.
Dartmouth College) without specific departments/institutes within? That would collapse many of the McGill affiliations. (Didn't check yet what the JOSS requirement is.)