Comments (24)
I thought I'd echo @dorianps's opinion. I appreciate that history shows the power of distributed over centralized... no one wants to go back to SVN, but that's not what excited me most about Datalad. With the next big thing being things like MLOps and other such buzzwords, I wonder whether emphasizing Datalad's ability to become a core tool beyond research science would be valuable. Something like:
Datalad: A foundation for managing code, data, and environments
Assuming another choice is the last thing everyone needs I've voted on the suggestions though : )
from datalad-paper-joss.
You guys probably have thought long about the title, but in my view the options above stress too much the concept of "distributed" or "decentralized" management, while the main feature I think datalad provides for all users has to do with data "tracking" or "versioning". Something like "collaborative distributed tracking" might reflect my perception of Datalad. Whether data management is distributed or not depends on the use case, i.e., some users may appreciate the distributed nature of datasets, others may use a centralized repository or even not collaborate with anyone and still benefit from the core data versioning (and time travel flexibility) of Datalad. Just my honest, shameless thought :)
from datalad-paper-joss.
Following up on previous points about what got us excited about datalad: For me it was the provenance tracking. That feature of datalad is, as far as I can tell, really unique - and missing from the title.
Specifically, what is missing is that we can track which code changed/created which data. Atm the "joint management" could be read as "managing in one place", but, in a sense, in parallel/independently. I think the provenance aspect could be emphasized a bit more explicitly.
Riffing on the last title, something like
DataLad: distributed system for joint management of code, data, and their relationship
(that sounds a bit clunky to my ears.. but just to give an example of the direction I mean).
from datalad-paper-joss.
Added a candidate we defended recently and icons for voting (could be multiple)
from datalad-paper-joss.
decentralized versus decentral versus distributed?
decentralized sounds off in my ears
from datalad-paper-joss.
decentralized versus decentral versus distributed?
decentralized sounds off in my ears
Good point. Git also used "distributed". From git(1):
Git is a fast, scalable, distributed revision control system
from datalad-paper-joss.
I was following https://www.degruyter.com/document/doi/10.1515/nf-2020-0037/html where no author seemed to raise a red flag in choosing "decentralized". @mih - what was your guide for choosing "decentralized" in favor of "distributed" there? Staying consistent with the title/dRDM in that paper would IMHO be a bonus, although if it was severely flawed, I am ok to "generalize" into "distributed":
Looking at https://medium.com/distributed-economy/what-is-the-difference-between-decentralized-and-distributed-systems-f4190a5c6462 I think "decentralized" fits somewhat better than distributed as to reflect the most common use cases, although "distributed" reflects the technology underneath -- that git/git-annex/datalad indeed allow for a more distributed mode of operation.
from datalad-paper-joss.
There was no particular rationale or drive behind "decentralized". Given the labeling used by Git, I would have preferred to have made a different choice. As usual, I am also no believer in sticking to mistakes of the past ;-)
Re the comparison of the terms in the linked article: I think "decentralized" better fits actual usage patterns, but distributed is more appropriate for describing the technological capabilities. I suspect that the decentralized usage is largely driven by a deeply embedded concept of mine vs theirs.... we shall overcome ;-)
from datalad-paper-joss.
As usual, I am also no believer in sticking to mistakes of the past ;-)
as with any kind of a "release" it might later become considered as buggy as the prior one ;) and
“You become responsible, forever, for what you have tamed.”
― Antoine de Saint-Exupéry, The Little Prince
Overall -- I am fine with either, although leaning to "decentralized" for consistency and better reflection of the typical usage patterns. I guess the vote(s) hopefully would help us make the decision.
from datalad-paper-joss.
Thank you @dorianps for the feedback. Indeed, I think we somewhat missed the "versioning" aspect entirely, as if it were a given. "Tracking" is somewhat implied by "decentralized" or "distributed", though not obviously, and it is also unclear on its own, so I am not sure it is appropriate for a title.
Indeed it is hard to embed all possible features/use-cases into a single title. Makes me appreciate the official (in manpage) description of git ("the stupid content tracker") once more.
from datalad-paper-joss.
Just throwing an idea (without contributing a single line on the code):
Datalad: collaborative data tracking, transferring, and management, across multiple platforms
"Platforms" = non-specific catch-all (linux, windows, git, uk biobank)
May still work if "collaborative" is replaced with "distributed".
from datalad-paper-joss.
I think "decentralized" better fits actual usage patterns, but distributed is more appropriate for describing the technological capabilities.
This. Which is why in software journal my vote is on "distributed" as far as it refers to Datalad.
Whether data management is distributed or not depends on the use case, i.e., some users may appreciate the distributed nature of datasets, others may use a centralized repository or even not collaborate with anyone and still benefit from the core data versioning
True, too.
Datalad: collaborative data tracking, transferring, and management, across multiple platforms
May still work if collaborative is replaced with distributed.
Guess the cross-platform aspect can be left out of the title. If no platform is mentioned in it, we don't need to fight a possible impression that it's platform specific. Moreover hardly any VCS is.
So: "Datalad: distributed versioning and management for research data"?
Maybe even "large data" instead of "research data". It's agnostic after all, and while we might want to draw particular attention from the scientific community, JOSS may be more useful for us if we get developers (potential contributors) with completely different use cases interested.
from datalad-paper-joss.
@bpoldrack Your version looks good to me, too. Two thoughts:
- "Research data" sounds like a complicated tool for researchers only. I once read a post at git-annex with someone keeping inventory of DVDs using annex. Datalad can be used for any data, research or not.
- I thought one of the greatest strengths of Datalad is seamless transfer between platforms, i.e., going from linux to windows, from hard drive to usb, from local to cloud, etc. Those multiple transfer options are what makes it a universal tool for collaborations; that's why I included it in the title, but even without it, "management" can still cover that aspect in a less specific way. So your title is good.
from datalad-paper-joss.
re "large" - not necessarily, since could be used for management of "sensitive" (licenses, personal data, etc) data
re "management for research data" - IMHO it is already captured better by Research Data Management (RDM), which is a known concept. So the discussion seems to be still circling back to which critical features to include in the title to characterize such RDM better. But it seems the currently leading choice of title doesn't even mention the "research" aspect ;)
from datalad-paper-joss.
Datalad: A foundation for managing code, data, and environments
I really like that take.
from datalad-paper-joss.
Me too! Thx to @dorianps and @leej3 for your perspective. I think we should consider this aspect for title and manuscript focus.
from datalad-paper-joss.
I think that the "foundation" aspect should indeed be verbalized in the paper. But
- "foundation" by itself is actually an insufficient descriptor. A foundation establishes the grounds for further development (take a foundation of the house or NSF itself), but does not provide a full solution. And DataLad (core) is a complete (edit: i.e. already providing means for RDM) solution.
- I think "platform" might be a descriptor, which would also encompass the aspect of "foundation". So maybe we could take the currently leading title and make it into "DataLad: distributed platform for joint management of code and data". WDYT?
from datalad-paper-joss.
I think that the "foundation" aspect should indeed be verbalized in the paper.
#34 is a possible "lean" injection of the foundation aspect. I guess there could be other places where it could be injected, but I do not think that the JOSS paper would be the best venue to center on "foundational aspect" of DataLad.
from datalad-paper-joss.
@yarikoptic “Foundation” felt more dramatic and inspiring, but I agree it falls a little short in that it hints that Datalad is not your all-encompassing solution to handling these problems. I feel “platform” has been over-used because of the stuff in the cloud. I can’t think of a better choice though. It fits well. I like your alternative title.
Throwing out some other ideas in increasing order of absurdity in case one sticks or triggers an alternative in someone else’s head:
“Approach”, “system”, “Core-tool”, “comprehensive toolkit”, “ecosystem”, “vision”, “armamentarium”, “panacea”
from datalad-paper-joss.
What about "bedrock"?
I also like "infrastructure (tool)", but it shifts the focus away from the individual user.
A little esoteric: "digital companion for joint management of code and data"
from datalad-paper-joss.
A little esoteric: "digital companion for joint management of code and data"
;-) it would have been nice to finally bring DataLad from its soulless form to reflect on its name origin of a curious youth as an alternative to "a person who is deemed to be despicable or contemptible".
from datalad-paper-joss.
Thank you @pvavra - (actionable for humans and computers) provenance of data transformations is indeed one of the killer features. Your suggestion sounds not too clunky and to the point to me.
from datalad-paper-joss.
I have added the 👎 choice for the 🎉's refinement and added a clarification. Everyone who voted (especially for 🎉) please consider adjusting your vote or expressing explicit (comment) preference for 🎉 over 👎
from datalad-paper-joss.
the choice was made and it is
title: 'DataLad: distributed system for joint management of code, data, and their relationship'
in the paper
from datalad-paper-joss.