Select title (datalad-paper-joss, closed)

datalad avatar datalad commented on June 26, 2024 20
Select title

from datalad-paper-joss.

Comments (24)

leej3 avatar leej3 commented on June 26, 2024 3

I thought I'd echo @dorianps's opinion. I appreciate that history shows the power of distributed over centralized... no one wants to go back to SVN, but that's not what excited me most about Datalad. With the next big thing being MLOps and other such buzzwords, I wonder whether emphasizing Datalad's ability to become a core tool beyond research science would be valuable. Something like:

Datalad: A foundation for managing code, data, and environments

Assuming another choice is the last thing everyone needs I've voted on the suggestions though : )

dorianps avatar dorianps commented on June 26, 2024 2

You guys probably have thought long about the title, but in my view the options above stress too much the concept of "distributed" or "decentralized" management, while the main feature I think datalad provides for all users has to do with data "tracking" or "versioning". Something like "collaborative distributed tracking" might reflect my perception of Datalad. Whether data management is distributed or not depends on the use case, i.e., some users may appreciate the distributed nature of datasets, others may use a centralized repository or even not collaborate with anyone and still benefit from the core data versioning (and time travel flexibility) of Datalad. Just my honest, shameless thought :)

pvavra avatar pvavra commented on June 26, 2024 1

Following up on previous points about what got us excited about datalad: for me it was the provenance tracking. That feature of datalad is, as far as I can tell, really unique, and it is missing from the title.

Specifically, what is missing is that we can track which code changed/created which data. Atm the "joint management" could be read as "managing in one place", but, in a sense, in parallel/independently. I think the provenance aspect could be emphasized a bit more explicitly.

Riffing on the last title, something like

DataLad: distributed system for joint management of code, data, and their relationship

(that sounds a bit clunky to my ears.. but just to give an example of the direction I mean).

yarikoptic avatar yarikoptic commented on June 26, 2024

Added a candidate we defended recently and icons for voting (could be multiple)

adswa avatar adswa commented on June 26, 2024

decentralized versus decentral versus distributed?

decentralized sounds off in my ears

mih avatar mih commented on June 26, 2024

> decentralized versus decentral versus distributed?
>
> decentralized sounds off in my ears

Good point. Git also used "distributed". From git(1):

> Git is a fast, scalable, distributed revision control system

yarikoptic avatar yarikoptic commented on June 26, 2024

I was following https://www.degruyter.com/document/doi/10.1515/nf-2020-0037/html where no author seemed to raise a red flag in choosing "decentralized". @mih - what was your guide for choosing "decentralized" in favor of "distributed" there? Staying consistent with the title/dRDM in that paper would IMHO be a bonus, although if it was severely flawed, I am ok to "generalize" into "distributed":

Looking at https://medium.com/distributed-economy/what-is-the-difference-between-decentralized-and-distributed-systems-f4190a5c6462 I think "decentralized" fits somewhat better than distributed as to reflect the most common use cases, although "distributed" reflects the technology underneath -- that git/git-annex/datalad indeed allow for a more distributed mode of operation.

mih avatar mih commented on June 26, 2024

There was no particular rationale or drive behind "decentralized". Given the labeling used by Git, I would have preferred to have made a different choice. As usual, I am also no believer in sticking to mistakes of the past ;-)

Re the comparison of the terms in the linked article: I think "decentralized" better fits actual usage patterns, but distributed is more appropriate for describing the technological capabilities. I suspect that the decentralized usage is largely driven by a deeply embedded concept of mine vs theirs.... we shall overcome ;-)

yarikoptic avatar yarikoptic commented on June 26, 2024

> As usual, I am also no believer in sticking to mistakes of the past ;-)

as with any kind of a "release" it might later become considered as buggy as the prior one ;) and

“You become responsible, forever, for what you have tamed.”

― Antoine de Saint-Exupéry, The Little Prince

Overall -- I am fine with either, although leaning to "decentralized" for consistency and better reflection of the typical usage patterns. I guess the vote(s) hopefully would help us make the decision.

yarikoptic avatar yarikoptic commented on June 26, 2024

Thank you @dorianps for the feedback. Indeed, I think we somewhat missed the "versioning" aspect entirely, as if it were a given. "Tracking" is somewhat implied by "decentralized" or "distributed", though not obviously so, and it is also unclear on its own, so I am not sure it is appropriate for a title.
Indeed it is hard to embed all possible features/use-cases into a single title. Makes me appreciate the official (in manpage) description of git ("the stupid content tracker") once more.

dorianps avatar dorianps commented on June 26, 2024

Just throwing an idea (without contributing a single line on the code):

Datalad: collaborative data tracking, transferring, and management, across multiple platforms

Platforms = non-specific catch all (linux, windows, git, uk biobank)

May still work if collaborative is replaced with distributed.

bpoldrack avatar bpoldrack commented on June 26, 2024

> I think "decentralized" better fits actual usage patterns, but distributed is more appropriate for describing the technological capabilities.

This. Which is why, in a software journal, my vote is on "distributed" as far as it refers to Datalad.

> Whether data management is distributed or not depends on the use case, i.e., some users may appreciate the distributed nature of datasets, others may use a centralized repository or even not collaborate with anyone and still benefit from the core data versioning

True, too.

> Datalad: collaborative data tracking, transferring, and management, across multiple platforms
> May still work if collaborative is replaced with distributed.

Guess the cross-platform aspect can be left out of the title. If no platform is mentioned in it, we don't need to fight a possible impression that it's platform specific. Moreover hardly any VCS is.

So: Datalad: distributed versioning and management for research data? Maybe even "large data" instead of "research data". It's agnostic after all, and while we might want to draw particular attention from the scientific community, JOSS may be more useful for us if we get developers (potential contributors) with completely different use cases interested.

dorianps avatar dorianps commented on June 26, 2024

@bpoldrack Your version looks good to me, too. Two thoughts:

  1. Research data sounds like a complicated tool for researchers only. I once read a post at git-annex with someone keeping inventory of DVDs using annex. Datalad can be used for any data, research or not.
  2. I thought one of the greatest strengths of Datalad is seamless transfer between platforms, i.e., going from linux to windows, from hard drive to usb, from local to cloud, etc. Those multiple transfer options are what makes it a universal tool for collaborations, that's why I included in the title, but even without it, management can still cover that aspect in a less specific way. So your title is good.

yarikoptic avatar yarikoptic commented on June 26, 2024

re "large" - not necessarily, since could be used for management of "sensitive" (licenses, personal data, etc) data

re "management for research data" - IMHO it is already captured better by Research Data Management (RDM), which is a known concept. So the discussion seems to keep circling back to which critical features to include in the title to characterize such RDM better. But it seems the currently leading choice of title doesn't even mention the "research" aspect ;)

bpoldrack avatar bpoldrack commented on June 26, 2024

> Datalad: A foundation for managing code, data, and environments

I really like that take.

mih avatar mih commented on June 26, 2024

Me too! Thx to @dorianps and @leej3 for your perspective. I think we should consider this aspect for title and manuscript focus.

yarikoptic avatar yarikoptic commented on June 26, 2024

I think that the "foundation" aspect should indeed be verbalized in the paper. But

  • "foundation" by itself is actually an insufficient descriptor. A foundation establishes the grounds for further development (take a foundation of the house of NSF itself), but does not provide a full solution. And DataLad (core) is a complete (edit: i.e. already providing means for RDM) solution.
  • I think "platform" might be a descriptor, which would also encompass the aspect of "foundation". So maybe we could take the currently leading title and make it into "DataLad: distributed platform for joint management of code and data". WDYT?

yarikoptic avatar yarikoptic commented on June 26, 2024

> I think that the "foundation" aspect should indeed be verbalized in the paper.

#34 is a possible "lean" injection of the foundation aspect. I guess there could be other places where it could be injected, but I do not think that the JOSS paper would be the best venue to center on "foundational aspect" of DataLad.

leej3 avatar leej3 commented on June 26, 2024

@yarikoptic “Foundation” felt more dramatic and inspiring but I agree it falls a little short in that it hints Datalad is not your all encompassing solution to handling these problems. I feel platform has been over-used because of the stuff in the cloud. I can’t think of a better choice though. It fits well. I like your alternative title.

Throwing out some other ideas in increasing order of absurdity in case one sticks or triggers an alternative in someone else’s head:
“Approach”, “system”, “Core-tool”, “comprehensive toolkit”, “ecosystem”, “vision”, “armamentarium”, “panacea”

mih avatar mih commented on June 26, 2024

What about "bedrock"?

I also like "infrastructure (tool)", but it shifts the focus away from the individual user.

A little esoteric: "digital companion for joint management of code and data"

yarikoptic avatar yarikoptic commented on June 26, 2024

> A little esoteric: "digital companion for joint management of code and data"

;-) it would have been nice to finally bring DataLad out of its soulless form to reflect on its name origin of a curious youth, as an alternative to "a person who is deemed to be despicable or contemptible".

yarikoptic avatar yarikoptic commented on June 26, 2024

Thank you @pvavra - (actionable for humans and computers) provenance of data transformations is indeed one of the killer features. Your suggestion does not sound too clunky to me, and it is to the point.

yarikoptic avatar yarikoptic commented on June 26, 2024

I have added the 👎 choice for the 🎉's refinement and added a clarification. Everyone who voted (especially for 🎉) please consider adjusting your vote or expressing explicit (comment) preference for 🎉 over 👎

yarikoptic avatar yarikoptic commented on June 26, 2024

the choice was made and it is

title: 'DataLad: distributed system for joint management of code, data, and their relationship'

in the paper
