Primary demo about datalad-paper-joss HOT 5 CLOSED

datalad commented on July 25, 2024
Primary demo

from datalad-paper-joss.

Comments (5)

adswa commented on July 25, 2024

I have voiced it already, but to just state it here in this dedicated issue, I am also not enthusiastic about framing datalad search (haxby) as a prototypical demo.

I think what bothers me (beyond the fact that I have never used this command in my arguably frequent use of datalad 😄) is the subjective assessment of "The simplest "prototypical" example is ... ". Granted, it is one possible prototypical example (I bet each of us has their own anyway), but I fear that putting a sole focus on datalad search like this misrepresents the flexibility/versatility of the tool, and where and how it is currently used as scientific software. I have skimmed through the list of papers in #5 and the majority of papers seem to cite it for data/code management and data retrieval:

To facilitate scaling, reproducibility, data management and sharing we integrated DataLad as a data management infrastructure

DataLad aims to provide all the tools necessary for creating and publishing data distributions to data sharing and collaborative work

It is also notable that this graph includes deep Haskell and TeX dependency stacks - which are pulled in by DataLad (Halchenko et al. 2016) and PythonTeX respectively.

The template can be downloaded from TemplateFlow either with datalad
    $ datalad install -r ///templateflow
    $ cd templateflow/tpl-MNI152NLin2009cSym/
    $ datalad get -r *

[...] and can be easily downloaded using DataLad49 from http://datasets.datalad.org/?dir=/labs/gobbini.

The full database was openly available in Datalad repository (http://datalad.org).

A version control system for the data should be implemented to guarantee reproducibility of the experimental results. Both at the MIP-Local, MIP-Central and MIP-Federated levels. The DataLad12 initiative can be inspiring and a good initial reference in this sense. This can have a relevant impact on the anonymization module.

Specialized tools have been developed to facilitate working with many of these datasets (for example, DataLad, https://www.datalad.org/datasets.html and OpenNeuro21,22).

The DataLad (Halchenko et al., 2018) project has developed a crawler to index the data from various scientific data portals for a unified interface from which to download these datasets from the command line interface on their computers.

I have spent the morning thinking about which example I would regard as a better fit. In the end, I came to the conclusion that no example at all may be best.
Irrespective of how we and others use the tool, describing any one command or workflow as the single example falls short. Also, we can't completely anticipate what people may use it for in the future or how datalad core develops, and we can't update the paper like we can update our docs. According to the review criteria of JOSS, "example usage" is not a required section in the paper. It is only a required item on the reviewer checklist that the documentation has examples, and it has plenty of them. Instead of selecting an example, I think it would be more useful to remove the second paragraph from "Documentation", and use the saved space to make sure that the section on DataLad core conveys the functionality of DataLad well.

yarikoptic commented on July 25, 2024

re datalad run - it is indeed the super-feature. But it is not "actionable" on its own -- you first need to install/clone/(create+save), to even approach it. That is why I liked search which pretty much demonstrates "distribution" + "modularity" + "domain specific" aspects with a single command and anyone who just installed datalad could run it.

re search argument... I am ok to change it. "movie fmri" sounds ok. But I wonder whether we could "cater" it to bring in datalad run as a follow-up command, e.g. include in a distribution a dataset with a nice datalad run'ed history, so we could search and follow up with datalad rerun?
Do we have an issue somewhere on a datalad-run metadata extractor, so it would be possible to find datasets with (some) execution provenance records?

mih commented on July 25, 2024

Hmm, I am not really following your argument.

Length/verbosity was not a concern for other manuscript parts. Why does it need to be here?

datalad search is not a good demo in my book. It is a label for a black box. Based on this demo, it is unlikely that anyone understands what this really does, or why it makes sense. Intuitively I would think that it is overly complicated to install a Git repository in order to perform a DB query with a search term. One needs network access anyways, why not query some server directly?

I really think that showing create, save, run gives a much better understanding of how this machinery works, that the interface is different from git/git-annex, that it really is about files, and capturing information, and what the actual complexity of all that is. datalad search illustrates none of that IMHO.
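For readers who have not used these commands, the create, save, run sequence could look roughly like the sketch below. It is only an illustration under assumptions, not a demo proposed for the paper: the dataset name `demo`, the file names, the commit messages and the `wc` command are all hypothetical, and the helper names `producer_steps` and `run_producer_demo` are made up for this sketch. It drives the `datalad` command line from Python and does nothing if `datalad` is not installed.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def producer_steps(dataset: Path) -> list[tuple[list[str], Path]]:
    """Return (command, working directory) pairs for create -> save -> run."""
    return [
        # create: set up a new, empty dataset at the given path
        (["datalad", "create", str(dataset)], dataset.parent),
        # save: record the current state of the files in the dataset
        (["datalad", "save", "-m", "Add input text"], dataset),
        # run: execute a shell command and record it together with its outputs
        (["datalad", "run", "-m", "Count words",
          "wc -w input.txt > count.txt"], dataset),
    ]


def run_producer_demo() -> None:
    """Execute the three steps in a scratch directory, if datalad is available."""
    if shutil.which("datalad") is None:
        print("datalad not found on PATH; nothing executed")
        return
    # The scratch directory is kept so the resulting history can be inspected.
    dataset = Path(tempfile.mkdtemp()) / "demo"  # hypothetical dataset name
    create, save, run = producer_steps(dataset)
    subprocess.run(create[0], cwd=create[1], check=True)
    # Hypothetical input file, added between create and save.
    (dataset / "input.txt").write_text("some example text\n")
    subprocess.run(save[0], cwd=save[1], check=True)
    subprocess.run(run[0], cwd=run[1], check=True)
    print(f"dataset left at {dataset}; inspect it with 'git log'")
```

Calling `run_producer_demo()` executes the sequence; afterwards `git log` inside the dataset should show one commit per step, with the last one carrying the recorded command.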

yarikoptic commented on July 25, 2024

Length/verbosity was not a concern for other manuscript parts. Why does it need to be here?

My opinion was that twice the recommended short length is ok. As such, length is a factor/concern in that it can't grow indefinitely.

With nods during our regular meeting, everyone agreed to my proposal to put the emphasis (and thus spend a good portion of the space) on the "why not git and git-annex alone" aspect. That indeed limits what else we can expand on in greater detail, or we would need to change what we emphasize (could as well be done, nothing is set in stone).

It is a label for a black box. Based on this demo, it is unlikely that anyone understands what this really does, or why it makes sense.

For me it is a label for "magic" (and if anything, a "glass" not a "black" box), and I agree with “Any sufficiently advanced technology is indistinguishable from magic”: only an insignificant portion of users understand what many advanced commands, git, apt, and docker included, do. I do not think there is a need for this paper to dive into explaining the operation of each command we mention (and the exact behavior of some mentioned commands remains a mystery to users as well).

... One needs network access anyways, why not query some server directly?

Which server can you query ATM to immediately be able to get the content for the interesting results from the query? Again, I do not think it is worthwhile to dive into defending and detailing each command we mention: we do reference the documentation, handbook, etc.

I really think that showing create, save, run gives a much better understanding of how this machinery works. ... datalad search illustrates none of that IMHO.

We diverge on which aspect (or "essence") of DataLad to present here: "consumer" vs "producer"; "distribution" vs "management system". The difficulty is that it could be any of those. The current example presents a "consumer" use case, and "create/save/run" would present a "producer" one (but even then without sharing). Ideally we should present both. Please suggest how, or I will make an attempt later as well.

yarikoptic commented on July 25, 2024

I have skimmed through the list of papers in #5 and the majority of papers seem to cite it for data/code management and data retrieval:

Just a note: since people do not mention the Google search query that led them to a paper they cite, I do not expect them to mention datalad search as how they found any particular dataset, unless they did some meta-study across various datasets. Unfortunately, the latter is still very unlikely given the absent harmonization of data, although projects like http://openneu.ro/metasearch/ (present as http://datasets.datalad.org/?dir=/labs/openneurolab/metasearch, and the use case which triggered development of datalad addurls), which collated anatomicals from various studies, do exist.

Anyways, let's proceed along the path of least resistance and without any example/demo. I merged #71, which IMHO closes this issue. Feel free to reopen if you want to re-introduce an example.
