course's Introduction

The DataLad handbook 📙

This is a living resource on why and, more importantly, how to use DataLad. The rendered version is available at https://handbook.datalad.org and is currently under initial development.

The handbook is a practical, hands-on crash course to learn and experience DataLad. You do not need to be a programmer, computer scientist, or Linux-crank. If you have never touched your computer's shell before, you will be fine. Regardless of your background and personal use cases for DataLad, the handbook will show you the principles of DataLad, and from chapter 1 onwards you will be using them.

Find more general information about the idea behind the handbook in the poster presented at the 2020 OHBM or dive straight into your DataLad adventure.

Contributing

Contributions in any form (pull requests, issues, content requests/ideas, ...) are always welcome. If you are using the handbook and find that something does not work, please let us know. Likewise, if you are using DataLad for your individual project, consider contributing by telling us about your use case. You can find out more on how to contribute here, and a list of all contributors so far below, in CONTRIBUTORS.md, and in .zenodo.json.

Notes for Instructors

The book is the basis for workshops and lectures on DataLad and data management. The handbook's course repository contains, among other things, slides and live casts of the code examples in this book. It is constantly growing, and everyone is free to use the material under the license terms below. Contributions and feedback are very welcome.

License

CC-BY-SA: You are free to

  • share - copy and redistribute the material in any medium or format
  • adapt - remix, transform, and build upon the material for any purpose, even commercially

under the following terms:

  1. Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

  2. ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

Contributors ✨

Thanks goes to these wonderful people (emoji key):

  • Adina S. Wagner: 💻 🖋 📖 🎨 🤔 🚇 🚧 📆 👀 📓 📢 ⚠️ 🐛 💡 💬 ♿️
  • Laura Waite: 🤔 🚧 👀 📢 💬 🖋
  • Michael Hanke: 💬 🐛 💻 🖋 📖 🎨 💡 🤔 🚇 🚧 🔌 📆 👀 🔧 ⚠️ 📢 📓 ♿️
  • Kyle Meyer: 🐛 👀 💬 🖋 🤔
  • Marisa Heckner: 🤔 📓 🐛 🖋
  • Benjamin Poldrack: 💬 🤔 💡 ✅
  • Yaroslav Halchenko: 👀 🖋 🤔 🐛
  • Chris Markiewicz: 🐛
  • Pattarawat Chormai: 🐛 💻
  • Lisa N. Mochalski: 🐛 🖋 💡 🤔
  • Lisa Wiersch: 🐛
  • Jean-Baptiste Poline: 🖋
  • Nevena Kraljevic: 📓
  • Alex Waite: 👀 🐛 🤔
  • Lya K. Paas: 🐛 💻
  • Niels Reuter: 🖋
  • Peter Vavra: 🤔 📓
  • Tobias Kadelka: 📓
  • Peer Herholz: 🤔
  • Alexandre Hutton: 🖋 🐛
  • Sarah Oliveira: 👀 🤔
  • Dorian Pustina: 🤔
  • Hamzah Hamid Baagil: 📓 🐛
  • Tristan Glatard: 🐛 🖋
  • Giulia Ippoliti: 🖋 💡
  • Christian Mönch: 🖋
  • Togaru Surya Teja: 🖋
  • Dorien Huijser: 🐛 📓
  • Ariel Rokem: 🐛
  • Remi Gau: 🐛 🤔 🚧 👀 🚇 💻 🎨
  • Judith Bomba: 🐛
  • Konrad Hinsen: 🐛
  • Wu Jianxiao: 🐛
  • Małgorzata Wierzba: 📓 👀 ✅
  • Stefan Appelhoff: 🚇 🔧 🐛
  • Michael Joseph: 🤔 🖋 🐛
  • Tamara Cook: 👀 🚇
  • Stephan Heunis: 🐛 🚧 🖋 💡 👀
  • Joerg Stadler: 🐛
  • Sin Kim: 🐛 🖋 👀
  • Oscar Esteban: 🐛
  • Michał Szczepanik: 👀 🐛 🖋
  • eort: 🐛
  • Myrskyta: 🐛
  • Thomas Guiot: 🐛
  • jhpb7: 🐛
  • Ikko Ashimine: 🐛
  • Arshitha Basavaraj: 🖋 🐛 🚧
  • Anthony J Veltri: 📓
  • Isil Bilgin: 🐛 🚧
  • Julian Kosciessa: 🖋
  • Isaac To: 🚧 🖋 🐛
  • Austin Macdonald: 🐛
  • Christopher S. Hall: 🐛
  • jcf2: 🐛
  • Julien Colomb: 🖋
  • Danny Garside: 🐛 🚧
  • Justus Kuhlmann: 🖋
  • melanieganz: 🐛
  • Damien François: 🐛 🖋
  • Tosca Heunis: 🐛 📓
  • Jeremy Magland: 🐛
  • Matthias Riße: 🐛
  • David Nicholson: 🐛 🖋

This project follows the all-contributors specification. Contributions of any kind welcome!

course's Issues

public url for pics/slides?

I got interested in sandwhich03.svg, which comes from that submodule, but the submodule is only registered with an SSH URL:

(git)lena:~datalad/datalad-handbook/course[master]git
$> datalad -f json_pp subdatasets pics/slides
{
  "action": "subdataset",
  "gitmodule_name": "pics/slides",
  "gitmodule_url": "kumo.ovgu.de:/home/mih/public_html/datalad/slides",
  "gitshasum": "76882e01a9194444b507491889e7d9f6d6dcb6b2",
  "parentds": "/home/yoh/proj/datalad/datalad-handbook/course",
  "path": "/home/yoh/proj/datalad/datalad-handbook/course/pics/slides",
  "refds": "/home/yoh/proj/datalad/datalad-handbook/course",
  "state": "absent",
  "status": "ok",
  "type": "dataset"
}
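
The check behind this report can also be scripted. The following is a minimal sketch, assuming the JSON record above is available as a string (trimmed here to the fields that are used); `url_kind` is a hypothetical helper for illustration and not part of DataLad.

```python
import json

# Record copied from the `datalad -f json_pp subdatasets` output above,
# trimmed to the fields this sketch reads.
RECORD = """
{
  "action": "subdataset",
  "gitmodule_name": "pics/slides",
  "gitmodule_url": "kumo.ovgu.de:/home/mih/public_html/datalad/slides",
  "state": "absent",
  "status": "ok",
  "type": "dataset"
}
"""


def url_kind(url):
    """Roughly classify a Git URL (hypothetical helper, not a DataLad API)."""
    if url.startswith(("http://", "https://")):
        return "http"
    # Either an explicit ssh:// URL or the scp-like "host:path" form.
    if url.startswith("ssh://") or ("://" not in url and ":" in url.split("/")[0]):
        return "ssh"
    return "other"


record = json.loads(RECORD)
kind = url_kind(record["gitmodule_url"])
print(record["gitmodule_name"], record["state"], kind)
```

For the record above, the URL is classified as "ssh", which matches the problem described: without SSH access to that host, the subdataset cannot be obtained from the registered URL.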

1.5 day workshop in Lucca

@mih and I will be giving a workshop on DataLad in Lucca on March 23rd-24th. This issue lists the TODOs and acts as a progress tracker.
Please extend and edit as necessary. :)

Logistics

  • await Feedback from Lucca on dates
  • await Feedback from Lucca on GDrive account
  • figure out travel
    • @adswa (I will likely take a train. Depending on when we plan to arrive, there is a nice one overnight, arriving at 7 something in the morning) EDIT: both of us will go to Pisa from Montreal
    • @mih

Software

  • write a custom wrapper around a special remote for gdrive.

Teaching

A Basics layout has been proposed by @mih and awaits feedback from Lucca

  • Datalad concepts and principles

  • Basics of local data/code version control

    • Hands on: tasks to exercise basic building blocks
  • Modular data management for reproducible science

    • Hands on: implement sketch of a reproducible paper
  • Data management for collaborative science

    • Hands on: Using your infrastructure (Gdrive) to collaborate on a
      demo project
  • Data publication

    • Hands on: Publish data on "GitHub"
  • Outlook (what else is possible, resources, use cases)

  • Potential group work: Small sets of people are given problems to solve with DataLad and present

This is currently structured like this:
Monday 23 Morning session
1 Datalad concepts and principles
2 Basics of local data/code version control + Hands on: tasks to exercise basic building blocks

Monday 23 Afternoon session
1 Modular data management for reproducible science + Hands on: implement sketch of a reproducible paper
2 Data management for collaborative science + Hands on: Using your infrastructure (Gdrive) to collaborate on a demo project

Tuesday 24 Morning session
1 Data publication + Hands on: Publish data on "GitHub"
2 Outlook (what else is possible, resources, use cases)

Resources to create

  • rclone GDrive wrapper (started here datalad/datalad#4162)
  • slides
  • code lists
  • sketches of a LaTeX (?) skeleton for a reproducible paper. @adswa could potentially use resources she will help to improve at the Turing Way book dash.
  • Data to use for examples and to publish to Gdrive
  • Optional/Wishlist: Some sort of audience response system (EduVote, browser-based, Google Forms, ...?), e.g., in the form of: "How confident are you using --> rating scale"
  • Workshop feedback (potentially pre-post, to learn about attendees' expectations before and after the course, and their knowledge gain). Also remember to collect feedback on DataLad.

Educating for a FAIR future talk at the NWG, due Feb 22nd

  • 10 minute video, prerecorded - young investigator presentation
  • live discussion virtually, March 28th, evening

abstract:
With a growing awareness of the role of sample size and replicable results (Button et al., 2013; Turner et al., 2018), a rise of platforms, tools, and standards that aim to facilitate data sharing and management (Wiener et al., 2016), unprecedented sample sizes (e.g., UKBiobank; Bzdok & Yeo, 2017), and increasingly complex data analyses (e.g., Glasser et al., 2013; Alfaro-Almagro et al., 2018), research data management (RDM) is essential to put open and FAIR neuroimaging research into effect. But just as FAIRness and RDM cannot be an afterthought in any given scientific project, they also shouldn’t be an afterthought in the training and education of current and future generations of neuroscientists. This training has to fulfill the demands of different stakeholders in science: 1) researchers who apply RDM in their scientific projects, 2) PIs and similar personnel with management tasks who need to set out and justify plans for the implementation of RDM and FAIR principles, and 3) trainers, such as librarians or data managers, who educate users on tools and practices for FAIR science (Fothergill et al., 2019; Grisham et al., 2016). Researchers of any career level and of any background need accessible tutorial-like educational content and documentation for relevant tools and concepts to apply FAIR RDM from the get-go. Planners need high-level, non-technical information in order to make informed yet efficient decisions on whether a tool fulfils their needs. And trainers need reliable, open teaching material.
A user-driven alternative to scientific software documentation by software developers, “Documentation Crowdsourcing”, has been successfully employed by the NumPy project (Oliphant, 2006; Pawlik et al., 2015). Extending this concept beyond documentation, we have created the DataLad handbook (handbook.datalad.org) as a free & open-source, user-driven and -focused educational instrument and resource for trainers, users, and planners for (research) data management, independent of their background and skill level (Wagner et al., 2020). Drawing from the experiences of creating more than 400 pages of educational material, with almost 40 independent contributors from around the world, and nearly 2 years of in-person and virtual teaching based on the handbook, I want to highlight the unique challenges of RDM training as well as its opportunities for the field of neuroscience.

DebConf Talk on DataLad, due August 15th

The DebConf talk proposal was accepted.
Here is the abstract:

Title: DataLad - Decentralized Management of Digital Objects for Open Science

With a general awareness of a reproducibility crisis in many scientific areas and increasing importance of research data management in science and policy making, data-driven fields require convenient and scalable data management solutions. Standing on the shoulders of Git and git-annex (git-annex.branchable.com/, Joey Hess), DataLad provides a decentralized solution that enables the joint management of code, data, and complete containerized computational environments in a scalable and distributed fashion. With features such as unambiguous version control, a wide spectrum of data transport mechanisms, convenient provenance capture, and re-execution for verification or as an alternative to storage and transport, it enables and facilitates many aspects of open and reproducible science: collaboration, sharing, analytical transparency, computational reproducibility of digital research objects, and disk-space aware storage and computing workflows on infrastructure that ranges from personal laptops up to supercomputers.

In this talk, we will introduce DataLad, present its main features which should be of interest to the audience regardless of their relation to any field of science, and share the process and status of its adoption in the neuroimaging community.

Recording tips: https://debconf-video-team.pages.debian.net/docs/advice_for_recording.html

Useful free tool for simple audience polling: https://www.directpoll.com

This tool is very useful:

  • create questions in advance (expires after 30 days unless you "save" it again)
  • embed the live results into the presentation (using an <iframe></iframe> tag):
    <iframe src="https://directpoll.com/r?XDbzPBdJ2bAX0ZEC2YlWLumm6WtYBkChGSFh5Vwe4W"
    title="This is my poll" width="900" height="900"></iframe>

Book vs course

The goal is to develop a course based on the book while minimizing the amount of disconnected material, which makes it easier to evolve book and course together with the evolution of DataLad.

  • the course and the book share the exact same content, but the former is performed, while the latter serves as the syllabus

  • code examples in the book are actually executable. we use this feature to turn them into "cast" scripts. once in that form, we can use the cast_live tools from DataLad to demo them in a course installment

  • each code example in the book needs to be equipped with a "caption" that can then serve as a narrative cue in the cast script. The caption could then also be displayed in the book itself.

  • each code example in the book needs to get a tag or label that can be used to subselect examples that make up a shorter, but still internally consistent narrative -- this aids the generation of shorter course installments

  • initially the slides of the course material are based on the "summary" components of each chapter, plus relevant key figures. once tailored to and validated by teaching the course, their content is fed back into the book (possibly using a new dedicated markup). Each slide contains a link to the respective part of the book, where more details are available. The link is possibly implemented as a QR code.

  • the order of topics in the course matches the order in the book. if it turns out that this order is suboptimal it needs to be adjusted in both book and course. consequently, the course starts with basics and a uniform narrative, and ends with more standalone scenario descriptions.

  • the course starts with, or follows, a "pitch" that outlines an attractive take-away for a respective target audience. Candidate pitches are any "use case" chapter.

  • slide decks for course installments are based on reveal.js, and are more or less fully generated using the book sources and a (set of) templates. Each chapter has its own slide deck.

  • analogous to the book, each session/chapter (and in particular the early ones) must communicate in a self-evident fashion why its content/objective is important and applicable to practical problems a target audience can relate to.
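
The caption and tag ideas from the list above can be sketched in code. This is a minimal illustration, not the handbook's actual tooling: the `Example` class, its fields, and the tag names are hypothetical, and the `say`/`run` line format is an assumption about what a cast script looks like.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class Example:
    """One book code example (hypothetical structure for illustration)."""
    caption: str  # narrative cue shown before the command
    command: str  # the executable code example
    tags: set = field(default_factory=set)  # used to subselect a narrative


def to_cast_script(examples, tag=None):
    """Render examples as cast lines, optionally keeping only one tag.

    Assumes a cast script is a sequence of `say "<caption>"` and
    `run "<command>"` lines; adjust to the real cast format as needed.
    """
    lines = []
    for example in examples:
        if tag is not None and tag not in example.tags:
            continue
        lines.append(f"say {shlex.quote(example.caption)}")
        lines.append(f"run {shlex.quote(example.command)}")
    return "".join(line + "\n" for line in lines)


examples = [
    Example("Create a new dataset", "datalad create DataLad-101", {"short", "full"}),
    Example("Check the dataset state", "datalad status", {"full"}),
    Example("Save a change", 'datalad save -m "add notes"', {"full"}),
]

short_script = to_cast_script(examples, tag="short")
full_script = to_cast_script(examples)
print(short_script)
```

Selecting by tag yields the shorter but still ordered subset of examples, while omitting the tag renders every example in book order.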

Content (based on current book)

  1. Setup: Git ID, installation, what is a terminal
  2. Datasets (create, save, install, nesting): basic local version control, manual log keeping
  3. Run: basic provenance tracking, automatic log keeping
  4. Git-annex basics: disaster recovery (needs merge of currently disjoined chapters git-annex and help yourself)
  5. Collaboration: yes!
  6. YODA: using the conceptual pieces optimally for maximum practical benefits -- this will be and is a mostly conceptual part

Each of these "basics" chapters is handled in a 90min installment.

After the initial sessions on "basics", a number of use case descriptions can follow.

For the initial run at INM7, we will have a dedicated "How to work with the local infrastructure" session that could take place any time after (3). This will then also turn into a use case chapter in the book.

Instead of a weekly or biweekly frequency, this course can also be taught as a 2-day block event, with the basics on day 1, and a re-cap + use cases on a (shorter) day 2.

ABCD-ReproNim Course

Date: Jan 22nd 2020
Tentative schedule:

ReproNim: Data Versioning and Transformation with DataLad
Instructor: Adina Wagner*, Institute of Neuroscience and Medicine (INM-7)
Why Should Data Be Versioned?
Simple DataLad Transform: Retrieve, Compute, Store Results
Create a Dataset
Using DataLad with Containers on the Dataset
Rerunning and Checking Analysis Differences

Submission due: Dec. 15th

Todos:

  • Pre-record your lecture (details to be provided separately) by September 15th/December 15th (depending on your 'session'; see syllabus);
  • Be available for your 1-hour question and answer period with the students on the Friday at 1pm EST/10am PST as indicated in the syllabus;
  • Provide 1-2 readings/watchings (~30 minutes) you would like to assign prior to your lecture;
  • Review the homework assignment generated by the TA team before distribution to the students;
  • (Optional) Attend the "virtual" workshop March 8-12, 2021.

Handbook2livecasts: Todos for cast_live and automatically created casts

This is to document how to turn the handbook into cast_live scripts.

  1. Create a cast with annotated code snippets in the handbook (see datalad-handbook/book#217 for insights on how to do this)
  2. Use a custom version of DataLad's cast_live to "play" it

TODO:

  • update the cast_live tools to run without obscure failure (XGetWindowProperty[_NET_WM_DESKTOP] failed (code=1))
    • the command that fails is xdotool windowactivate --sync $(xdotool getwindowfocus)
  • create a copy of appropriately customized cast live tools in this repo
  • add the casts (as soon as they are created)
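
To narrow down the failure, the failing xdotool step can be run in isolation so that its error message is captured instead of aborting the cast. The following is a minimal diagnostic sketch, not a fix for cast_live: `activate_focused_window` is a hypothetical helper, and the `run` parameter exists only so the function can be exercised without a running X session.

```python
import subprocess


def activate_focused_window(run=subprocess.run):
    """Re-run the failing step and report (ok, message) instead of raising.

    Equivalent to: xdotool windowactivate --sync $(xdotool getwindowfocus)
    """
    try:
        focus = run(["xdotool", "getwindowfocus"],
                    capture_output=True, text=True)
        if focus.returncode != 0:
            return False, focus.stderr.strip()
        window_id = focus.stdout.strip()
        activate = run(["xdotool", "windowactivate", "--sync", window_id],
                       capture_output=True, text=True)
    except FileNotFoundError:
        # Raised by subprocess.run when the xdotool binary is missing.
        return False, "xdotool is not installed"
    if activate.returncode != 0:
        return False, activate.stderr.strip()
    return True, window_id


if __name__ == "__main__":
    ok, message = activate_focused_window()
    print("activated window" if ok else "activation failed:", message)
```

Returning the captured stderr makes it possible to log messages such as the XGetWindowProperty error above and decide per call whether the cast should continue.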

IRTG Workshop Aachen

When: November 26th, 2019, 4pm
Where: Same library seminar room as before
Duration: 2 hours
Participants: 25 grad students, various backgrounds (neuroscience, psych, bio, physics, engineering, medicine), workshop will be made compulsory

Communicated expectations on content:

  • DataLad
  • BIDS

TODO

  • Dienstreiseantrag
  • Short description/overview to distribute in advance
  • Slides/casts
  • Code/materials for participants

Own thoughts

  • The time is extremely limited: The workshop needs to get them motivated to learn the tools (e.g., start with reproducible paper teaser, and for BIDS maybe show brainlife.io), give a brief introduction into the basic principles (prob. Dataset basics and a shortened Reproducible execution session), and on top of that contain pointers to everything that is relevant for subsequent self-study.
  • Based on the conversation with Julia and HanGue, students don't seem to know about version control/Git, BIDS or any standard structure. Teaching them the very basics alone will already make a large difference to their workflows.
  • possibly: collate a sheet with a collection of useful links.

15 Min talk in Oldenburg, Nov. 2nd and 3rd

For a symposium "Open and Reproducible Neuroimaging: Integration of community developed tools from data acquisition to publication". Michael and I will both have a 15 min slot to talk about data storage and retrieval.

Lessons from data management support sessions based on the book

  • It would be useful to have an interactive run session (e.g., datalad run nano).
  • Building up the command by trial and error as in the book doesn't work as well in a workshop session: it is hard to motivate why we run into all of these errors, and easy to lose track of what it is we're trying to achieve
