cmungall / biocaddie-gym

BioCADDIE Harvester project using GitHub YAML and Markdown

Python 66.69% Shell 0.95% Makefile 10.73% GCC Machine Description 21.64%
metadata yaml github markdown biocaddie biocaddie-harvester travis dataset-description

biocaddie-gym's Introduction

This is the main repository for the BioCADDIE GYM Harvester project.

This project will explore the use of social coding sites such as GitHub for publishing and sharing descriptions of datasets. We will create a format aligned with the Health Care and Life Sciences (HCLS) Dataset Description profile that can be easily embedded in a project repository, and foster a lightweight tool ecosystem around it. This includes dynamic publishing of dataset descriptions and automatic validation through the Travis continuous integration system. We will pilot the project by describing existing datasets retrospectively, and by working with an existing project to describe its datasets prospectively. The ultimate goal is to have these descriptions indexed within the BioCADDIE system.

Getting Started

Go to the fake-data demo/template site:

http://biodatasets.github.io/mybiocaddie/

Click "Fork Me" and follow the instructions there.

About this repository

To share your data and metadata, follow the instructions above. This repository houses many of the scripts and tools that take care of things behind the scenes.

Validation and Processing Scripts

See the bin directory for scripts that parse and aggregate Markdown (.md) files.

The mybiocaddie template includes a .travis.yml file.

Currently this file downloads a Python script from this repo and executes it to test the contents of the forked repo. Note: people forking the mybiocaddie template will need to enable Travis for their fork before any checks are run.

See also issue 9.

Harvesting

See: https://github.com/cmungall/biocaddie-gym/milestone/4

One of the aims of the project is to demonstrate the feasibility of harvesting user-supplied metadata in github repositories.

The demonstrator here is not intended to supplant actual BioCADDIE harvesting technologies, but to show how they may be augmented.

See the Makefile for how to run this harvesting step.
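The Makefile remains the reference for how harvesting actually runs. As a rough illustration only, the fetch step could look like the Python sketch below. It assumes, hypothetically, that each participating repo publishes its metadata as index.md on a gh-pages branch; that path and branch, the record fields, and the names raw_url, http_fetch and harvest are illustrative assumptions, not code from this repo.

```python
import json
import sys
import urllib.request
from typing import Callable, Dict, List, Optional

# Hypothetical location of the metadata file in each participating repo.
BRANCH = "gh-pages"
METADATA_PATH = "index.md"


def raw_url(repo: str, branch: str = BRANCH, path: str = METADATA_PATH) -> str:
    """Raw-content URL for a file in a repo given as "owner/name"."""
    return f"https://raw.githubusercontent.com/{repo}/{branch}/{path}"


def http_fetch(url: str) -> str:
    """Download a URL and decode the body as UTF-8."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def harvest(
    repos: List[str], fetch: Callable[[str], str] = http_fetch
) -> List[Dict[str, Optional[str]]]:
    """Fetch each repo's metadata file; failures are recorded rather than raised."""
    records: List[Dict[str, Optional[str]]] = []
    for repo in repos:
        url = raw_url(repo)
        try:
            records.append({"repo": repo, "url": url, "markdown": fetch(url), "error": None})
        except OSError as exc:  # urllib's URLError and HTTPError subclass OSError
            records.append({"repo": repo, "url": url, "markdown": None, "error": str(exc)})
    return records


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python harvest_sketch.py owner/repo [owner/repo ...] > harvested-repos.json
    json.dump(harvest(sys.argv[1:]), sys.stdout, indent=2)
```

Passing fetch as a parameter keeps the network call swappable, so the loop can be exercised offline with a stub.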

TODO (issue 11): make this a CI job.

Dataset Providers: we want you!

Do you provide or release data using a distributed VCS site like GitHub? Are you interested in doing this? Alternatively, are you interested in using GitHub to provide metadata for data released elsewhere (e.g. Dryad, FigShare, ...)?

If so, we want to hear from you!

Twitter: @biocaddie or @chrismungall
GitHub: @cmungall
Email: [email protected]

More details coming soon...

biocaddie-gym's People

Contributors

cmungall, kshefchek

biocaddie-gym's Issues

Create harvest report

Take harvested-repos.json (the one in sample can be used for now) and generate a Markdown page.

The page can have one entry per repo, showing its Travis badge, and under each repo one line per dataset.
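A minimal Python sketch of such a report generator follows. It assumes, hypothetically, that harvested-repos.json is a list of objects shaped like {"repo": "owner/name", "datasets": [{"title": ..., "url": ...}]}; the real file's layout may differ, and travis_badge and render_report are illustrative names. The badge uses the classic travis-ci.org badge URL pattern on the master branch.

```python
import json
import sys
from typing import Any, Dict, List


def travis_badge(repo: str) -> str:
    """Markdown for a travis-ci.org build badge that links to the repo's builds."""
    base = f"https://travis-ci.org/{repo}"
    return f"[![Build Status]({base}.svg?branch=master)]({base})"


def render_report(repos: List[Dict[str, Any]]) -> str:
    """One section per repo (with badge), one bullet per dataset."""
    lines = ["# Harvest report", ""]
    for entry in repos:
        repo = entry["repo"]
        lines.append(f"## [{repo}](https://github.com/{repo}) {travis_badge(repo)}")
        lines.append("")
        datasets = entry.get("datasets", [])
        if not datasets:
            lines.append("_No datasets found._")
        for ds in datasets:
            title = ds.get("title", "(untitled)")
            url = ds.get("url")
            lines.append(f"* [{title}]({url})" if url else f"* {title}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python harvest_report.py harvested-repos.json > report.md
    with open(sys.argv[1], encoding="utf-8") as fh:
        print(render_report(json.load(fh)))
```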

Mapping to other metadata formats

Related to #1. This falls within the domain of https://github.com/biocaddie/WG3-MetadataSpecifications, but we need to address it in the context of this project.

There are a variety of proposals and specifications for dataset metadata descriptions, e.g.:

  1. http://www.openarchives.org/ore/0.9/jsonld
  2. http://www.ddialliance.org/explore-documentation
  3. http://www.w3.org/TR/hcls-dataset/
  4. https://github.com/biocaddie/WG3-MetadataSpecifications

One of the goals of this project is to make metadata authoring easy for reasonably tech-savvy users, by using a simple yaml format.

Any JSON document has a structurally equivalent YAML document (YAML is more expressive, so the converse is not true). This means we can take any JSON format and allow authoring it in YAML.

For JSON, we have the option of adding an @context, which turns the document into JSON-LD and makes it semantically equivalent to a corresponding RDF document, as defined by the mappings in the context. One of the original goals of this project was to take a mature RDF dataset description profile and define a JSON-LD context that would allow easy authoring of compliant RDF in YAML. This goal may evolve as we progress with #1.
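To illustrate the idea, here is a toy Python sketch: a plain record (as it might be authored in YAML) gains an @context that maps its keys to Dublin Core terms, which the HCLS profile reuses for dct:title and dct:description. The context contents and the helper names to_jsonld and naive_triples are hypothetical, and naive_triples is a deliberately simplified stand-in for the JSON-LD expansion algorithm, not a conforming processor.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical context: plain keys -> Dublin Core term IRIs.
CONTEXT = {
    "title": "http://purl.org/dc/terms/title",
    "description": "http://purl.org/dc/terms/description",
}


def to_jsonld(record: Dict[str, Any], context: Dict[str, str] = CONTEXT) -> Dict[str, Any]:
    """Attach an @context so the plain record can be read as JSON-LD."""
    return {"@context": context, **record}


def naive_triples(doc: Dict[str, Any]) -> List[Tuple[str, str, Any]]:
    """Toy expansion: one (subject, predicate IRI, literal) triple per mapped key.

    Keys without a mapping are dropped, as in JSON-LD expansion; nesting,
    @type, language tags, etc. are ignored. Use a real JSON-LD processor
    for anything beyond illustration.
    """
    ctx = doc["@context"]
    subject = doc.get("@id", "_:b0")  # blank node if the record has no @id
    return [(subject, ctx[k], v) for k, v in doc.items() if k in ctx]


if __name__ == "__main__":
    record = {"@id": "http://example.org/dataset/1", "title": "Example dataset"}
    print(json.dumps(to_jsonld(record), indent=2))
    print(naive_triples(to_jsonld(record)))
```

The record itself is unchanged: all of the RDF meaning lives in the context, which is why a single shared context could let providers keep authoring simple YAML.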

In the interim, do we have recommendations for dataset providers who have a strong preference? It seems we should be liberal in what we accept and provide converters, although there are certain advantages to encouraging a single format.

Improve validation on user's markdown/yaml files in forked repos

When a user forks the mybiocaddie repo, they also get:

  • .travis.yml - this downloads code from this repo
  • Makefile

Currently the Makefile has an empty (no-op) test target, so make test checks nothing.

This could be improved by using the code in the bin directory of this repo to extract the YAML from the Markdown files and check that it is syntactically valid.
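One way the improved check could look, sketched in Python under two assumptions: metadata is embedded in fenced code blocks tagged yaml or yml (front matter and ~~~ fences are not handled), and PyYAML is installed when files are actually validated. The function names are hypothetical, not the scripts in bin. The script exits non-zero on any parse error, so a make test target, and hence the Travis build, would fail.

```python
import re
import sys
from typing import List

# Assumption: YAML lives in fenced blocks opened by ```yaml or ```yml.
FENCE_RE = re.compile(r"^```ya?ml[ \t]*\n(.*?)^```[ \t]*$", re.MULTILINE | re.DOTALL)


def extract_yaml_blocks(markdown: str) -> List[str]:
    """Return the body of every yaml/yml fenced block, in document order."""
    return FENCE_RE.findall(markdown)


def validate_file(path: str) -> List[str]:
    """Parse each YAML block in a Markdown file; return error messages."""
    import yaml  # PyYAML (third-party); imported here so extraction works without it

    with open(path, encoding="utf-8") as fh:
        blocks = extract_yaml_blocks(fh.read())
    errors = []
    for i, block in enumerate(blocks, start=1):
        try:
            yaml.safe_load(block)
        except yaml.YAMLError as exc:
            errors.append(f"{path}: YAML block {i}: {exc}")
    return errors


def main(paths: List[str]) -> int:
    errors = [err for path in paths for err in validate_file(path)]
    for err in errors:
        print(err, file=sys.stderr)
    return 1 if errors else 0


if __name__ == "__main__" and main(sys.argv[1:]):
    sys.exit(1)  # non-zero exit makes `make test` (and Travis) fail
```

For example, the Makefile's test target might run this script over the repo's *.md files (script name and invocation are hypothetical).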
