biohackathon2022's Introduction

Project 25: Scientific and technical enhancement of bioinformatics software metadata using the Tools Ecosystem open infrastructure

Contribution

If you want to contribute to this project, see the contribution instructions here.

Abstract

The Tools Ecosystem is a centralized repository for the open and transparent exchange of metadata about software tools and services in Bioinformatics and Life Sciences. It serves as the foundation for the sustainability of the diverse Tools Platform services, and for the interoperability between all these essential services (bio.tools, BioContainers, OpenEBench, Bioconda, WorkflowHub, usegalaxy.eu) and related resources outside of the ELIXIR Tools Platform (e.g. Bioschemas).

The goal of this project is to cross-compare and analyze the metadata centralized in the Tools Ecosystem in order to maintain high-quality descriptions. To achieve this goal, we need to design tools and processes that detect curation bottlenecks, perform rigorous data cross-validation, and generate detailed reports on potential issues and actionable items.

Multiple strategies will be explored:

  • Comparison of the functional profiles of bio.tools entries with the corresponding semantic constraints defined in EDAM. Develop software to identify and report on inconsistencies between resources.
  • Comparison of the metadata defining a software tool with the knowledge extracted from publications that cite it, as well as the workflows that use it.
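As a rough illustration of the first strategy, the sketch below checks the inputs and outputs annotated on a tool function against the data concepts an operation is expected to consume and produce. It is a minimal sketch under stated assumptions, not the project's implementation: the constraint table, the parent map, the simplified entry layout (short concept IDs instead of full URIs) and the function names `check_function` and `check_entry` are all hypothetical and written by hand for this example rather than loaded from EDAM or bio.tools.

```python
# Toy EDAM-style constraints (hand-written for illustration, NOT loaded from EDAM):
# each operation lists the data concepts expected as input and output.
CONSTRAINTS = {
    "operation_0292": {"input": {"data_2044"}, "output": {"data_0863"}},
}

# Toy is-a hierarchy, child -> parent (also an assumption for this example).
PARENTS = {"data_2977": "data_2044"}


def ancestors_or_self(concept):
    """Return the concept together with all of its ancestors in PARENTS."""
    found = set()
    while concept is not None:
        found.add(concept)
        concept = PARENTS.get(concept)
    return found


def check_function(function):
    """List constraint violations for one annotated function of a tool."""
    issues = []
    for operation in function.get("operation", []):
        rules = CONSTRAINTS.get(operation)
        if rules is None:
            continue  # no constraint known for this operation
        for side in ("input", "output"):
            annotated = function.get(side, [])
            for required in sorted(rules[side]):
                # An annotation is compatible if it is the required concept
                # or one of its descendants.
                compatible = any(
                    required in ancestors_or_self(concept) for concept in annotated
                )
                if not compatible:
                    issues.append(
                        f"{operation}: no {side} compatible with {required}"
                    )
    return issues


def check_entry(entry):
    """Collect the issues of every function annotated on a tool entry."""
    issues = []
    for function in entry.get("function", []):
        issues.extend(check_function(function))
    return issues


# Hypothetical entry: the input is a descendant of the expected concept,
# but no output is annotated at all.
EXAMPLE_ENTRY = {
    "biotoolsID": "example-aligner",
    "function": [
        {"operation": ["operation_0292"], "input": ["data_2977"], "output": []}
    ],
}
print(check_entry(EXAMPLE_ENTRY))
```

A real analysis would read the constraints and the concept hierarchy from the EDAM ontology itself and the annotations from the Tools Ecosystem content, instead of the hard-coded dictionaries used here.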

Beyond the immediate improvement of the metadata, we plan to use the results of these analyses in order to:

  • Automate relevant analyses using continuous integration mechanisms (extending previous and current work in EDAM and the Tools Ecosystem).
  • Improve curation user interfaces to reduce the risk of annotation errors.
  • Provide high-quality functional tool profiles for use in workflow annotation.
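One way such an analysis could be wired into continuous integration is to turn its findings into a plain-text report and a process exit status, so that a CI job fails whenever issues are found. The sketch below is only an assumption about how this might look: the names `build_report` and `main`, the shape of the `issues_by_tool` mapping, and the example tool identifiers are hypothetical.

```python
def build_report(issues_by_tool):
    """Flatten a {tool_id: [issue, ...]} mapping into sorted report lines."""
    lines = []
    for tool_id in sorted(issues_by_tool):
        for issue in issues_by_tool[tool_id]:
            lines.append(f"{tool_id}: {issue}")
    return lines


def main(issues_by_tool):
    """Print the report and return an exit status usable by a CI job.

    Returns 0 when no issue was found and 1 otherwise.
    """
    report = build_report(issues_by_tool)
    if report:
        print("\n".join(report))
        return 1
    print("No annotation issues found.")
    return 0


# Hypothetical findings, for illustration only.
EXAMPLE_ISSUES = {
    "tool-b": ["missing output annotation"],
    "tool-a": [],
}
status = main(EXAMPLE_ISSUES)
```

In a CI script, the returned value would typically be passed to `sys.exit` so that the job is marked as failed when the report is not empty.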

Another important goal is to onboard and support scientific communities joining the BioHackathon.

Given the nature of the data we use in this project, we will work in close collaboration with the project "Enhance RDM in Galaxy by utilizing RO-Crates", which will also leverage workflow and software metadata from the same resources.

Topics

  • Bioschemas
  • Federated Human Data
  • Galaxy
  • Interoperability Platform
  • Tools Platform

Project Number: 25

Lead(s)

  • Lucie Lamothe
  • Hervé Ménager
  • (Hans Ienasescu)

Expected outcomes

By the end of the BioHackathon week:

  • Results of the cross-analysis of bioinformatics tools, highlighting potential inconsistencies or annotation gaps between the different resources, and suggesting annotation improvements (missing or more specific terms) for registry curators.
  • Software code to run the analyses mentioned
  • Prototypes for CI tasks that automate the analyses
  • Initial contact with scientific communities, and actions to ensure their future onboarding and support (e.g. identifying gaps in EDAM, bio.tools, WorkflowHub)

Within 3 months of the end of the BioHackathon:

  • Production-ready code and CI tasks automating the analyses to improve the monitoring of the Tools Ecosystem
  • Improvements to the bio.tools curation UI, if the analysis results show that such modifications would improve annotation quality.
  • New concepts in EDAM, tools in bio.tools, and workflows in WorkflowHub created by the scientific communities

Expected audience

  • Ontology specialists
  • Workflow specialists
  • Python programmers
  • Data analysts
  • Bioinformatics Software providers/packagers
  • Scientific community domain experts

Number of expected hacking days: 4

Meeting information

Meeting 06/10/2022: link to minutes

biohackathon2022's People

Contributors

lucielamothe, hmenager, supernord, matuskalas, jrbjensen, krab1k, rosinec, matejantol
