ncarrut / proteomics-metadata-standard

This project forked from bigbio/proteomics-sample-metadata

The Proteomics Experimental Design file format: Standard for experimental design annotation

Proteomics Experiment Design Project

1. Improving metadata annotation of Proteomics datasets

The Proteomics Experimental Design project aims to define a set of guidelines and file formats to support the annotation of the experimental design of proteomics datasets in the public domain. The Proteomics Team advocates for open access and increased reuse of proteomics datasets, and works towards providing concrete solutions to achieve this. Our goal with the Experimental Design format is to ensure maximum reusability of the deposited data. To that end, we aim to define the minimum information required to report the experimental design of proteomics experiments, enabling the proteomics community to use and reuse the deposited data.

The following use cases should be considered when designing the Proteomics Experimental Design data format:

  • The experiment design file format will complement the proteomeXchange.xml file format implemented by ProteomeXchange to capture the minimum metadata about a proteomics dataset. The ProteomeXchange submission XML file format is detailed here.

  • The experimental design format SHOULD enable data submitters and curators to annotate a proteomics dataset at different levels, including the sample metadata (e.g. organism and tissues), technical metadata (e.g. instrument model) and the experimental design.

  • The Experimental design format SHOULD facilitate the automatic reanalysis of public proteomics datasets, by providing a better representation of quantitative data in public repositories.

2. Notational conventions

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” are to be interpreted as described in RFC-2119 (Bradner 1997).

3. Ontologies

The Experimental Design format should be based on ontology or controlled vocabulary (CV) terms (e.g. UNIMOD-35). An ontology is a formal representation of a domain: it names and defines the categories, properties and relationships between the concepts, data and entities of one, many or all domains of discourse. All ontologies used in the Proteomics Experimental Design format MUST be indexed in the Ontology Lookup Service. The current ontologies supported in the format are:

⚠️
If you are contributing to these guidelines and file format and would like to add another ontology, please modify the list with a pull request.
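
As an illustration of how a CV term reference such as UNIMOD-35 might be handled in code, the sketch below splits an accession into its ontology prefix and local identifier. The names `CvTerm` and `parse_cv_accession` are hypothetical and not part of this specification or of any existing tool. Accepting the colon form (UNIMOD:35) in addition to the hyphenated form used above is an assumption based on how OBO-style accessions are commonly written.

```python
import re
from typing import NamedTuple


class CvTerm(NamedTuple):
    """A controlled vocabulary term reference (hypothetical helper type)."""
    ontology: str  # ontology prefix, e.g. "UNIMOD"
    local_id: str  # numeric identifier within that ontology, e.g. "35"


# Assumption: an accession is a prefix, then "-" or ":", then digits.
_ACCESSION = re.compile(r"([A-Za-z][A-Za-z0-9_]*)[-:](\d+)")


def parse_cv_accession(text: str) -> CvTerm:
    """Split an accession such as 'UNIMOD-35' into prefix and identifier."""
    match = _ACCESSION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a recognisable CV accession: {text!r}")
    return CvTerm(ontology=match.group(1), local_id=match.group(2))


if __name__ == "__main__":
    print(parse_cv_accession("UNIMOD-35"))
```

Keeping the prefix separate from the identifier makes it straightforward to check that a term belongs to one of the supported ontologies before looking it up.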

4. Sample experiment design structure (SDRF)

5. How to contribute

External contributors, researchers and the proteomics community are more than welcome to contribute to this project.

Contribute to the specification: you can contribute ideas or refinements by opening an issue in the issue tracker or by submitting a pull request (PR).

5.1. Annotate a public dataset 🎆

The annotated projects folder contains the public datasets that contributors have annotated so far. If you would like to join these efforts, fork this repo and open a pull request (PR) with your annotated project. If you don’t have a project in mind, you can pick one from the issues and annotate it.

Annotate a dataset in 5 steps:

  1. Read the SDRF specification

  2. Depending on the type of dataset, choose the appropriate sample template

  3. Annotate the corresponding ProteomeXchange PXD dataset following the guidelines

  4. Validate your SDRF:

    To validate your SDRF, install the sdrf-pipelines Python package:

    pip install sdrf-pipelines

    Then validate the SDRF file:

    parse_sdrf validate-sdrf --sdrf_file sdrf.tsv

    You can read more about the validator here.

  5. Fork the current repository and add a folder, named with the ProteomeXchange accession, that contains the annotated sdrf.tsv
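
Steps 4 and 5 can be sketched in Python as follows. This is a minimal sketch, not part of sdrf-pipelines: the helper names `build_validate_command` and `target_path` are hypothetical, the root folder name `annotated-projects` is an assumption about the repository layout, and the accession check assumes the usual form of "PXD" followed by six digits. The validator command itself is the one given in step 4.

```python
import re
import shlex
from pathlib import Path

# Assumption: ProteomeXchange accessions look like "PXD" plus six digits.
PXD_PATTERN = re.compile(r"PXD\d{6}")


def build_validate_command(sdrf_file: str) -> list[str]:
    """Return the step 4 validation command as an argument list."""
    return ["parse_sdrf", "validate-sdrf", "--sdrf_file", sdrf_file]


def target_path(accession: str, root: str = "annotated-projects") -> Path:
    """Return where the annotated sdrf.tsv would go for an accession.

    The default root folder name is an assumption; check the repository.
    """
    if PXD_PATTERN.fullmatch(accession) is None:
        raise ValueError(f"not a ProteomeXchange PXD accession: {accession!r}")
    return Path(root) / accession / "sdrf.tsv"


if __name__ == "__main__":
    sdrf = target_path("PXD000001")
    # Print the command instead of running it, so the sketch works
    # even when sdrf-pipelines is not installed.
    print(shlex.join(build_validate_command(str(sdrf))))
    # To run the validator for real (after `pip install sdrf-pipelines`):
    # import subprocess
    # subprocess.run(build_validate_command(str(sdrf)), check=True)
```

Passing the command as an argument list rather than a single shell string avoids quoting problems when the file path contains spaces.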

6. Core contributors and collaborators

The project is run by different groups:

  • Yasset Perez-Riverol (PRIDE Team, European Bioinformatics Institute - EMBL-EBI, U.K.)

  • Timo Sachsenberg (OpenMS Team, Tübingen University, Germany)

  • Anja Fullgrabe (Expression Atlas Team, European Bioinformatics Institute - EMBL-EBI, U.K.)

  • Nancy George (Expression Atlas Team, European Bioinformatics Institute - EMBL-EBI, U.K.)

  • Mathias Walzer (PRIDE Team, European Bioinformatics Institute - EMBL-EBI, U.K.)

  • Pablo Moreno (Expression Atlas Team, European Bioinformatics Institute - EMBL-EBI, U.K.)

  • Juan Antonio Vizcaíno (PRIDE Team, European Bioinformatics Institute - EMBL-EBI, U.K.)

  • Oliver Alka (OpenMS Team, Tübingen University, Germany)

  • Julianus Pfeuffer (OpenMS Team, Tübingen University, Germany)

  • Marc Vaudel (University of Bergen, Norway)

  • Harald Barsnes (University of Bergen, Norway)

  • Niels Hulstaert (Compomics, University of Gent, Belgium)

  • Lennart Martens (Compomics, University of Gent, Belgium)

  • Expression Atlas Team (European Bioinformatics Institute - EMBL-EBI, U.K.)

  • Lev Levitsky (INEP team, INEPCP RAS, Moscow, Russia)

  • Elizaveta Solovyeva (INEP team, INEPCP RAS, Moscow, Russia)

  • ProteomicsDB Team (Technical University of Munich, Germany)

  • Nicholas Carruthers (Wayne State University, USA)

If you contribute to this specification, please make sure to add your name to the list of contributors.

7. Code of Conduct

As part of our efforts toward delivering open and inclusive science, we follow the Contributor Covenant Code of Conduct for Open Source Projects.
