
EuBIC 2018 developer's meeting

9–12 January 2018, Ghent, Belgium


The EuBIC 2018 developer's meeting will bring together scientists active in the field of bioinformatics and computational proteomics.

This repository will contain some information related to the EuBIC 2018 developer's meeting.

For more information, please check the official website.

Project proposal submission

Topics for the hackathon sessions during the EuBIC 2018 developer's meeting can now be proposed!

Please carefully read the full guidelines before submitting a project proposal and make sure to add all relevant information to your proposal.

How to submit a project proposal?

Create an issue in this repository describing your project. When you create an issue, a template is provided listing the relevant information that should be included:

Project description:

  • A general abstract of up to 200 words describing the goal of the project and why it is well suited as a community project.
  • A (high-level) project plan detailing the work to be conducted. This primarily includes tasks that will be tackled during the developer's meeting, but we encourage you to also think about a follow-up strategy.

Technical details:

  • The programming language(s) that will be used.
  • (If applicable) any existing software that will be featured.
  • (If applicable) any datasets that will be used and their availability.

Contact information:

  • Your name, affiliation, and contact information.

How to contribute?

  • Vote for your favorite projects in the issue tracker to show your support.
  • Leave comments on the most interesting proposals. Engage in a discussion to fine-tune the project proposals!


Submitted project proposals

QC dashboard

Abstract

A systematic approach to quality control (QC) is of crucial importance to endorse the results of a mass spectrometry experiment. Although several tools that compute QC information exist, making use of the QC metrics they generate to determine the quality of an experiment remains a hard task. The HUPO-PSI QC-dev working group is developing a QC file format to facilitate the collection and sharing of various computed quality metrics, from single run analytics to longitudinal control capture and complete experiment quality reports.
In this project, we will develop a tool for the visualization and analysis of different QC metrics. For example, a Hotelling T2 control chart can be used to highlight experiments for which a metric exhibits extreme values, while robust PCA or other outlier detection techniques can be used to identify low-performing experiments.
The aim of this project is to develop a dashboard to load QC data, analyze it to detect experiments with diminished performance, and visualize this information in a user-friendly and easily accessible fashion.
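As a minimal sketch of the control-chart idea (not the dashboard itself), the Hotelling T² statistic can flag runs whose QC metric vector deviates from the bulk of the runs. The metric values, thresholds, and the χ²-based control limit below are illustrative assumptions, not HUPO-PSI-defined quantities:

```python
import numpy as np

def hotelling_t2(metrics):
    """Hotelling T^2 statistic for each run's vector of QC metrics.

    metrics: (n_runs, n_metrics) array; one row per run.
    """
    metrics = np.asarray(metrics, dtype=float)
    center = metrics.mean(axis=0)
    cov = np.cov(metrics, rowvar=False)
    cov_inv = np.linalg.pinv(cov)          # pseudo-inverse for numerical robustness
    diff = metrics - center
    return np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

# Simulated QC metrics for 20 runs (3 metrics each); run 19 degrades badly.
rng = np.random.default_rng(42)
qc = rng.normal(loc=[25.0, 1500.0, 0.8], scale=[1.0, 50.0, 0.02], size=(20, 3))
qc[19] = [35.0, 1100.0, 0.5]               # a low-performing run

t2 = hotelling_t2(qc)
ucl = 11.345    # assumed upper control limit, ~ chi^2(0.99) quantile for 3 metrics
flagged = np.where(t2 > ucl)[0]
print(flagged)  # indices of runs whose T^2 exceeds the control limit
```

A dashboard would plot `t2` per run over time with `ucl` as a horizontal control line, which is exactly the kind of visual listed in the work plan below.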

Work plan

We will introduce qcML and explore platforms that create QC metrics for mass spectrometry. Following that, we will:

  • Collect data from target platforms (analysis software, qcML)
  • Develop dashboard visualizations for:
    1. Hotelling T2 control charts
    2. Outlier interpretation aids
    3. Various metrics of the participants' choice / from the HUPO-PSI QC metric controlled vocabulary
  • Deposit the calculated/adopted metrics in qcML document(s)

Technical details

There is a plethora of ways, methods, and tools to achieve the described goals, including JavaScript, R, Python, and C++.

We will read (and write) qcML data (and, as needed, mzID and mzML, which are HUPO-PSI XML formats) and use existing software that calculates QC metrics.

Data:

Follow up:

  • Create a web service for retrieving the created visualisations and for embedded use of our dashboard
  • Adapt various open-source tools that report on quality metrics to write qcML, so that they gain added value from our QC dashboard visualisations

Links:
https://rstudio.github.io/shinydashboard/
https://github.com/HUPO-PSI/qcML-development

Contact information

Mathias Walzer, EMBL-EBI
[email protected]

Novel algorithms for DIA-based label-free quantification

Abstract

As outlined by Sander Willems in issue 1, in the ion mobility-based data-independent acquisition techniques HDMSE and UDMSE, ions are separated in seconds by LC, in milliseconds by Ion Mobility Separation (IMS), and in nanoseconds by MS, so the acquired data has an additional drift-time dimension. So far, IMS-MSE data can only be processed by Progenesis QIP and PLGS (commercial tools by Waters), and there are very few solutions for post-processing this kind of data in label-free quantification experiments. The first one available was ISOQuant, which is written in Java and uses a MySQL database for storing and quickly accessing data imported from PLGS projects. We previously developed and evaluated algorithms for retention time alignment, feature clustering, and feature intensity normalization, which have been implemented in ISOQuant. However, we feel that these algorithms could be improved, especially in the context of ion-mobility data.

Work plan

Basics

  • Introduction to IMS-MS workflows for label-free quantification
  • Introduction to mixed proteome standards and LFQBench
  • Participants will get acquainted with the modular structure of ISOQuant and the underlying MySQL database
  • Concepts for retention time alignment, clustering, and normalization

Aims

Development and implementation of novel ISOQuant modules for

  • retention time alignment using MS, IMS and peptide identification information,
  • multidimensional feature clustering using m/z, RT and drift time information,
  • intensity normalization.

The implemented modules and their effects on LFQ precision and accuracy will be tested using LFQBench.
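To make the multidimensional clustering aim concrete, here is a toy greedy clustering of features in (m/z, RT, drift time) space. It is a stand-in sketch, not ISOQuant's actual algorithm, and the tolerance values are illustrative assumptions:

```python
import numpy as np

def cluster_features(features, mz_ppm=10.0, rt_tol=0.5, dt_tol=2.0):
    """Greedy single-pass clustering of features across runs.

    features: (n, 3) array with columns m/z, retention time (min), and
    drift time (ms). Returns an integer cluster label per feature.
    """
    features = np.asarray(features, dtype=float)
    order = np.argsort(features[:, 0])        # process by ascending m/z
    labels = -np.ones(len(features), dtype=int)
    seeds = []                                # first member of each cluster
    for i in order:
        mz, rt, dt = features[i]
        for c, (smz, srt, sdt) in enumerate(seeds):
            if (abs(mz - smz) <= smz * mz_ppm * 1e-6
                    and abs(rt - srt) <= rt_tol
                    and abs(dt - sdt) <= dt_tol):
                labels[i] = c
                break
        else:                                 # no seed matched: new cluster
            labels[i] = len(seeds)
            seeds.append((mz, rt, dt))
    return labels

# Two runs observing the same two peptides with small measurement error.
feats = np.array([
    [500.2501, 20.10, 25.1],   # peptide A, run 1
    [500.2503, 20.30, 25.4],   # peptide A, run 2
    [623.8107, 35.00, 31.0],   # peptide B, run 1
    [623.8110, 34.80, 30.7],   # peptide B, run 2
])
print(cluster_features(feats))  # the two observations of each peptide cluster together
```

A production implementation would need drift-time-aware tolerances and proper centroid updates; the point here is only how the extra IMS dimension enters the matching criteria.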

Technical details

  • Programming language
    • Java
    • SQL
    • R, Python or any scripting language
  • Useful familiarity
    • PLGS
    • ISOQuant
    • LFQbench & hybrid proteome concept
    • Waters data structures
      • raw
      • mass spectrum xml
      • workflow results xml
  • Dataset
    • PXD001240 for testing
    • We will provide some smaller datasets for developmental purposes

Contact information

Jörg Kuharev
UNIVERSITÄTSMEDIZIN der Johannes Gutenberg-Universität Mainz
[email protected]

or

Stefan Tenzer
UNIVERSITÄTSMEDIZIN der Johannes Gutenberg-Universität Mainz
[email protected]

Data Management and Instrument Performance Monitoring for Proteomics Platforms

Abstract

Proteomics core facilities deal with the simultaneous analysis of multiple datasets on several instruments in parallel. Due to high downtime costs, the instruments should work constantly and efficiently.
Typically, an LC-MS run delivers more information than is used for the particular project, so it may be necessary to reanalyze existing data to test new scientific hypotheses or to refine the data processing. Moreover, some clinical and/or forensic laboratories are legally required to archive the raw data.
The unpredictable reliability of LC-MS acquisitions (due to LC interruptions or decreased instrument sensitivity/accuracy) can lead to corrupted or low-quality raw files that require the LC-MS run to be repeated. This increases the cost of MS experiments by consuming human and instrument resources and sometimes leads to the loss of precious samples. The availability of real-time QC monitoring would thus be extremely useful to mitigate these problems.
Finally, it is necessary to provide internal and external collaborators with access to the data and the results.
Hence, two primary aims become important: 1) efficient and robust data handling, allowing for data sustainability and traceability, and 2) continuous and reproducible performance monitoring.

Work plan

The solution will be built as an improvement to the existing software Proline Server and MS-Angel (http://proline.profiproteomics.fr/).
Improvements for MS-Angel:

  1. Adding a raw file registration task, supporting both single-file formats (such as Thermo RAW and Sciex WIFF) and folder-based formats (such as Bruker .d).
  2. Adding support for remote copying, with authorization, to a dedicated file server using common protocols (FTP, Samba, SCP, etc.) and a defined set of rules.
  3. Allowing basic metadata to be extracted directly from the raw file. The primary aim is to support the Thermo RAW file format; the secondary aim is to support other formats (Bruker, Sciex).

Improvements for Proline Server:

  1. Extending the Proline Server schema to include additional metadata for raw files.
  2. Making the complete metadata searchable, and updating the raw file selection module and the batch metadata manipulation module to include more fields.
  3. Developing a file export API.
  4. Performing automated analysis on files with specific characteristics (QC runs). The QC includes a database search, peak shape analysis, and retention time stability for multiple peptides.
  5. Providing an easy-to-use interface to monitor QC results over time and per instrument, using time series in Proline Server. This includes graphs, plots, and data export in common formats (to be analyzed by external tools).
  6. Providing an allowed range (indicating optimal performance) for each QC metric and checking whether the performance is optimal for each QC run. The result should be easily interpretable by the user, for example in the form “M out of N tests passed”. The optimal ranges can be either defined by the operator or learned from the data.
  7. Adding the possibility to abort the run or take other actions if a QC test fails, implemented through the Instrument API on the latest generation of Thermo instruments.
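The "M out of N tests passed" report in item 6 can be sketched as a range check per metric. The metric names and ranges below are hypothetical placeholders; Proline's actual schema and QC vocabulary will differ:

```python
# Hypothetical metric names and allowed ranges, for illustration only.
QC_RANGES = {
    "ms1_tic_median": (1e8, 1e10),        # acceptable total ion current
    "peptide_rt_shift_min": (-0.5, 0.5),  # RT stability of reference peptides
    "id_rate_percent": (25.0, 100.0),     # database search identification rate
}

def evaluate_qc_run(metrics, ranges=QC_RANGES):
    """Check each metric against its allowed range; report 'M out of N'.

    Metrics missing from the run are treated as failed tests.
    """
    results = {name: lo <= metrics.get(name, float("nan")) <= hi
               for name, (lo, hi) in ranges.items()}
    passed = sum(results.values())
    return results, f"{passed} out of {len(results)} tests passed"

metrics = {"ms1_tic_median": 5e9, "peptide_rt_shift_min": 0.8,
           "id_rate_percent": 31.2}
results, summary = evaluate_qc_run(metrics)
print(summary)   # 2 out of 3 tests passed (RT shift is out of range)
```

Learning the ranges from data, as item 6 suggests, would replace the hard-coded `QC_RANGES` with per-instrument estimates from historical QC runs.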

Technical details

Main programming language

  • Scala

Supporting languages

  • Python
  • Perl

Existing software:

  • Proline
  • MS-Angel

Contact information

Vladimir Gorshkov [email protected]
David Bouyssié [email protected]

Implementation of software protocols in computational proteomics

Abstract

Proteomics is a broad research field embracing a wide range of alternative experimental and computational approaches. This has led to the development of a zoo of single- and multi-purpose software tools whose usage documentation and applicable pipelines are either scattered over the respective websites or not available at all.
This project aims at creating a set of protocols that describe how well-defined problems can be solved using a given software. More specifically, we aim to provide user-friendly and immediately applicable protocols for the principal computational tasks in proteomics research. Additionally, we will work on creating software containers that allow the tasks to be performed in a controlled environment.
The resulting documented workflow containers will be made available in a central repository and published in a well-defined format. The format will establish the layout for future protocol submissions. Herein, we want to promote interactive solutions, interlinking with user support platforms and common training platforms such as proteomics-academy.org and ELIXIR TeSS.

Ultimately, we aim to collect material for publication in a dedicated issue, e.g. in Journal of Proteomics.

Work plan

  • Discussion about publication format and areas of interest
  • Outline of protocol layout:
    • Specific Docker or bioconda image that holds the full working environment
    • Container available from repository (Biocontainer or alike)
    • Forum for user requests and bug reports
  • Implementation of workflow containers and individual protocols
  • Testing of reproducibility of the described tasks
  • Define final format and discuss way of evaluation for future protocol submissions
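A protocol container along the lines of the work plan might look like the following Dockerfile sketch. The base image is the Biocontainers base; the tool name, version, and file paths are placeholders, not an actual protocol:

```dockerfile
# Hypothetical protocol container; "sometool" is a placeholder, not a real package.
FROM biocontainers/biocontainers:latest
LABEL protocol="example-protocol" maintainer="[email protected]"

# Pin the tool the protocol describes to a fixed version so that the
# documented steps stay reproducible.
RUN conda install -y sometool=1.2.3

# Ship the protocol document and example data alongside the software.
COPY protocol.md /opt/protocol/protocol.md
COPY example_data/ /opt/protocol/example_data/

WORKDIR /opt/protocol
CMD ["cat", "protocol.md"]
```

Publishing such images through BioContainers (or a similar repository) would give each protocol a versioned, citable working environment.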

Benefits

  • Individual: dissemination of your work or your favorite workflows
  • Community: lowering the bar for using proteomics software

Technical details

Software developers and data analysts with experience in setting up workflows are very welcome.

Experience in HTML, JavaScript/PHP, and the setup of software containers is an advantage.

Contact information

Johannes Griss
Department of Medicine I,
Medical University of Vienna,
Vienna, Austria
[email protected]

Veit Schwämmle
Department of Biochemistry and Molecular Biology
University of Southern Denmark
Denmark
[email protected]

Third-party tool integration and method development in OpenMS

Abstract

OpenMS is a C++ library and framework for computational mass spectrometry. At times, the large code base (and C++) might overwhelm new developers. This project is intended as a gentle introduction to finding your way through the OpenMS code base, implementing your own tools and algorithms, and integrating novel third-party tools into complex workflows. We will also show how Python bindings (pyOpenMS) are provided for novel algorithms.

Work plan

Basics:

  • We will first get acquainted with the structure of OpenMS, the graphical viewers, and the OpenMS command line tools.
  • How OpenMS is far more than a collection of algorithms and tools: it is also a service that provides a development infrastructure for the computational MS community.
  • Then we will give a brief overview of the development process in OpenMS, some background information on the build and test infrastructure and good practices.
  • We then briefly show how command line tools are automatically integrated into workflow systems.
  • Break for questions and answers (licenses, authorship, etc.).

Hands-on:

The rest of the workshop will be hands-on, where we:

  • Show how to write your own tool
  • Integrate third-party tools (potential candidates: PeptideShaker, Novor, DirecTag, MSFragger, or your favorite tool)
  • Show how to build a small benchmarking workflow

Technical details

We encourage participants to build OpenMS from GitHub prior to the workshop.
GitHub: https://github.com/OpenMS/OpenMS
Gitter chat: https://gitter.im/OpenMS/OpenMS
Homepage: www.OpenMS.de

Used programming languages:

  • C++ (primarily)
  • Python

Contact information

Timo Sachsenberg
University of Tübingen
[email protected]

The OpenMS hackathon is partly supported by the German Network for Bioinformatics Infrastructure (de.NBI).

Expansion of open-source tools handling IMS data

Abstract

Data Independent Acquisition (DIA) creates a “permanent record of everything” that is ideal for reanalysis, arguably the primary goal of public data repositories.

Within ProteomeXchange, Waters' Synapt is among the top 10 most used instruments. This instrument allows for a DIA technique called HDMSE. Herein, ions are separated in seconds by LC, in milliseconds by Ion Mobility Separation (IMS) and in nanoseconds by MS. Thus, detected ions have four dimensions: retention time, drift time, m/z and intensity. Each MS scan is followed by a single MSMS scan without any precursor selection and fragments can be assigned to precursors based on retention and drift time profiles.

Unfortunately, few software packages exist that can handle this data format. Examples are the open-source packages ISOQuant and SynapteR and the commercial Progenesis QIP software, but all three rely on data preprocessing with Waters' commercial PLGS software. While ProteoWizard can convert raw Waters data to mzML, most (open-source) tools such as OpenMS and TPP are currently unable to fully handle the IMS dimension in mzML, let alone align fragments to precursors. In conclusion, even the basic tools needed to start building an open-source workflow for (re)analysis of Waters' HDMSE data are lacking.
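The profile-to-centroid step targeted by the Peak Picker outcome below can be illustrated with a naive local-maximum picker on a single m/z trace. A real peak picker needs a noise model, peak-width estimation, and the IMS dimension; this sketch (assuming only numpy) only shows the basic operation:

```python
import numpy as np

def centroid_spectrum(mz, intensity, min_intensity=0.0):
    """Naive centroiding: report each local intensity maximum as a peak,
    with an intensity-weighted m/z over the point and its two neighbours.
    """
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    peaks = []
    for i in range(1, len(mz) - 1):
        if (intensity[i] >= min_intensity
                and intensity[i] > intensity[i - 1]
                and intensity[i] >= intensity[i + 1]):
            sl = slice(i - 1, i + 2)
            w = intensity[sl]
            peaks.append((np.average(mz[sl], weights=w), w.sum()))
    return peaks

# A synthetic profile trace with two Gaussian-shaped peaks.
mz = np.linspace(500.0, 500.1, 101)
profile = (1000 * np.exp(-0.5 * ((mz - 500.03) / 0.004) ** 2)
           + 400 * np.exp(-0.5 * ((mz - 500.07) / 0.004) ** 2))
peaks = centroid_spectrum(mz, profile, min_intensity=50.0)
print(peaks)  # two centroided peaks, near m/z 500.03 and 500.07
```

Extending this idea to IMS-enhanced mzML means running such a picker per drift-time bin (or in the full m/z × drift-time plane) while preserving the drift-time annotation of each centroid.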

Technical details

  • Project outcome
    • Create a Peak Picker that converts an IMS-enhanced profile mzML to an IMS-enhanced centroided mzML
    • Create a Feature Finder that deisotopes an IMS-enhanced centroided mzML and finds features.
  • Required familiarity
    • Python
    • mzML format
  • Useful familiarity
  • Dataset
    • PXD001240 for final testing
    • A smaller technical dataset for developmental purposes can be provided by the Laboratory of Pharmaceutical Biotechnology

Contact information

Sander Willems
Laboratory of Pharmaceutical Biotechnology, Ghent University, Ghent, 9000, Belgium
[email protected]

Improve the quantitative analysis of post-translationally modified peptides by dedicated statistical modeling

Abstract

The characterization of post-translational modifications (PTMs) by bottom-up mass spectrometry is crucial to describe cellular processes. However, current approaches are hampered by the lack of suitable statistical workflows to precisely determine relative PTM changes. Compared to protein quantification, where multiple peptide measurements are available per protein and erroneous measurements can be discarded or down-weighted, PTM quantification relies on only one or a few peptide-spectrum matches. Additionally, changes in protein quantity need to be distinguished from PTM modulation by correcting for differential protein abundance. This correction does not account for differences in the variability of modified peptides and proteins, nor for the uncertainty of the relative protein abundance estimate. Finally, although peptide-spectrum matches corresponding to the same PTM often present distinct charge states, miscleavages, and combinatorial modifications, there is no agreement in the field on how to group them before statistical analysis.
We aim to develop and implement an ion-centric framework to quantify protein modifications while correctly accounting for the different levels of variability. Simultaneously modelling all peptides of a protein will allow a direct estimate of the PTM regulation that is adjusted for differential protein abundance and accounts for the estimated uncertainty.
The results will be presented as guidelines on the proteomics-academy.org website and through publication.
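The uncertainty problem described above can be made concrete with the naive two-step correction that the project aims to improve upon: subtracting the protein's log fold change from the modified peptide's, while propagating both standard errors. The numbers are invented, and the sketch is in Python for brevity even though the project itself prefers R:

```python
import numpy as np

def adjusted_ptm_change(ptm_log2fc, ptm_se, prot_log2fc, prot_se):
    """Correct a modified peptide's log2 fold change for the change of its
    parent protein, propagating both uncertainties (assuming independence).
    This is the simple two-step correction; an ion-centric joint model
    would estimate the adjustment and its uncertainty in one step.
    """
    adj = ptm_log2fc - prot_log2fc
    se = np.sqrt(ptm_se ** 2 + prot_se ** 2)
    return adj, se

# A phosphopeptide appears up 2.0 log2 units, but its protein is up 1.5:
adj, se = adjusted_ptm_change(2.0, 0.4, 1.5, 0.3)
print(adj, se)   # 0.5 0.5
```

Note how the corrected effect (0.5) carries a standard error (0.5) as large as the effect itself, which is exactly the variability that a joint ion-centric model is meant to handle properly.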

Work plan

  • Discuss featured data types and select a set of datasets suitable for testing and benchmarking
  • Define general guidelines for the analysis of PTMomics data
  • Outline statistical frameworks for ion-centric inference
  • Implement ion-centric models
  • Evaluate the approaches on a quantitative benchmark set
  • Summarize guidelines and results

Benefits

  • Participants will gain expertise in the state-of-the-art models for protein quantification and have influence on future data analysis standards
  • The project will provide guidelines for the analysis of modified peptides, an essential step for making PTMomics data more reproducible and comparable.

Technical details

The programming language is preferably R.
Existing workflows for protein-level inference will be adapted towards ion-centric inference.
A suitable example dataset with ground truth (http://www.nature.com/nbt/journal/v35/n8/full/nbt.3908.html) and selected public and in-house datasets will be available.

Contact information

Veit Schwämmle
Department of Biochemistry and Molecular Biology
University of Southern Denmark
Denmark
[email protected]

Lieven Clement
Department of Applied Mathematics, Computer Science and Statistics
Ghent University
Belgium
[email protected]

Marie Locard-Paulet
National Center for Scientific Research
Institute of Pharmacology and Structural Biology
France
[email protected]
