irisrizos / unassigned_protists_ssn

License: MIT License

R 42.86% Shell 14.15% Python 42.99%
biogeography clustering igraph-networks metabarcoding parasites protists ssn taxonomy temporal-data unassigned

unassigned_protists_ssn's Introduction

Beyond the limits of the unassigned protist microbiome: inferring large-scale spatio-temporal patterns of Syndiniales marine parasites

Exploration of taxonomically unassigned protist genetic barcode sequences across different V4-18S datasets.

The goal of this study is to:

  • Describe the overall proportion of protist sequences that lack taxonomic assignment in the PR2 reference database
  • Identify the protist lineages that are the least described across 6 metabarcoding datasets (including open-sea campaigns and coastal time-series)
  • Reveal wide-scale geographic distribution patterns of parasitic dinoflagellates (i.e. Syndiniales) that are unassigned at the genus level (i.e. 98% of Syndiniales sequences!)
  • Highlight a list of recurrent (through at least 7 years of data) community-indicator parasite taxa to be prioritized for identification

By:

  • Integrating diverse metabarcoding datasets into a global dataset with a homogenised taxonomy
  • Clustering the gathered metabarcode sequences by sequence similarity with a Sequence Similarity Network (SSN), which makes it possible to address integrated ecological aspects of clusters at low taxonomic resolution, independently of their lack of taxonomic assignment
  • Exploring spatial and temporal patterns of unassigned sequence clusters at a selected taxonomic level and for a selected protist group
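The clustering step above can be illustrated with a minimal sketch. It is not the repository's implementation (which builds the network with igraph in R); it uses a plain union-find in Python instead. The input tuple layout, the function name `connected_components` and the thresholds (100% identity, 80% coverage, inferred from the file name `Split_Igraph_Synd_id100_cov80`) are assumptions made for illustration.

```python
from collections import defaultdict


def connected_components(hits, min_identity=100.0, min_coverage=80.0):
    """Group sequence ids into the connected components of a similarity network.

    hits: iterable of (query_id, subject_id, percent_identity, percent_coverage).
    An edge is kept only when both thresholds are met (assumed values).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for query, subject, identity, coverage in hits:
        # Register both sequences first so that singletons are retained.
        root_q, root_s = find(query), find(subject)
        if query != subject and identity >= min_identity and coverage >= min_coverage:
            parent[root_q] = root_s  # merge the two components

    clusters = defaultdict(set)
    for seq in list(parent):
        clusters[find(seq)].add(seq)
    # Largest components first, ties broken alphabetically.
    return sorted((sorted(m) for m in clusters.values()), key=lambda m: (-len(m), m))


# Hypothetical pairwise alignment results.
example_hits = [
    ("a", "b", 100.0, 95.0),
    ("b", "c", 100.0, 85.0),
    ("c", "d", 99.5, 100.0),  # below the identity threshold: no edge
    ("e", "e", 100.0, 100.0),  # self-hit: keeps "e" as a singleton
]
example_components = connected_components(example_hits)
```

Each returned list is one Connected Component (CC); sequences with no retained edge form singleton components.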

Graphical

Abstract

Marine protists are major components of the oceanic microbiome that remain largely unrepresented in culture collections and genomic reference databases. The exploration of this uncharted protist diversity in oceanic communities relies essentially on the study of genetic markers as taxonomic barcodes. Nevertheless, we report that across 6 environmental planktonic surveys, half of the genetic barcodes remain taxonomically unassigned at the genus level, limiting the understanding of the ecological implications of many protist lineages. Among them, parasitic Dinoflagellata (i.e. Syndiniales) appear as the least described protist group while being key actors in marine food webs at a global scale. We have developed a FAIR computational workflow integrating diverse metabarcoding datasets, in order to infer large-scale ecological patterns at a fine-grained taxonomic resolution, bypassing the limitation of taxonomic assignment. We reveal novel geographic distribution patterns for unassigned Syndiniales genera, including sequences shared between disconnected marine photic zones and ubiquitous Syndiniales sequences. From a temporal perspective, we have pinpointed recurrent and seasonally persistent parasite taxa that are also indicative of community dynamics, holding potential for ecosystem monitoring. Our results underline the importance of Syndiniales in structuring planktonic communities through space and time, raising questions regarding host-parasite association specificity and the trophic mode of persistent Syndiniales, while providing an innovative framework for prioritizing unassigned protist taxa for further description.

Datasets

Open Sea Campaigns:

Coastal European Sampling Project:

Time-series:

Prerequisites

Data

The data needed to re-run the analyses of this study are listed below. The README of each folder points to the relevant scripts.

  • Global homogenised V4-18S metabarcode dataset (343,165 OTUs): /Homogenisation workflow/ASV_all_18SV4_6MetaB.zip

  • Syndiniales network (4,317 CCs): /SSN/Split_Igraph_Synd_id100_cov80.z01-3

Computational resources

Demanding computations:

  • All-against-all BLAST: parallelised over 100 jobs, computation time of 1 week

  • ANOVA on each axis of RDA: 4 CPUs, mem 50GB

  • Escoufier's equivalent vectors: 8-16 CPUs, mem 40-100GB

  • Lomb-Scargle Periodogram algorithm: 8 CPUs, mem 70GB
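For readers unfamiliar with the last item, the Lomb-Scargle periodogram estimates the strength of periodic signals in unevenly sampled time series. The sketch below is a plain-Python version of the classical formula, given only as an illustration. It is not the repository's code, and the function name `lomb_scargle`, the toy sampling scheme and the period grid are assumptions.

```python
import math


def lomb_scargle(times, values, periods):
    """Classical Lomb-Scargle power for each candidate period."""
    mean = sum(values) / len(values)
    centred = [v - mean for v in values]
    powers = []
    for period in periods:
        omega = 2.0 * math.pi / period
        # The time offset tau makes the sine and cosine terms orthogonal.
        s2 = sum(math.sin(2.0 * omega * t) for t in times)
        c2 = sum(math.cos(2.0 * omega * t) for t in times)
        tau = math.atan2(s2, c2) / (2.0 * omega)
        cos_terms = [math.cos(omega * (t - tau)) for t in times]
        sin_terms = [math.sin(omega * (t - tau)) for t in times]
        yc = sum(y * c for y, c in zip(centred, cos_terms))
        ys = sum(y * s for y, s in zip(centred, sin_terms))
        cc = sum(c * c for c in cos_terms)
        ss = sum(s * s for s in sin_terms)
        power = 0.0
        if cc > 0:
            power += yc * yc / cc
        if ss > 0:
            power += ys * ys / ss
        powers.append(0.5 * power)
    return powers


# Hypothetical series: roughly fortnightly, irregular sampling over about 7 years,
# carrying a purely annual signal (time unit: days).
sample_times = [14 * i + (i * 7) % 5 for i in range(180)]
sample_values = [math.sin(2.0 * math.pi * t / 365.0) for t in sample_times]
candidate_periods = list(range(30, 731, 5))

sample_powers = lomb_scargle(sample_times, sample_values, candidate_periods)
best_period = candidate_periods[sample_powers.index(max(sample_powers))]
```

On this toy series the strongest peak should fall close to the 365-day period that was built into the signal. The cost grows with the number of samples times the number of candidate periods, repeated for every cluster tested, which is consistent with this step being listed among the demanding computations.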

What about your protist group of interest?

This protocol can be freely re-used to explore the spatiotemporal patterns of any protist group! The steps are the following:

  • Download the network including all protist groups:

Folder SSN, files Split_Igraph_all_.z0 (see README of SSN folder)

  • Select your target protist group:

Indicate the taxonomic level and the name of the group (e.g. Class==Syndiniales) in script SSN_Synd.Rmd (lines 101-109): this extracts from the network only the clusters (Connected Components) composed of sequences of your chosen group, along with their metadata. N.B. if you are also curious about assigned / unassigned sequences at low taxonomic levels (e.g. genus) within your target group, you can further refine your cluster selection at lines 202-211.

  • Spatiotemporal exploration:

After selecting the clusters of your target protist group, run the script /Spatiotemporal Analysis/Spatial_expl.Rmd for spatial analysis.

For temporal analysis, follow the guidelines of the /Spatiotemporal Analysis/ folder.

  • V4-18S sequence implementation:

It is also possible to add your own sequences and see how they cluster within the SSN. Note that a computation time of 1 week is required to run the all-against-all alignment on the updated sequence dataset, plus 1-2 h for network creation with igraph in R. After that, you can resume the protocol from the network clustering step (script SSN_Synd.Rmd, line 68).
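The "Select your target protist group" step can be pictured with the following sketch. It is written in Python rather than in the R Markdown of SSN_Synd.Rmd and does not reproduce that script. The data layout, the function names `select_clusters` and `unassigned_fraction`, and the reading that a cluster is kept only when all of its sequences belong to the target group are assumptions.

```python
def select_clusters(clusters, taxonomy, level, name):
    """Keep the clusters whose members are all assigned to `name` at `level`."""
    selected = {}
    for cluster_id, members in clusters.items():
        if members and all(taxonomy.get(seq, {}).get(level) == name for seq in members):
            selected[cluster_id] = members
    return selected


def unassigned_fraction(members, taxonomy, level="Genus"):
    """Fraction of a cluster's sequences with no assignment at `level`."""
    missing = sum(1 for seq in members if not taxonomy.get(seq, {}).get(level))
    return missing / len(members)


# Hypothetical taxonomy table and cluster membership.
toy_taxonomy = {
    "s1": {"Class": "Syndiniales", "Genus": None},
    "s2": {"Class": "Syndiniales", "Genus": "Amoebophrya"},
    "s3": {"Class": "Dinophyceae", "Genus": "Gymnodinium"},
    "s4": {"Class": "Syndiniales", "Genus": None},
}
toy_clusters = {"CC1": ["s1", "s2"], "CC2": ["s3"], "CC3": ["s2", "s3"], "CC4": ["s4"]}

syndiniales = select_clusters(toy_clusters, toy_taxonomy, "Class", "Syndiniales")
fractions = {cc: unassigned_fraction(m, toy_taxonomy) for cc, m in syndiniales.items()}
```

Here `syndiniales` retains CC1 and CC4, and `fractions` gives the share of genus-level unassigned sequences per retained cluster, which is the kind of refinement the N.B. above refers to.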

unassigned_protists_ssn's People

Contributors

irisrizos, thomasfinet
