long-chimeric-reads-project's Introduction

Data and scripts for "Enhancing long-read amplicon sequencing: Overcoming chimeric sequence challenges in biodiversity studies with VSEARCH and DADA2" (Hakimzadeh et al. 2024)

Structure

This repository contains the data and part of the analysis stack for the paper cited above. It is structured as follows:

Chimeras_denovo holds the scripts that apply the VSEARCH long-read module to the simulated dataset, together with the statistical scripts for calculating the F1 score.

DADA2 holds the scripts for DADA2 quality filtering and chimera filtering of the real dataset. It also holds the scripts for the simulated dataset and the statistical analysis for calculating the F1 score.

Recovery modules contains the BLAST scripts for alignment and the recovery module that works on the BLAST results. It also contains the ReChime (v1) module.
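
The F1 score mentioned for the Chimeras_denovo and DADA2 directories can be illustrated with a minimal sketch. It assumes that the true chimeric reads of the simulated dataset and the reads flagged by a tool are both available as collections of read IDs. The function name and the example IDs are hypothetical and are not taken from the repository's scripts.

```python
def f1_score(true_chimeras, predicted_chimeras):
    """Return (precision, recall, F1) for chimera calls given as read IDs."""
    true_chimeras = set(true_chimeras)
    predicted_chimeras = set(predicted_chimeras)

    tp = len(true_chimeras & predicted_chimeras)   # chimeras correctly flagged
    fp = len(predicted_chimeras - true_chimeras)   # non-chimeras flagged by mistake
    fn = len(true_chimeras - predicted_chimeras)   # chimeras that were missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: four simulated chimeras, three found, one false call.
truth = {"chim_1", "chim_2", "chim_3", "chim_4"}
calls = {"chim_1", "chim_2", "chim_3", "read_9"}
precision, recall, f1 = f1_score(truth, calls)
```

With these example sets, three of the four calls are correct and three of the four true chimeras are found, so precision, recall and F1 are all 0.75.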

  1. The workflow started with cutadapt, which was used to trim the ITS regions.
  2. The resulting "ITS.fasta" file was then used as input for the simulation with SimLoRD.
  3. Quality filtering was then applied to the reads with the PipeCraft VSEARCH module, using this command:
     vsearch --fastq_filter input_file --fastq_maxee 1 --fastq_maxns 0 --fastq_minlen 50 --threads 8 --fastq_qmax 93 --fastq_qmin 0 --fastqout /input/qualFiltered_out/output_file.fastq
  4. The FASTQ reads were converted to FASTA format.
  5. ITSx was then applied to the data.
  6. The full and partial output was selected from the ITSx outputs.
  7. A chimera simulator was applied to generate chimeric reads from these FASTA reads.
  8. The generated chimeric reads were concatenated into one file.
  9. UCHIME reference-based chimera filtering was run on the files.
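
To make the quality filtering and FASTQ-to-FASTA steps above more concrete, here is a minimal pure-Python sketch of what the filter thresholds mean. It is not the VSEARCH implementation and does not replace it. It assumes Phred+33 quality encoding and strict four-line FASTQ records, and it merges the filtering and conversion steps into one pass for brevity. All function names are hypothetical.

```python
def expected_errors(quality_string, offset=33):
    """Sum of per-base error probabilities, 10^(-Q/10), over the read."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality_string)


def passes_filter(seq, qual, max_ee=1.0, max_ns=0, min_len=50):
    """Mirror the thresholds --fastq_maxee 1, --fastq_maxns 0, --fastq_minlen 50."""
    if len(seq) < min_len:
        return False                      # read is too short
    if seq.upper().count("N") > max_ns:
        return False                      # too many ambiguous bases
    return expected_errors(qual) <= max_ee


def fastq_to_fasta(fastq_lines, **thresholds):
    """Yield FASTA records for the four-line FASTQ records that pass the filter."""
    lines = [line.rstrip("\n") for line in fastq_lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        if passes_filter(seq, qual, **thresholds):
            yield f">{header[1:]}\n{seq}"


# Hypothetical reads: one high-quality read and one low-quality read.
example = [
    "@good_read", "ACGT" * 15, "+", "I" * 60,   # Q40 at every position
    "@bad_read", "ACGT" * 15, "+", "#" * 60,    # Q2 at every position
]
fasta_records = list(fastq_to_fasta(example))
```

In this example only `good_read` survives: its expected error is 60 × 0.0001 = 0.006, while the all-Q2 read has an expected error far above the threshold of 1.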

Best settings

UCHIME performed best with the settings --mindiv 0.4 --dn 1.6 --minh 0.08.
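
As an illustration of how these settings could be passed on the command line, the sketch below builds a reference-based chimera filtering command. It assumes that UCHIME is run through VSEARCH's `--uchime_ref` command, which the source does not state explicitly. The file names and the function name are hypothetical placeholders.

```python
import shlex


def build_uchime_ref_command(reads_fasta, reference_fasta,
                             chimeras_out, nonchimeras_out,
                             mindiv=0.4, dn=1.6, minh=0.08):
    """Return the argument list for a VSEARCH reference-based chimera check."""
    return [
        "vsearch", "--uchime_ref", reads_fasta,
        "--db", reference_fasta,          # reference database (placeholder path)
        "--mindiv", str(mindiv),          # minimum divergence from closest parent
        "--dn", str(dn),                  # pseudo-count prior on "no" votes
        "--minh", str(minh),              # minimum chimera score
        "--chimeras", chimeras_out,
        "--nonchimeras", nonchimeras_out,
    ]


# Placeholder file names, not taken from the repository.
cmd = build_uchime_ref_command(
    "reads.fasta", "reference.fasta", "chimeras.fasta", "nonchimeras.fasta"
)
print(shlex.join(cmd))
```

The argument list can be handed to `subprocess.run(cmd, check=True)` once real input and output paths are filled in.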

Treat chimeras

The chimera recovery script can be run as follows:

python chimera_recovery.py --blast_output blast_1st_tophit.txt --output recovered.csv --min_occurrence 2 --input_fasta combined_chimeras.fasta
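
The command above implies a small command-line interface. The sketch below reconstructs only that interface with `argparse`, based on the four flags shown. The argument types, the default for `--min_occurrence`, and the help texts are assumptions, and the actual recovery logic of chimera_recovery.py is not reproduced here.

```python
import argparse


def build_parser():
    """Build a parser matching the flags used in the command above."""
    parser = argparse.ArgumentParser(
        description="Interface sketch for a chimera recovery script."
    )
    # Help texts below are guesses at each flag's meaning, not documentation.
    parser.add_argument("--blast_output", required=True,
                        help="BLAST top-hit table (assumed meaning)")
    parser.add_argument("--input_fasta", required=True,
                        help="FASTA file with candidate chimeras (assumed meaning)")
    parser.add_argument("--output", required=True,
                        help="path of the result file")
    parser.add_argument("--min_occurrence", type=int, default=1,
                        help="minimum occurrence threshold (assumed integer)")
    return parser


# Parse the same arguments as in the example command.
args = build_parser().parse_args([
    "--blast_output", "blast_1st_tophit.txt",
    "--output", "recovered.csv",
    "--min_occurrence", "2",
    "--input_fasta", "combined_chimeras.fasta",
])
```

After parsing, the values are available as `args.blast_output`, `args.output`, `args.min_occurrence` and `args.input_fasta`.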
