
emlec / ssv-conta


SSV-Conta quantifies and characterizes DNA contaminants in gene therapy vector preparations sequenced on the Illumina platform.

License: GNU General Public License v2.0

python aav dna-contaminants gene-therapy


SSV-Conta

Package containing all scripts needed to quantify and characterize DNA contaminants from gene therapy vector production after NGS sequencing.

Principle

The package uses five programs:

  • Quade: FASTQ demultiplexer handling dual indexing and molecular indexing, with filtering based on index quality (Python 2.7)
  • Sekator: multithreaded quality and adapter trimmer for paired-end FASTQ files (Python 2.7/Cython/C)
  • fastq_control_sampler: generates control FASTQ files (C)
  • RefMasker: hard-masks homologies between FASTA reference sequences identified with blastn (Python 2.7)
  • ContaVect: quantifies and characterizes DNA contaminants (Python 2.7)

Get SSV-Conta

Clone the SSV-Conta repository together with its submodules:

git clone --recurse-submodules URL

Installation instructions for Quade, Sekator, RefMasker and ContaVect are given in each tool's README. ContaVect requires a pysam version older than 0.13.0.
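To check the pysam constraint before running ContaVect, the installed version can be compared against 0.13.0 from the Python 2.7 interpreter that will run ContaVect. The helper below, pysam_version_ok, is hypothetical (it is not part of SSV-Conta) and only assumes that pysam exposes a __version__ string; it is written to run under both Python 2.7 and Python 3.

```python
import re


def pysam_version_ok(version, limit=(0, 13, 0)):
    """Return True if a pysam version string is strictly older than `limit`.

    Hypothetical helper, not part of SSV-Conta. Only the leading numeric
    components are compared, e.g. "0.12.0.1" is read as (0, 12, 0, 1).
    """
    match = re.match(r"\d+(\.\d+)*", version)
    if match is None:
        raise ValueError("Unrecognised version string: %r" % version)
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions so that "0.13" compares equal to (0, 13, 0).
    parts = parts + (0,) * (len(limit) - len(parts))
    return parts < limit


if __name__ == "__main__":
    import pysam  # the pysam installed for the interpreter running this file
    print("pysam %s compatible with ContaVect: %s"
          % (pysam.__version__, pysam_version_ok(pysam.__version__)))
```

Padding the parsed version avoids treating "0.13" as older than "0.13.0", which plain tuple comparison would otherwise do.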

Create links to the executable scripts in a bin directory that is on your PATH.
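One way to create these links is sketched below with the standard library. The function link_to_bin and the directory layout in the usage line are hypothetical; adjust the script paths to wherever the repository and its submodules were cloned. It assumes a filesystem that supports symbolic links.

```python
import os


def link_to_bin(script_paths, bin_dir):
    """Create symbolic links to each script inside bin_dir (hypothetical helper).

    Existing symbolic links with the same name are replaced; a regular file
    with the same name is left alone and os.symlink raises OSError.
    Returns the list of link paths that were created.
    """
    if not os.path.isdir(bin_dir):
        os.makedirs(bin_dir)
    links = []
    for script in script_paths:
        target = os.path.abspath(script)
        link = os.path.join(bin_dir, os.path.basename(script))
        if os.path.islink(link):
            os.remove(link)  # replace a stale link from a previous install
        os.symlink(target, link)
        links.append(link)
    return links
```

Example call, with hypothetical paths: link_to_bin(["SSV-Conta/Quade/Quade.py", "SSV-Conta/Sekator/Sekator.py", "SSV-Conta/ContaVect/ContaVect.py"], os.path.expanduser("~/bin")). The linked scripts must also be executable for the commands in the Usage section to run directly.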

Usage

Pre-processing of sequencing reads

Input: chunks of raw, non-demultiplexed FASTQ files

  • In the folder where the FASTQ files will be created, generate the configuration template (Quade.py -i), fill it in, then run Quade: Quade.py -c Quade_conf_file.txt

Be careful: the chunk paths must be separated by tabs or spaces. In the current version, the report does not include Undetermined reads in the counts of read pairs that passed or failed the quality filter.

Quade can run for several hours.

Output: demultiplexed FASTQ files (filtered on index quality) and a Quade_report
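Because the chunk paths in the Quade configuration must be separated by tabs or spaces, it can help to generate that list rather than type it. The sketch below is a hypothetical helper (chunk_path_field is not part of Quade) that globs the chunks of one read type and joins them with tabs. The default "*.fastq.gz" pattern is an assumption; set it to match your file names. Paths containing whitespace are rejected, since whitespace presumably acts as the separator.

```python
import glob
import os


def chunk_path_field(fastq_dir, pattern="*.fastq.gz"):
    """Return the chunk paths found in fastq_dir as one tab-separated string.

    Hypothetical helper. Paths are sorted so the chunk order is reproducible,
    and any path containing whitespace raises ValueError.
    """
    paths = sorted(glob.glob(os.path.join(fastq_dir, pattern)))
    for path in paths:
        if any(c.isspace() for c in path):
            raise ValueError("Chunk path contains whitespace: %r" % path)
    return "\t".join(paths)
```

For example, chunk_path_field("raw_chunks", "*_R1_*.fastq.gz") would return every read 1 chunk in a hypothetical raw_chunks directory, ready to paste into the matching field of the filled-in template.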

  • In the folder where the trimmed FASTQ files will be created, generate the configuration template (Sekator.py -i), fill it in, then run Sekator: Sekator.py -c Sekator_conf_file.txt

Be careful: if necessary, build the AdapterTrimmer.so library required for the adapter trimming step with python setup.py build_ext --inplace, then run make clean.

Output: trimmed FASTQ files
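The "if necessary" build step can be automated by checking for AdapterTrimmer.so first and only then running the two commands given above. This is a hypothetical sketch: ensure_adapter_trimmer is not part of Sekator, and it assumes that setup.py and the Makefile sit in the directory passed as sekator_dir. The python argument should name the Python 2.7 interpreter that runs Sekator, since the compiled library must match it.

```python
import os
import subprocess


def ensure_adapter_trimmer(sekator_dir, python="python", dry_run=False):
    """Build AdapterTrimmer.so in sekator_dir unless it already exists.

    Hypothetical helper. Runs "python setup.py build_ext --inplace" and then
    "make clean" inside sekator_dir, stopping if either command fails.
    Returns the commands that were run (or would be run when dry_run=True);
    returns an empty list when the library is already present.
    """
    if os.path.exists(os.path.join(sekator_dir, "AdapterTrimmer.so")):
        return []
    commands = [[python, "setup.py", "build_ext", "--inplace"],
                ["make", "clean"]]
    for cmd in commands:
        if not dry_run:
            subprocess.check_call(cmd, cwd=sekator_dir)
    return commands
```

Calling it with dry_run=True first shows which commands would be executed without compiling anything.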

Create the in silico control FASTQ files

  • Run the program fastq_control_sampler

Mapping reads to reference sequences

  • In the analysis folder, copy the configuration template provided in the ContaVect directory and fill it in.

Run ContaVect.py Conf.txt
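Once every configuration file has been generated and filled in, the documented commands can be chained in order. The sketch below is a hypothetical driver, not part of SSV-Conta: the three working directories are placeholders, the scripts are assumed to be executable and on your PATH (see the bin links above), and fastq_control_sampler is left out because its arguments are not described in this README.

```python
import subprocess


def pipeline_steps(demux_dir, trim_dir, analysis_dir,
                   quade_conf="Quade_conf_file.txt",
                   sekator_conf="Sekator_conf_file.txt",
                   contavect_conf="Conf.txt"):
    """Return (working directory, command) pairs in pipeline order."""
    return [
        (demux_dir, ["Quade.py", "-c", quade_conf]),
        (trim_dir, ["Sekator.py", "-c", sekator_conf]),
        # fastq_control_sampler would run here; its arguments are not
        # documented in this README, so it is not part of the sketch.
        (analysis_dir, ["ContaVect.py", contavect_conf]),
    ]


def run_pipeline(steps, dry_run=False):
    """Print each step and, unless dry_run, run it, stopping at the first failure."""
    for cwd, cmd in steps:
        print("[%s] %s" % (cwd, " ".join(cmd)))
        if not dry_run:
            subprocess.check_call(cmd, cwd=cwd)
```

Running run_pipeline(pipeline_steps("demux", "trim", "analysis"), dry_run=True) prints the three commands without executing them. Keep in mind that the Quade step alone can take several hours.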

Authors and Contact

Adrien Leger @a-slide
Emilie Lecomte @emlec
