Giter Site home page Giter Site logo

asleonard / jupiterplot Goto Github PK

View Code? Open in Web Editor NEW

This project forked from justinchu/jupiterplot

0.0 0.0 0.0 81.54 MB

A Circos-based tool to visualize genome assembly consistency

License: GNU General Public License v3.0

Perl 91.85% Makefile 0.94% Shell 0.61% HTML 5.03% Roff 1.48% Python 0.09%

jupiterplot's Introduction

Circos Assembly Consistency (Jupiter) plot

DOI

This is a pipeline for generating a Circos based genome assembly consistency plot given a set of contigs relative to reference genome. Intended to visualize large scale translocations or missassemblies in draft assemblies, but it can also be useful when trying to show synteny between whole genomes.

Generates plots similar to those found in the ABySS 2 and Tigmint papers, given only a reference genome fasta file and an assembly scaffolds fasta file. Good for getting a quick qualitative view of the missassemblies in a genome assembly. Nicknamed after the type of plot you get if your assembly is relatively error free (looks like the planet Jupiter). If you think of features or some useful auxiliary information (e.g. for tracing back where a translocation event occurred) that the plot uses that I could report better let me know.

Example plot on a Drosophila assembly showing a misassembly (or possible chromosomal fusion event) between L2 and L3. This event is inverted relative to the orientation show in the plot so the link is displayed as twisted. There are also smaller events internal to 3R. Note that by default only large scale events (>10kb) can be see in this plot, and small misassemblies, possibly medidated by repeats, cannot be seen (unless maxBundleSize is changed). The black lines on the reference indicate gaps of Ns, which can explain why some regions of the assembly are not covered (often found in telomeric or centromeric regions). Default settings except for ng=80 used.

Requirements (for full pipeline):

Installing Circos

Circos can be installed with linuxbrew using the command:

brew install circos

Perl Modules Needed (If downloading and installing Circos without a package manager):

Starting inputs:

  • Set of scaffolds in fasta format
  • Reference genome in fasta format

To generate a plot given these inputs, minimap2 must be in your path.

Usage:

Simply run:

jupiter name=$prefix ref=$reference fa=$scaffolds

Optional commands:

sam=                #Specify an existing alignment of scafftigs to if they already exist (naming convention that fatoagp.pl produces must be consistent)
<<<<<<< HEAD
ng=75               #use largest scaffolds that are equal to 75% of the genome 
maxScaff=           #by default uses ng value, but can override to allow a maximum of N scaffolds
maxGap=100000       #maximum alignment gap allowed to consider a region contiguous
minBundleSize=50000 #minimum size of a contiguous region to render
=======

#General Parameters
t=4                 #number of threads to use for minimap2

#Karyotype options
>>>>>>> 724f2bc3c00652754b00e66a6cd08f78d4e677de
m=100000            #only use genomic reference chromosomes larger than this value
ng=75               #use largest scaffolds that are equal to 75% of the genome 
maxScaff=-1         #Instead of ng filter by this number of scaffolds
i=0                 #increment for colouring chromosomes (HSV colour shift by setting 0-360), when set to >360 it generates random colours
g=1                 #minimum gap size in reference to render
gScaff=100000       #minimum gap size in scaffolds to render
labels = ref        #Shows reference chromosome name "ref", scaffolds "scaf" or "both".

#Link options
maxGap=100000       #maximum alignment gap allowed to consider a region contiguous
minBundleSize=50000 #minimum size of a contiguous region to render
MAPQ=50             #maximum mapping quality allowed when filtering
linkAlpha=5         #alpha of links 1 = 17% , 2 = 33%, 3 = 50%, 4 = 67% and 5 = 83%.

If everything runs smoothly it will generate the following files:

prefix_scaffolds.fa (symlink)
prefix_reference.karyotype
prefix_reference.fa.sa
prefix_reference.fa.pac
prefix_reference.fa.ann
prefix_reference.fa.amb
prefix_reference.fa (symlink)
prefix.seqOrder.txt
prefix.svg
prefix-agp.sam
prefix.links
prefix.karyotype
prefix-agp.fa
prefix.conf
prefix.agp

Most likely, you will want to work with the svg file as the perl image processing module Circos uses has difficulty rendering transparency on png files.

Relating image back to assembly:

prefix.seqOrder.txt file: This file helps to relate the mapping order of each scaffold rendered back to the reference chromosomes in the plot. It lists what scaffold is rendered in the order to each reference chromosome from top of the plot to the bottom.

Here is the format:

internalRefID	refID	internalScaffoldID	scaffoldID strand

For example:

ref5	X	scaf109	361740	-
ref5	X	scaf120	361915	+
ref5	X	scaf68	361536	+
...

Chromosome X matches scaffold 361740 as the first segment (in the - direction relative to the orignal sequence) then scaffold 361915 (in the + direction relative to the orignal sequence) occurs as the next alignment (ordered in the 5'->3' direction relative to the reference).

Tips and Tricks:

  1. Renaming Chromosomes You can relabel chromosomes by running to the prefix.karyotype stage, editing the file and run the rest of the pipeline:
#Run pipeline to prefix.karyotype stage
jupiter name=prefix $OPTS prefix.karyotype
#Edit the file prefix.karyotype file in some way
sed -i 's/chromosome/chr/g' prefix.karyotype
#Finish pipeline
jupiter name=prefix $OPTS

In this example, if the labels had chromosome1 and they were then changed to chr1.

Possible issues:

  1. Error regarding too many ideograms - Example:
You have asked to draw [831] ideograms, but the maximum is currently set at
  [500]. To increase this number change max_ideograms in etc/housekeeping.conf.
  Keep in mind that drawing that many ideograms may create an image that is too
  busy and uninterpretable.

This basically means the assembly may be too fragmented. You can decrease ng to smaller value or alter the housekeeping.conf to allow for more scaffolds to render (keeping in mind it may become quite unwieldy).

Acknowledgements

  • Members of the BTL lab at the BCGSC for testing and feedback.
  • ckeeling for helping with identifying required perl modules.
  • harish0201 for helping with adding minimap2 functionality.

jupiterplot's People

Contributors

justinchu avatar harish0201 avatar aafshinfard avatar asleonard avatar jwcodee avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.