svmu

SVMU (Structural Variants from MUmmer) attempts to identify a comprehensive set of sequence variants by aligning two contiguous genome assemblies. It combines the strengths of two powerful aligners, MUMmer and LASTZ, to annotate duplications, large indels, inversions, small indels, and SNPs from whole-genome alignments.

SVMU is still under active development: we are incorporating new features and fixing bugs. SVMU works with both MUMmer v3 and MUMmer v4. One experimental feature we have added is the processing of LASTZ output in svmu. If you encounter an issue, feel free to email me at [email protected].

NOTE: Feel free to try the latest version, but be cautious with the results. If you have an idea or suggestion (including collaboration ideas), write to me. If you are looking for the svmu versions used in the A4 and DSPR papers, see below:

If you publish results obtained with this pipeline, please cite SVMU as described here: https://www.nature.com/articles/s41467-019-12884-1. The versions used in the papers are available through commits prior to March 6, 2018.

  1. Download and compile the programs:
   make

  2. Obtain the MUMmer and LASTZ alignments:
   nucmer --threads n --prefix sam2ref ref.fasta sample.fasta

   lastz ref.fasta[multiple] sample.fasta[multiple] --chain --format=general:name1,strand1,start1,end1,name2,strand2,start2,end2 > sam_lastz.txt
   
(The LASTZ output that svmu reads should contain only the columns listed in the lastz command above.)
(For relatively small genomes, --maxmatch can also be used with nucmer.)

  3. Run svmu:
   svmu sam2ref.delta ref.fasta sample.fasta snp_mode sam_lastz.txt prefix

snp_mode should be 'h' or 'l': h = report SNPs; l = do not report SNPs. Currently, this option is turned off (it will be activated in the near future).

prefix = a prefix that will be added to the output files.
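
The positional arguments described above can also be assembled from a script. The following is a minimal Python sketch and not part of svmu: the helper names `svmu_command` and `run_svmu` are hypothetical, the argument order is copied from the usage line above, and svmu is assumed to be on your PATH.

```python
import subprocess

# 'h' = report SNPs, 'l' = no SNPs (as described in the usage notes)
VALID_SNP_MODES = ("h", "l")


def svmu_command(delta, ref, sample, snp_mode, lastz_out, prefix):
    """Build the svmu argument list in the order shown in the usage line.

    Hypothetical helper: it only arranges the arguments and checks snp_mode.
    """
    if snp_mode not in VALID_SNP_MODES:
        raise ValueError(f"snp_mode must be 'h' or 'l', got {snp_mode!r}")
    return ["svmu", delta, ref, sample, snp_mode, lastz_out, prefix]


def run_svmu(delta, ref, sample, snp_mode, lastz_out, prefix):
    """Run svmu (assumed to be on PATH) and raise if it exits with an error."""
    cmd = svmu_command(delta, ref, sample, snp_mode, lastz_out, prefix)
    subprocess.run(cmd, check=True)
```

Calling `run_svmu` with the nucmer delta file, the two FASTA files, a SNP mode, the LASTZ output, and an output prefix is then equivalent to the command line above. Passing a list instead of a shell string avoids quoting problems with unusual file names.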

sv.prefix.txt: A tab-delimited file that summarizes structural mutations (indels, CNVs, inversions) in the sample genome with respect to the reference genome.

small.prefix.txt: A tab-delimited file containing SNPs and small indels that occur within syntenic blocks (or MUMs).

cnv_all.prefix.txt: A tab-delimited file with all the reference genomic regions that are present in higher copy numbers (>1) in the sample genome. Entries with "trans" in their names are either transposable elements or non-TE copies of a gene located on different chromosomes.

cm.prefix.txt: A BED file with the reference genomic regions that are syntenic between the two genomes.
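
As a quick sanity check on cm.prefix.txt, you can total the syntenic span per reference sequence. The sketch below is only an illustration: it assumes the file follows the usual BED convention (sequence name, 0-based start, end in the first three whitespace-separated columns), which the description above does not spell out, and the function name `syntenic_lengths` is hypothetical.

```python
def syntenic_lengths(bed_lines):
    """Sum interval lengths (end - start) per reference sequence.

    Assumes standard BED columns: name, start, end. Overlapping
    intervals are not merged, so overlaps would be counted twice.
    """
    totals = {}
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        totals[chrom] = totals.get(chrom, 0) + (end - start)
    return totals


def syntenic_lengths_from_file(path):
    """Convenience wrapper that reads the BED file at `path`."""
    with open(path) as handle:
        return syntenic_lengths(handle)
```

For example, `syntenic_lengths_from_file("cm.prefix.txt")` returns a dictionary mapping each reference sequence name to its summed interval length, which can be compared against the sequence lengths in ref.fasta.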

Finally, if you are using SVMU for your research, please keep in mind that SVMU has not been extensively tested on genomes larger than Drosophila, so there is no guarantee that it will work well with other genomes. Currently, it requires ~2.5 GB of memory for the D. melanogaster genome.

KNOWN BUGS/PLANNED FUTURE IMPROVEMENTS:

  1. SVMU currently reports inversion breakpoints but may not report the length of the inversion. Inspect the reported inversions before you trust them fully. A future fix will address this issue.
  2. White space in FASTA headers will cause a segfault in svmu because nucmer strips all text following a space or tab in FASTA headers.
  3. Translocated segments may show up as large indels.
  4. LASTZ mode is experimental.
  5. A SAM-to-delta format converter is coming soon, which will make svmu compatible with minimap2.
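
Until the header issue in item 2 is fixed, one possible workaround is to trim every FASTA header to its first whitespace-delimited token before running nucmer, lastz, and svmu, so that all three tools see the same sequence names. The following Python sketch is an assumption about how to do that, not part of svmu; the function name `clean_fasta_headers` and the command-line usage are hypothetical.

```python
import sys


def clean_fasta_headers(lines):
    """Yield FASTA lines with each header cut at the first space or tab.

    Sequence lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            # Keep only the first token of the header, e.g. ">chr1 desc" -> ">chr1"
            yield ">" + (tokens[0] if tokens else "") + "\n"
        else:
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python clean_headers.py sample.fasta sample.clean.fasta
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(clean_fasta_headers(src))
```

Run it on both the reference and the sample assembly, then use the cleaned files in every step of the pipeline. Check that the trimmed names are still unique, since two headers that differ only after the first space would collapse to the same name.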
