ekg / mutatrix

genome simulation across a population with zeta-distributed allele frequencies, including SNPs, insertions, deletions, and multi-nucleotide polymorphisms

License: MIT License

C 11.29% C++ 74.30% Shell 12.79% Makefile 1.62%

==== MUTATRIX ====

mutatrix is a population genome simulator: it generates a population of simulated genomes from a reference sequence.

It reads a reference FASTA file, writes a VCF description of the simulated
variants to stdout, and writes each mutated copy of the reference to the
current directory or to a user-defined path (--file-prefix).


Example usage:

    % ./mutatrix -S sample -P test/ -p 2 -n 10 reference.fasta

This command writes VCF to stdout and writes the mutated references to test/,
using this naming format:

    # <prefix>/<sample id>:<fasta sequence name>:<copy number>.fa

    % ls test
    sample10:seq_1:0.fa  sample1:seq_1:0.fa  sample2:seq_1:0.fa  ...
    sample10:seq_1:1.fa  sample1:seq_1:1.fa  sample2:seq_1:1.fa  ...

mutatrix is suitable for testing pooled variant detectors because it
distributes alleles throughout the population according to a zeta distribution,
which is roughly consistent with the power-law allele frequency spectrum
observed by large population sequencing projects such as the 1000 Genomes Project.
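For reference, the zeta distribution with parameter α > 1 (standard definition) assigns each positive integer k the probability below, where the normalizing constant is the Riemann zeta function. With α = 1.7, ζ(1.7) ≈ 2.05, so roughly 49% of draws equal 1 and the probabilities decay as the power law k^(-α).

```latex
P(K = k) = \frac{k^{-\alpha}}{\zeta(\alpha)}, \qquad
\zeta(\alpha) = \sum_{n=1}^{\infty} n^{-\alpha}, \qquad k = 1, 2, 3, \ldots
```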


Alternate allele generation:

mutatrix generates alleles using the following model:

At each position in the reference, we draw a pseudorandom number on [0,1).  If
this number, scaled by the number of copies of the genome in the population, is
below --rate (default 0.001), then we generate an alternate minor allele.

We then sample a second number; if it is below --indel-snp-ratio, we generate
an indel.  Otherwise, we generate a SNP or MNP.  MNP lengths follow a geometric
distribution parameterized by --mnp-ratio: with a ratio of 0.01, a 2bp MNP
occurs at 0.01 times the rate of SNPs, a 3bp MNP at 0.01 times the rate of 2bp
MNPs, and so on.

Indels are generated by drawing a length from a zeta distribution with alpha
--indel-alpha.  (An alpha of 1.7 is used, per the observations in [1].)  If the
indel is longer than --indel-max, we continue without generating it.  The
sequence of a novel insertion is generated at random.


Allele frequency spectrum simulation:

Once generated, the alternate allele is distributed across the population of
simulated individuals: an allele frequency is sampled from a zeta distribution
(also with alpha 1.7), and the alternate alleles are then placed randomly
across the population.

There is no concept of haplotype blocks or linkage in mutatrix.  Each allele and
site is effectively independent of other sites.


Dependency on vcflib:

You'll need to be able to build vcflib, which might involve installing libtabixpp-dev.


author: Erik Garrison
license: MIT (free)

references:

[1] Problems and Solutions for Estimating Indel Rates and Length Distributions.
Reed A. Cartwright.  http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2734402/

mutatrix's Issues

fastahack compile error on macOS

Hi,

When compiling on macOS using Homebrew g++ 4.9.3, I run into the following error:

    cd fastahack && /Applications/Xcode.app/Contents/Developer/usr/bin/make
    g++ -c Fasta.cpp
    In file included from Fasta.h:19:0,
                     from Fasta.cpp:9:
    LargeFileSupport.h:12:9: error: '__off64_t' does not name a type
     typedef __off64_t off_type;
             ^

I tried replacing the fastahack folder with the version from vg, but then I ran into another issue:

    g++ mutatrix.cpp \
            fastahack/Fasta.o -o mutatrix -Ivcflib/src/ -Ivcflib/ -L. -Lvcflib/tabixpp/ -ltabix -Lvcflib/ -lvcflib -lm -lz -std=c++0x
    Undefined symbols for architecture x86_64:
      "repeatCounts(long, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)", referenced from:
          _main in ccBqFQjn.o

Any help would be appreciated.

Error compiling with gcc-10.1.0

Hi Erik,

In order to compile mutatrix, I had to change two lines in vcflib/fsom/fsom.c to avoid a compiler error. The change was as follows:

    OLD: dist_i = abs(x - i) ;
    NEW: dist_i = abs( static_cast<int>(x - i) );

The reason for the error, and why the change was needed, is explained here: https://stackoverflow.com/a/50301673

Installation

I cloned the repository with git and ran "make" to compile it, but the build failed. How can I install this?

    cd vcflib && make libvcflib.a
    make[1]: Entering directory `/home/sp/Tools/mutatrix/vcflib'
    make[1]: *** No rule to make target `libvcflib.a'. Stop.
    make[1]: Leaving directory `/home/sp/Tools/mutatrix/vcflib'
    make: *** [vcflib/libvcflib.a] Error 2
