ekg / mutatrix
genome simulation across a population with zeta-distributed allele frequency, snps, insertions, deletions, and multi-nucleotide polymorphisms
License: MIT License
==== MUTATRIX ====

mutatrix is a population genome simulator which generates simulated genomes. It reads a reference FASTA file and outputs a VCF description of the variants on stdout, and writes each simulated, mutated copy of the reference to the current directory or a user-defined path (--file-prefix).

Example usage:

    % ./mutatrix -S sample -P test/ -p 2 -n 10 reference.fasta

This command writes VCF to stdout and writes mutated references to test/, with this format:

    # <prefix>/<sample id>:<fasta sequence name>:<copy number>.fa
    % ls test
    sample10:seq_1:0.fa  sample1:seq_1:0.fa  sample2:seq_1:0.fa  ...
    sample10:seq_1:1.fa  sample1:seq_1:1.fa  sample2:seq_1:1.fa  ...

mutatrix is suitable for use in testing pooled variant detectors, as it distributes alleles throughout the population according to a zeta distribution, which is roughly consistent with the power-law allele frequency spectrum observed by large population sequencing projects like the 1000 Genomes Project.

Alternate allele generation:

mutatrix generates alleles using the following model:

At each position in the reference, we draw a pseudorandom number on [0,1). If this number, scaled by the number of copies of the genome in the population, is below --rate (default 0.001), then we generate an alternate minor allele. We then sample a second number, and if it is below --indel-snp-ratio, we generate an indel. Otherwise, we generate a SNP or MNP.

MNPs are generated using a geometric distribution conditioned on the --mnp-ratio. A 2bp MNP occurs at 0.01 the rate of SNPs, a 3bp MNP occurs at 0.01 the rate of 2bp MNPs, etc.

Indels are generated by obtaining a length from a zeta distribution with alpha --indel-alpha. (An alpha of 1.7 is used per observations in [1].) If the indel is longer than --indel-max, we continue without generating the indel. Novel insertions are randomly generated.
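
The per-position decision described above can be sketched in a few lines of Python. This is an illustration of the model as written in this README, not mutatrix's actual C++ code. Several details are assumptions: "scaled by the number of copies" is read here as dividing the draw by the copy count, the default values for indel_snp_ratio and indel_max are placeholders, and the zeta sampler uses Devroye's rejection method, which may differ from what mutatrix does internally. The function names sample_zeta and choose_variant are hypothetical.

```python
import random


def sample_zeta(alpha, rng):
    """Draw an integer k >= 1 with P(k) proportional to k**-alpha (alpha > 1).

    Uses Devroye's rejection method; this sampler is a choice made for the
    sketch, not necessarily the one mutatrix uses.
    """
    am1 = alpha - 1.0
    b = 2.0 ** am1
    while True:
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids a zero base
        v = rng.random()
        x = int(u ** (-1.0 / am1))
        t = (1.0 + 1.0 / x) ** am1
        if v * x * (t - 1.0) / (b - 1.0) <= t / b:
            return x


def choose_variant(rng, copies=2, rate=0.001, indel_snp_ratio=0.2,
                   mnp_ratio=0.01, indel_alpha=1.7, indel_max=1000):
    """Decide what happens at one reference position.

    Returns None (no variant), ("indel", length), ("snp", 1) or ("mnp", length).
    The defaults for indel_snp_ratio and indel_max are placeholders.
    """
    # Assumption: "scaled by the number of copies" means dividing the draw.
    if rng.random() / copies >= rate:
        return None
    if rng.random() < indel_snp_ratio:
        length = sample_zeta(indel_alpha, rng)
        if length > indel_max:
            return None  # overly long indels are skipped
        return ("indel", length)
    # Geometric length: each extra base is mnp_ratio times as likely.
    length = 1
    while rng.random() < mnp_ratio:
        length += 1
    return ("snp" if length == 1 else "mnp", length)


if __name__ == "__main__":
    rng = random.Random(42)
    # Walk 100,000 hypothetical reference positions for a 20-copy population.
    calls = [choose_variant(rng, copies=20) for _ in range(100_000)]
    variants = [c for c in calls if c is not None]
    print(len(variants), "variant sites;", variants[:5])
```

The sketch only decides the variant type and length at each site. Choosing the alternate bases and writing the mutated FASTA copies are left out.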

Allele frequency spectrum simulation:

Once generated, the alternate allele is distributed across the population of simulated individuals by sampling an allele frequency from a zeta distribution (also with alpha 1.7). The alternate alleles are randomly distributed across the population.

There is no concept of haplotype block or linkage in mutatrix. Each allele and site is effectively independent from other sites.

Dependency on vcflib

You'll need to be able to build vcflib. This might involve installing libtabixpp-dev.

author: Erik Garrison <[email protected]>
license: MIT (free)

references:

[1] Problems and Solutions for Estimating Indel Rates and Length Distributions. Reed A. Cartwright. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2734402/
Hi,
when compiling on macOS using Homebrew g++ 4.9.3, I run into the following error:
cd fastahack && /Applications/Xcode.app/Contents/Developer/usr/bin/make
g++ -c Fasta.cpp
In file included from Fasta.h:19:0,
from Fasta.cpp:9:
LargeFileSupport.h:12:9: error: '__off64_t' does not name a type
typedef __off64_t off_type;
^
I tried replacing the fastahack folder with the version from vg, but then I ran into another issue:
g++ mutatrix.cpp \
fastahack/Fasta.o -o mutatrix -Ivcflib/src/ -Ivcflib/ -L. -Lvcflib/tabixpp/ -ltabix -Lvcflib/ -lvcflib -lm -lz -std=c++0x
Undefined symbols for architecture x86_64:
"repeatCounts(long, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)", referenced from:
_main in ccBqFQjn.o
Any help would be appreciated.
Hi Erik,
In order to compile mutatrix, I had to change two lines in vcflib/fsom/fsom.c to avoid a compiler error. The change is as follows:
OLD: dist_i = abs(x - i) ;
NEW: dist_i = abs( static_cast<int>(x - i) );
Why the error occurred and why the change was needed is explained here: https://stackoverflow.com/a/50301673
I cloned the repository with git and ran "make" to compile, but it failed. How can I install this?
cd vcflib && make libvcflib.a
make[1]: Entering directory `/home/sp/Tools/mutatrix/vcflib'
make[1]: *** No rule to make target `libvcflib.a'.  Stop.
make[1]: Leaving directory `/home/sp/Tools/mutatrix/vcflib'
make: *** [vcflib/libvcflib.a] Error 2