haolong / psmc
This project forked from lh3/psmc
This software package infers population size history from a diploid sequence using the Pairwise Sequentially Markovian Coalescent (PSMC) model. The detailed model is described in file `psmc.tex'.

To compile the binaries, you may run:

    make; (cd utils; make)

After that, you may try:

    utils/fq2psmcfa -q20 diploid.fq.gz > diploid.psmcfa
    psmc -N25 -t15 -r5 -p "4+25*2+4+6" -o diploid.psmc diploid.psmcfa
    utils/psmc2history.pl diploid.psmc | utils/history2ms.pl > ms-cmd.sh
    utils/psmc_plot.pl diploid diploid.psmc

where `diploid.fq.gz' is typically the whole-genome diploid consensus sequence of one human individual, which can be generated by, for example:

    samtools mpileup -C50 -uf ref.fa aln.bam | bcftools view -c - \
        | vcfutils.pl vcf2fq -D 100 | gzip > diploid.fq.gz

Program `fq2psmcfa' transforms the consensus sequence into a fasta-like format where the i-th character in the output sequence indicates whether there is at least one heterozygote in the bin [100i, 100i+100).

Program `psmc' infers the population size history. In particular, the `-p' option specifies that there are 64 atomic time intervals and 28 (=1+25+1+1) free interval parameters. The first parameter spans the first 4 atomic time intervals, each of the next 25 parameters spans 2 intervals, the 27th spans 4 intervals, and the last parameter spans the last 6 time intervals. The `-p' and `-t' options are manually chosen such that, after 20 iterations, at least ~10 recombinations are inferred to occur in the intervals each parameter spans. Inappropriate settings may lead to overfitting. The command line in the example above has been shown to be suitable for modern humans.

The `psmc' program infers the scaled mutation rate, the recombination rate and the free population size parameters. All these parameters are scaled to 2N0. You may pipe the output of `psmc2history.pl' into `history2ms.pl' to generate the ms command line that simulates the history inferred by PSMC, or visualize the result with `psmc_plot.pl'.
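The README only states that each output character of `fq2psmcfa' records whether a 100-bp bin contains at least one heterozygote. A minimal Python sketch of that binning follows. It assumes heterozygous sites appear as IUPAC two-base ambiguity codes (M, R, W, S, Y, K) and uncalled sites as `N', and it ignores the base-quality filtering that `-q20' applies. The output symbols (T, K, N), the function name `bin_heterozygotes' and the `min_called' threshold are hypothetical choices for illustration, not taken from the fq2psmcfa source.

```python
# Hypothetical sketch of fq2psmcfa-style binning; NOT the tool's actual source.
# Base qualities are ignored here; case is ignored via .upper().
HET_CODES = set("MRWSYK")  # IUPAC codes for two-allele (heterozygous) sites


def bin_heterozygotes(consensus: str, bin_size: int = 100, min_called: int = 90) -> str:
    """Return one symbol per bin [bin_size*i, bin_size*i + bin_size).

    'K' = at least one heterozygous site in the bin,
    'T' = no heterozygous site,
    'N' = fewer than `min_called` called (non-N) bases (assumed threshold).
    A short final bin is treated like any other and will usually be 'N'.
    """
    out = []
    for start in range(0, len(consensus), bin_size):
        window = consensus[start:start + bin_size].upper()
        called = sum(1 for c in window if c != "N")
        if called < min_called:
            out.append("N")
        elif any(c in HET_CODES for c in window):
            out.append("K")
        else:
            out.append("T")
    return "".join(out)


if __name__ == "__main__":
    # bin 0: homozygous, bin 1: one 'R' (A/G) site, bin 2: uncalled
    seq = "A" * 100 + "A" * 50 + "R" + "A" * 49 + "N" * 100
    print(bin_heterozygotes(seq))  # prints TKN
```

Collapsing each 100-bp window to a single symbol is what lets the HMM in `psmc' run on a sequence one hundredth the length of the genome.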
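The arithmetic behind the `-p "4+25*2+4+6"' pattern can be checked with a short Python sketch. It reads the pattern the way the text describes it: terms are separated by `+', a term `n*w' stands for n free parameters each spanning w atomic intervals, and a bare `w' is one parameter spanning w intervals. The function name `expand_pattern' is hypothetical, and psmc's own parser may accept forms this sketch does not.

```python
from typing import List


def expand_pattern(pattern: str) -> List[int]:
    """Return the number of atomic time intervals spanned by each free parameter."""
    spans: List[int] = []
    for term in pattern.split("+"):
        if "*" in term:
            # "n*w": n consecutive parameters, each spanning w intervals
            count, width = (int(x) for x in term.split("*"))
            spans.extend([width] * count)
        else:
            # "w": a single parameter spanning w intervals
            spans.append(int(term))
    return spans


if __name__ == "__main__":
    spans = expand_pattern("4+25*2+4+6")
    # number of free parameters, total number of atomic intervals
    print(len(spans), sum(spans))  # prints 28 64
```

This reproduces the counts given above: 28 free parameters covering 4 + 25×2 + 4 + 6 = 64 atomic intervals.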
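Because all `psmc' outputs are scaled to 2N0, converting them to years and effective population sizes needs a per-generation, per-base mutation rate μ and a generation time g, neither of which PSMC estimates. The Python sketch below uses the standard relationship θ0 = 4·N0·μ·s, where s = 100 is the bin size produced by `fq2psmcfa' (θ0 is taken per bin), so N0 = θ0 / (4·μ·s). A scaled time t is then 2·N0·t generations, and a relative size λ corresponds to λ·N0 individuals. This is our understanding of the conversion `psmc_plot.pl' performs, not a copy of it; the function name and the default values μ = 2.5e-8 and g = 25 are illustrative assumptions, not recommendations.

```python
from typing import Tuple


def to_absolute(theta0: float, t_scaled: float, lam: float,
                mu: float = 2.5e-8, gen_time: float = 25.0,
                bin_size: int = 100) -> Tuple[float, float]:
    """Convert PSMC-scaled (t, lambda) into (years ago, effective size).

    mu and gen_time are user-supplied assumptions; the defaults are
    illustrative only.
    """
    n0 = theta0 / (4.0 * mu * bin_size)      # theta0 = 4 * N0 * mu * bin_size
    years = 2.0 * n0 * t_scaled * gen_time   # t is in units of 2*N0 generations
    ne = lam * n0                            # lambda is N(t) / N0
    return years, ne


if __name__ == "__main__":
    years, ne = to_absolute(theta0=0.1, t_scaled=0.5, lam=2.0)
    print(round(years), round(ne))  # prints 250000 20000
```

Since N0 is inversely proportional to μ, halving the assumed mutation rate doubles both the inferred sizes and the time axis; the shape of the inferred history is unchanged.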
To perform bootstrapping, one has to run `splitfa' first to split long chromosome sequences into shorter segments. When the `-b' option is applied, psmc will then randomly sample with replacement from these segments. As an example, the following command lines perform 100 rounds of bootstrapping:

    utils/fq2psmcfa -q20 diploid.fq.gz > diploid.psmcfa
    utils/splitfa diploid.psmcfa > split.psmcfa
    psmc -N25 -t15 -r5 -p "4+25*2+4+6" -o diploid.psmc diploid.psmcfa
    seq 100 | xargs -I{} echo psmc -N25 -t15 -r5 -b -p "4+25*2+4+6" \
        -o round-{}.psmc split.psmcfa | sh
    cat diploid.psmc round-*.psmc > combined.psmc
    utils/psmc_plot.pl -pY50000 combined combined.psmc

One probably wants to modify the `xargs' command line to parallelize PSMC.
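The resampling that `-b' performs on the split segments is, in spirit, an ordinary bootstrap over segments. A minimal Python sketch, assuming the segments are already loaded as strings: each replicate draws as many segments as there are, uniformly and with replacement. The function name and the per-round seeding are hypothetical, and psmc's internal sampling may differ in detail (for example in how it matches the total sequence length).

```python
import random
from typing import List, Optional


def bootstrap_segments(segments: List[str],
                       rng: Optional[random.Random] = None) -> List[str]:
    """Return one bootstrap replicate: len(segments) draws with replacement."""
    rng = rng if rng is not None else random.Random()
    # Draws are independent and uniform, so a segment may appear several
    # times in one replicate or not at all.
    return rng.choices(segments, k=len(segments))


if __name__ == "__main__":
    segs = ["TTKTTTTT", "TTTTTKTT", "KTNNTTTT"]  # toy psmcfa-like segments
    for round_id in range(1, 4):
        replicate = bootstrap_segments(segs, random.Random(round_id))
        print(round_id, "".join(replicate))
```

Seeding one generator per round makes every replicate reproducible from its round number, which is convenient when rounds run in parallel.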
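For the parallelization suggested above, GNU and BSD xargs can run the rounds themselves with `-P', dropping the `echo ... | sh' indirection, e.g. `seq 100 | xargs -P 8 -I{} psmc -N25 -t15 -r5 -b -p "4+25*2+4+6" -o round-{}.psmc split.psmcfa'. The Python sketch below does the same with a thread pool. It assumes the `psmc' binary is on PATH and `split.psmcfa' is in the working directory; the function names and the default of 8 concurrent jobs are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List

PATTERN = "4+25*2+4+6"


def bootstrap_command(round_id: int, infile: str = "split.psmcfa") -> List[str]:
    """argv for one bootstrap round, mirroring the command line in the text."""
    return ["psmc", "-N25", "-t15", "-r5", "-b", "-p", PATTERN,
            "-o", f"round-{round_id}.psmc", infile]


def run_rounds(n_rounds: int = 100, jobs: int = 8) -> List[int]:
    """Run rounds 1..n_rounds, at most `jobs` at a time; return exit codes.

    Raises FileNotFoundError if psmc is not on PATH.
    """
    cmds = [bootstrap_command(i) for i in range(1, n_rounds + 1)]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, cmds))


if __name__ == "__main__":
    # Only prints the first command; call run_rounds() to actually run psmc.
    print(" ".join(bootstrap_command(1)))
    # prints psmc -N25 -t15 -r5 -b -p 4+25*2+4+6 -o round-1.psmc split.psmcfa
```

Threads are sufficient here because each worker only waits on an external process; passing argv as a list also avoids the shell globbing the `*' in the `-p' pattern.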