psmc's Introduction

This software package infers population size history from a diploid sequence
using the Pairwise Sequentially Markovian Coalescent (PSMC) model. The
detailed model is described in file `psmc.tex'.

To compile the binaries, you may run

    make; (cd utils; make)

After that, you may try

    utils/fq2psmcfa -q20 diploid.fq.gz > diploid.psmcfa
    psmc -N25 -t15 -r5 -p "4+25*2+4+6" -o diploid.psmc diploid.psmcfa
    utils/psmc2history.pl diploid.psmc | utils/history2ms.pl > ms-cmd.sh
    utils/psmc_plot.pl diploid diploid.psmc

where `diploid.fq.gz' is typically the whole-genome diploid consensus sequence
of one human individual, which can be generated by, for example:

    samtools mpileup -C50 -uf ref.fa aln.bam | bcftools view -c - \
      | vcfutils.pl vcf2fq -D 100 | gzip > diploid.fq.gz

Program `fq2psmcfa' transforms the consensus sequence into a fasta-like format
where the i-th character in the output sequence indicates whether there is at
least one heterozygote in the bin [100i, 100i+100).
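
As a rough illustration of the binning rule only, the sketch below reads one
consensus sequence and prints one character per bin. It is not the actual
`fq2psmcfa' implementation: it ignores quality filtering and missing data, the
function name `bin_hets' is hypothetical, and the output letters (`K' for a bin
with a heterozygote, `T' otherwise) are an assumption of this sketch.
Heterozygotes are assumed to appear as IUPAC ambiguity codes in the consensus.

```shell
# Toy binning sketch: NOT the real fq2psmcfa (no quality filter, no missing data).
# Usage: bin_hets [BIN_SIZE] < sequence    (BIN_SIZE defaults to 100)
bin_hets() {
    awk -v size="${1:-100}" '{
        out = ""
        # Walk the sequence one bin at a time (awk strings are 1-based).
        for (start = 1; start <= length($0); start += size) {
            bin = toupper(substr($0, start, size))
            # R, Y, M, K, S, W are the two-base IUPAC codes, i.e. heterozygotes.
            out = out ((bin ~ /[RYMKSW]/) ? "K" : "T")
        }
        print out
    }'
}

# A 14-base toy sequence with bin size 5: only the second bin contains "R".
printf 'ACGTACGTRCACGT\n' | bin_hets 5    # prints TKT
```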

Program `psmc' infers the population size history. In particular, the `-p'
option specifies that there are 64 atomic time intervals and 28 (=1+25+1+1)
free interval parameters. The first parameter spans the first 4 atomic time
intervals, each of the next 25 parameters spans 2 intervals, the 27th spans 4
intervals and the last parameter spans the last 6 time intervals. The `-p' and
`-t' options are manually chosen such that after 20 rounds of iterations, at
least ~10 recombinations are inferred to occur in the intervals each parameter
spans. Inappropriate settings may lead to overfitting. The command line in the
example above has been shown to be suitable for modern humans.
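
To check the arithmetic for a given `-p' pattern before running psmc, a small
helper along these lines may be useful. It assumes the pattern syntax described
above, where a term `n*k' means n parameters each spanning k atomic intervals
and a bare `k' means one parameter spanning k intervals. The function name
`psmc_pattern_summary' is hypothetical and not part of the package.

```shell
# Print "<atomic intervals> <free parameters>" for a psmc -p pattern.
# The body runs in a subshell so the IFS and globbing changes do not leak out.
psmc_pattern_summary() (
    IFS='+'     # split the pattern on "+"
    set -f      # keep "25*2" from being expanded as a filename glob
    intervals=0
    params=0
    for term in $1; do
        case $term in
            *\**) n=${term%%\**}; k=${term#*\*} ;;  # "n*k"
            *)    n=1;            k=$term ;;         # bare "k"
        esac
        params=$((params + n))
        intervals=$((intervals + n * k))
    done
    echo "$intervals $params"
)

psmc_pattern_summary "4+25*2+4+6"    # prints 64 28
```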

The `psmc' program infers the scaled mutation rate, the recombination rate and
the free population size parameters. All these parameters are scaled to 2N0. You
may run `psmc2history.pl' combined with `history2ms.pl' to generate the ms
command line that simulates the history inferred by PSMC, or visualize the result
with `psmc_plot.pl'.

To perform bootstrapping, one has to run `splitfa' first to split long chromosome
sequences into shorter segments. When the `-b' option is applied, psmc will then
randomly sample with replacement from these segments. As an example, the
following command lines perform 100 rounds of bootstrapping:

    utils/fq2psmcfa -q20 diploid.fq.gz > diploid.psmcfa
    utils/splitfa diploid.psmcfa > split.psmcfa
    psmc -N25 -t15 -r5 -p "4+25*2+4+6" -o diploid.psmc diploid.psmcfa
    seq 100 | xargs -I{} echo psmc -N25 -t15 -r5 -b -p "4+25*2+4+6" \
        -o round-{}.psmc split.psmcfa | sh
    cat diploid.psmc round-*.psmc > combined.psmc
    utils/psmc_plot.pl -pY50000 combined combined.psmc

One probably wants to modify the "xargs" command-line to parallelize PSMC.
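
One possible way to do so, sketched under a few assumptions: the `-P' option of
xargs (available in GNU and BSD xargs, but not required by POSIX) runs several
jobs at once, and `split.psmcfa' is the output of the `splitfa' step. The
function name `run_bootstrap' and the `PSMC' environment variable, which only
lets the psmc binary be swapped out, are conventions of this sketch and not
part of the package.

```shell
# Run ROUNDS bootstrap replicates, at most JOBS at a time.
# Usage: run_bootstrap ROUNDS JOBS
run_bootstrap() {
    rounds=$1
    jobs=$2
    # Each number from seq replaces {} in the output name round-{}.psmc.
    # xargs runs psmc directly, so the -p pattern needs no extra quoting.
    seq "$rounds" | xargs -P "$jobs" -I{} \
        "${PSMC:-psmc}" -N25 -t15 -r5 -b -p "4+25*2+4+6" \
        -o round-{}.psmc split.psmcfa
}

# Example (commented out because it launches the real jobs):
# run_bootstrap 100 4
```

Setting `PSMC=echo' prints the commands instead of running them, which is a
cheap way to check the command lines before launching 100 jobs.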
