
plasma's Introduction

🔮 PLASMA

PLASMA (PopuLation Allele-Specific MApping) is a statistical fine-mapping method for functional data using QTL and allelic-imbalance signal.

Preprint for the PLASMA method

Developed at the Gusev Lab at the Dana Farber Cancer Institute / Harvard Medical School.

Installation and Dependencies

PLASMA uses Python 2.7 and requires the following Python packages for core functionality:

The following packages are optional, but are used for pre/post-processing:

All packages can be installed using Python's pip package manager.

To download PLASMA, click "Clone or Download" or enter:

git clone https://github.com/austintwang/plasma

run_plasma.py : Quick-start fine-mapping script

The run_plasma.py script conducts fine-mapping of a single locus with default PLASMA parameters and outputs.

Input Files and Parameters

The script requires the following files:

  • Two text files (one for each haplotype), specifying the haplotype-specific genotypes across samples and markers. Each row should represent an individual, and each column should represent a marker. The ordering of samples and markers should be the same for both files.
  • Two text files (one for each haplotype), specifying the haplotype-specific phenotypes (e.g. read counts) across samples. The order of samples should be the same as that of the genotype files.
  • A text file, specifying the total phenotype across samples. If none is provided, then the total phenotype is assumed to be the sum of the haplotype-specific phenotypes.
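As a rough illustration of the layout described above, the following sketch writes a toy set of input files with NumPy. The tab delimiter, the one-value-per-line phenotype layout, the file names, and the toy dimensions are all assumptions made for this example; only the rows-as-individuals, columns-as-markers convention and the sum-of-haplotypes default come from the description above.

```python
import os
import tempfile

import numpy as np

rng = np.random.RandomState(0)
num_samples, num_markers = 5, 4  # toy sizes, chosen only for illustration

# Haplotype-specific genotypes: one row per individual, one column per marker.
hap_a = rng.randint(0, 2, size=(num_samples, num_markers))
hap_b = rng.randint(0, 2, size=(num_samples, num_markers))

# Haplotype-specific phenotypes (e.g. read counts): one value per individual,
# in the same sample order as the genotype rows.
counts_a = rng.randint(0, 50, size=num_samples)
counts_b = rng.randint(0, 50, size=num_samples)

# Total phenotype assumed when no separate file is supplied.
total = counts_a + counts_b

# Write each array to its own text file (file names are hypothetical).
out_dir = tempfile.mkdtemp()
paths = {}
arrays = [("hap_A", hap_a), ("hap_B", hap_b),
          ("counts_A", counts_a), ("counts_B", counts_b)]
for name, arr in arrays:
    paths[name] = os.path.join(out_dir, name + ".txt")
    np.savetxt(paths[name], arr, fmt="%d", delimiter="\t")
```

The resulting paths would then be passed to the script in the order hap_A, counts_A, hap_B, counts_B.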

Other parameters include:

  • Individual-level or global beta-binomial overdispersions.
  • AS-Only and QTL-only modes, where the total phenotype and allele-specific phenotypes are ignored, respectively.
  • Search parameters, including the maximum number of causal variants and the search mode (exhaustive or stochastic shotgun search).
  • The confidence level used when creating the credible set.
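For intuition about what an overdispersion value does, the sketch below uses a common parameterization of the beta-binomial distribution, in which an overdispersion rho inflates the binomial variance of n reads by a factor of 1 + rho * (n - 1). Whether PLASMA uses exactly this parameterization is an assumption here, and the function name is hypothetical.

```python
def beta_binomial_variance(n, p, rho):
    """Variance of a beta-binomial count with n trials, mean fraction p,
    and overdispersion (intra-class correlation) rho in [0, 1].

    With rho = 0 this reduces to the ordinary binomial variance n*p*(1-p).
    """
    return n * p * (1.0 - p) * (1.0 + rho * (n - 1))


# With no overdispersion the variance is purely binomial; a small rho
# already inflates it noticeably at high read depth.
binomial_var = beta_binomial_variance(100, 0.5, 0.0)
inflated_var = beta_binomial_variance(100, 0.5, 0.01)
```

Under this parameterization, a global overdispersion of 0 (the documented default) corresponds to treating allele-specific counts as binomial.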

Output Files

The script outputs two files in the specified output directory:

  • cset.txt: The minimal set of markers that contains the set of true causal markers, at the specified confidence level. 1 and 0 denote that a marker is included in and excluded from the credible set, respectively. The order of the markers is the same as that in the genotype files.
  • ppas.txt: The marginal posterior probabilities of each marker being causal.
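To illustrate how a credible-set indicator relates to marginal posterior probabilities, here is a minimal sketch for the simplest case of a single causal variant, where the probabilities sum to one and a greedy selection gives the smallest set. This is not PLASMA's implementation, which may work on posterior probabilities of causal configurations; the function name is hypothetical.

```python
import numpy as np


def credible_set_single_causal(ppas, confidence=0.95):
    """Return a 0/1 indicator per marker, in the input marker order.

    Assumes exactly one causal variant, so the marginal posteriors sum to 1.
    Markers are added in decreasing order of posterior probability until the
    cumulative probability reaches the confidence level.
    """
    ppas = np.asarray(ppas, dtype=float)
    order = np.argsort(-ppas)
    cumulative = np.cumsum(ppas[order])
    # Index of the first position where the cumulative mass >= confidence.
    num_needed = int(np.searchsorted(cumulative, confidence)) + 1
    num_needed = min(num_needed, ppas.size)
    indicator = np.zeros(ppas.size, dtype=int)
    indicator[order[:num_needed]] = 1
    return indicator


indicator = credible_set_single_causal([0.04, 0.7, 0.06, 0.2], confidence=0.95)
```

The returned vector uses the same 1/0 convention and marker ordering as described for cset.txt.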

Usage

Usage of the script is as follows:

usage: run_plasma.py [-h] [--total_exp_path TOTAL_EXP_PATH]
                     [--overdispersion_path OVERDISPERSION_PATH]
                     [--overdispersion_global OVERDISPERSION_GLOBAL]
                     [--as_only] [--qtl_only] [--search_mode SEARCH_MODE]
                     [--max_causal MAX_CAUSAL] [--confidence CONFIDENCE]
                     hap_A_path counts_A_path hap_B_path counts_B_path out_dir

positional arguments:
  hap_A_path            Path to haplotype A genotypes file
  counts_A_path         Path to haplotype A mapped counts file
  hap_B_path            Path to haplotype B genotypes file
  counts_B_path         Path to haplotype B mapped counts file
  out_dir               Path to output directory

optional arguments:
  -h, --help            show this help message and exit
  --total_exp_path TOTAL_EXP_PATH, -t TOTAL_EXP_PATH
                        Path to total QTL phenotype file (Default: Sum of
                        counts files)
  --overdispersion_path OVERDISPERSION_PATH, -o OVERDISPERSION_PATH
                        Path to individual-level AS overdispersion file
                        (Default: Global overdispersion)
  --overdispersion_global OVERDISPERSION_GLOBAL, -g OVERDISPERSION_GLOBAL
                        Global AS overdispersion (Default: 0)
  --as_only, -a         AS-Only Mode
  --qtl_only, -q        QTL-Only Mode
  --search_mode SEARCH_MODE, -s SEARCH_MODE
                        Causal configuration search mode (Default:
                        "exhaustive")
  --max_causal MAX_CAUSAL, -m MAX_CAUSAL
                        Maximum number of causal configurations searched
                        (Default: 1)
  --confidence CONFIDENCE, -c CONFIDENCE
                        Credible set confidence level (Default: 0.95)
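When the script is run for many loci, it can help to assemble the command line programmatically. The sketch below builds an argument list using only the flags shown in the usage message above; the helper name, its keyword names, and the default interpreter name are assumptions made for this example.

```python
def build_plasma_command(hap_a, counts_a, hap_b, counts_b, out_dir,
                         total_exp=None, overdispersion_path=None,
                         overdispersion_global=None, search_mode=None,
                         max_causal=None, confidence=None,
                         as_only=False, qtl_only=False,
                         script="run_plasma.py", interpreter="python2.7"):
    """Build an argument list for run_plasma.py (hypothetical helper)."""
    # Positional arguments, in the order given by the usage message.
    cmd = [interpreter, script, hap_a, counts_a, hap_b, counts_b, out_dir]

    # Optional arguments that take a value; skipped when left as None.
    valued_flags = [
        ("--total_exp_path", total_exp),
        ("--overdispersion_path", overdispersion_path),
        ("--overdispersion_global", overdispersion_global),
        ("--search_mode", search_mode),
        ("--max_causal", max_causal),
        ("--confidence", confidence),
    ]
    for flag, value in valued_flags:
        if value is not None:
            cmd.extend([flag, str(value)])

    # Boolean switches.
    if as_only:
        cmd.append("--as_only")
    if qtl_only:
        cmd.append("--qtl_only")
    return cmd


cmd = build_plasma_command("hap1.txt", "exp1.txt", "hap2.txt", "exp2.txt",
                           "plasma_out/", total_exp="exp_total.txt",
                           confidence=0.9)
```

The resulting list can be passed to subprocess.check_call to launch the script.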

PLASMA API

PLASMA additionally has a Python API, which exposes the full feature set of PLASMA. Documentation for the PLASMA Python API is currently in progress.

Features of the API include:

  • Alternative data input formats, including direct use of association statistics
  • User specification of hyperparameters, including heritabilities and correlations between the AS and QTL phenotypes
  • Additional fine-mapping outputs
  • Colocalization analysis across multiple quantitative allele-specific phenotypes
  • An allele-specific simulation framework for quantitative phenotypes
  • Ability to extend PLASMA via subtyping

To see the latest code, check the dev branch.

plasma's People

Contributors

austintwang


plasma's Issues

AttributeError: 'Finemap' object has no attribute '_calc_counts'

After manual edits to the scripts mentioned in issue number 2, and using the same files as mentioned there, running the following command throws the following error:

./run_plasma.py hap1.txt exp1.txt hap2.txt exp2.txt plasma_out/ --total_exp_path exp_total.txt -o tagwise_dsprsn.txt

RuntimeWarning: divide by zero encountered in true_divide
  * (1 + self.overdispersion * (counts - 1))
Traceback (most recent call last):
  File "./run_plasma.py", line 78, in <module>
    run_plasma(args)
  File "./run_plasma.py", line 40, in run_plasma
    model.initialize()
  File "/c8000xd3/big-c1477909/fine_mapping/plasma/ase_finemap/finemap.py", line 395, in initialize
    self._calc_imbalance_stats()
  File "/c8000xd3/big-c1477909/fine_mapping/plasma/ase_finemap/finemap.py", line 198, in _calc_imbalance_stats
    self._calc_imbalance_errors()
  File "/c8000xd3/big-c1477909/fine_mapping/plasma/ase_finemap/finemap.py", line 159, in _calc_imbalance_errors
    self._calc_counts()
AttributeError: 'Finemap' object has no attribute '_calc_counts'

As I am really interested in using this package, it would be great to have some feedback on how to get around these errors, and whether this package is being actively maintained.

Any advice you could offer would be greatly appreciated

Potential bug in _calc_total_exp_errors() function

Hello Austin,

I was going through the code because I wasn't getting a result that I expected and found a potential bug in finemap.py.

For the rest of the code, genotypes centered at 0 (genotype.ctrd) are used to estimate beta, but in _calc_total_exp_errors(), the genotypes are not centered when computing the residuals.

residuals = (
    self.total_exp[self.mask_total_exp]
    - self._mean
    - (self.genotypes_comb[self.mask_total_exp, :] * self.beta).T
).T

Instead, I believe it should be:

genotypes_comb = self.genotypes_comb[self.mask_total_exp, :]
genotype_means = np.mean(genotypes_comb, axis=0)
genotypes_ctrd = genotypes_comb - genotype_means
residuals = (
    self.total_exp[self.mask_total_exp]
    - self._mean
    - (genotypes_ctrd * self.beta).T
).T

I am using the dev version, but I checked that it's the same in the master branch.
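The effect of the missing centering can be reproduced on simulated data. The sketch below is a standalone single-marker illustration, not PLASMA code: it assumes the intercept is estimated as the mean phenotype and the effect size is estimated from centered genotypes, and all variable names and simulation settings are chosen for this example.

```python
import numpy as np

rng = np.random.RandomState(0)
num_samples = 200

# Simulated genotypes (0, 1 or 2 copies) and a phenotype with true effect 0.5.
genotypes = rng.randint(0, 3, size=num_samples).astype(float)
total_exp = 2.0 + 0.5 * genotypes + rng.normal(scale=0.1, size=num_samples)

# Effect size estimated from centered genotypes; intercept taken as the mean.
genotypes_ctrd = genotypes - genotypes.mean()
beta = np.dot(genotypes_ctrd, total_exp) / np.dot(genotypes_ctrd, genotypes_ctrd)
mean = total_exp.mean()

# Residuals computed consistently (centered) and inconsistently (uncentered).
resid_ctrd = total_exp - mean - genotypes_ctrd * beta
resid_raw = total_exp - mean - genotypes * beta

# The uncentered residuals are shifted by -beta * mean(genotypes), which
# inflates their sum of squares relative to the centered residuals.
shift = resid_raw.mean()
sse_ctrd = np.sum(resid_ctrd ** 2)
sse_raw = np.sum(resid_raw ** 2)
```

In this toy setting the centered residuals average to zero, while the uncentered ones carry a constant offset, so any error estimate based on their sum of squares would be too large.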

Best,

Raehoon.

NameError: global name 'overdispersion_path' is not defined

I have been trying to run the run_plasma.py script with a test set of allele specific data. However, the following error is thrown:

./run_plasma.py hap1_1.txt exp1_1.txt hap2_1.txt exp2_1.txt plasma_out/ --total_exp_path exp_total.txt --overdispersion_path tagwise_dsprsn.txt 
Traceback (most recent call last):
  File "./run_plasma.py", line 77, in <module>
    run_plasma(args)
  File "./run_plasma.py", line 27, in run_plasma
    if overdispersion_path is not None:
NameError: global name 'overdispersion_path' is not defined

As far as I understand it, the overdispersion_path parameter should be optional, and if no file is specified the program should default to the overdispersion_global parameter, which itself has a default of 0. So the program should run without this parameter.

I get the above error whether I specify no dispersion parameters or either of them, i.e. whether I pass -o a file containing tagwise dispersion estimates (from edgeR), or set -g to a specific value or to 0.

I'm not sure how to get around this to get the program running.
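The traceback shows a bare name, overdispersion_path, being read inside run_plasma(args), which suggests the value should instead be read from the parsed arguments object. The sketch below shows that pattern together with the fallback behavior described above. It is not the actual run_plasma.py code: the function names and the assumed file layout (one overdispersion value per sample) are hypothetical.

```python
import argparse

import numpy as np


def build_parser():
    """Parser with only the two overdispersion options from the usage text."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--overdispersion_path", "-o", default=None)
    parser.add_argument("--overdispersion_global", "-g", type=float, default=0.0)
    return parser


def resolve_overdispersion(args, num_samples):
    """Return one overdispersion value per sample.

    Reads the option from the parsed-arguments object (args.overdispersion_path)
    rather than from a bare variable name, and falls back to the global value
    when no file is given.
    """
    if args.overdispersion_path is not None:
        # Assumed layout: one numeric value per sample.
        return np.loadtxt(args.overdispersion_path, ndmin=1)
    return np.full(num_samples, args.overdispersion_global)


# No overdispersion options given: every sample gets the global default of 0.
args = build_parser().parse_args([])
overdispersion = resolve_overdispersion(args, num_samples=3)
```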


Here are my environment details:

_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                      1_llvm    conda-forge
backports                 1.0                        py_2    conda-forge
backports.functools_lru_cache 1.6.1                      py_0    conda-forge
backports_abc             0.5                        py_1    conda-forge
bzip2                     1.0.8                h516909a_2    conda-forge
ca-certificates           2020.4.5.1           hecc5488_0    conda-forge
certifi                   2019.11.28       py27h8c360ce_1    conda-forge
curl                      7.69.1               h33f0ec9_0    conda-forge
cycler                    0.10.0                     py_2    conda-forge
dbus                      1.13.6               he372182_0    conda-forge
enum34                    1.1.10           py27h8c360ce_1    conda-forge
expat                     2.2.9                he1b5a44_2    conda-forge
fontconfig                2.13.1            h86ecdb6_1001    conda-forge
freetype                  2.10.1               he06d7ca_0    conda-forge
functools32               3.2.3.2                    py_3    conda-forge
futures                   3.3.0            py27h8c360ce_1    conda-forge
gettext                   0.19.8.1          hc5be6a0_1002    conda-forge
glib                      2.64.2               h6f030ca_0    conda-forge
gst-plugins-base          1.14.5               h0935bb2_2    conda-forge
gstreamer                 1.14.5               h36ae1b5_2    conda-forge
icu                       64.2                 he1b5a44_1    conda-forge
jpeg                      9c                h14c3975_1001    conda-forge
kiwisolver                1.1.0            py27h9e3301b_1    conda-forge
krb5                      1.17.1               h2fd8d38_0    conda-forge
ld_impl_linux-64          2.34                 h53a641e_0    conda-forge
libblas                   3.8.0               16_openblas    conda-forge
libcblas                  3.8.0               16_openblas    conda-forge
libclang                  9.0.1           default_hde54327_0    conda-forge
libcurl                   7.69.1               hf7181ac_0    conda-forge
libdeflate                1.0                  h14c3975_1    bioconda
libedit                   3.1.20170329      hf8c457e_1001    conda-forge
libffi                    3.2.1             he1b5a44_1007    conda-forge
libgcc-ng                 9.2.0                h24d8f2e_2    conda-forge
libgfortran-ng            7.3.0                hdf63c60_5    conda-forge
libiconv                  1.15              h516909a_1006    conda-forge
liblapack                 3.8.0               16_openblas    conda-forge
libllvm9                  9.0.1                he513fc3_1    conda-forge
libopenblas               0.3.9                h5ec1e0e_0    conda-forge
libpng                    1.6.37               hed695b0_1    conda-forge
libssh2                   1.8.2                h22169c7_2    conda-forge
libstdcxx-ng              9.2.0                hdf63c60_2    conda-forge
libuuid                   2.32.1            h14c3975_1000    conda-forge
libxcb                    1.13              h14c3975_1002    conda-forge
libxkbcommon              0.10.0               he1b5a44_0    conda-forge
libxml2                   2.9.10               hee79883_0    conda-forge
llvm-openmp               10.0.0               hc9558a2_0    conda-forge
matplotlib                2.2.5                         1    conda-forge
matplotlib-base           2.2.5            py27h250f245_1    conda-forge
ncurses                   6.1               hf484d3e_1002    conda-forge
nspr                      4.25                 he1b5a44_0    conda-forge
nss                       3.47                 he751ad9_0    conda-forge
numpy                     1.16.5           py27h95a1406_0    conda-forge
openssl                   1.1.1g               h516909a_0    conda-forge
pandas                    0.24.2           py27hb3f55d8_0    conda-forge
patsy                     0.5.1                      py_0    conda-forge
pcre                      8.44                 he1b5a44_0    conda-forge
pip                       20.0.2                     py_2    conda-forge
pthread-stubs             0.4               h14c3975_1001    conda-forge
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pyqt                      5.12.3           py27hcca6a23_1    conda-forge
pyqt5-sip                 4.19.18                  pypi_0    pypi
pyqtwebengine             5.12.1                   pypi_0    pypi
pysam                     0.15.3           py27hda2845c_1    bioconda
python                    2.7.15          h5a48372_1011_cpython    conda-forge
python-dateutil           2.8.1                      py_0    conda-forge
python_abi                2.7                    1_cp27mu    conda-forge
pytz                      2019.3                     py_0    conda-forge
pyvcf                     0.6.8                    py27_0    bioconda
qt                        5.12.5               hd8c4c69_1    conda-forge
readline                  8.0                  hf8c457e_0    conda-forge
scipy                     1.2.1            py27h921218d_2    conda-forge
seaborn                   0.9.0                      py_2    conda-forge
setuptools                44.0.0                   py27_0    conda-forge
singledispatch            3.4.0.3               py27_1000    conda-forge
six                       1.14.0                     py_1    conda-forge
sqlite                    3.30.1               hcee41ef_0    conda-forge
statsmodels               0.10.2           py27hc1659b7_0    conda-forge
subprocess32              3.5.4            py27h516909a_0    conda-forge
tk                        8.6.10               hed695b0_0    conda-forge
tornado                   5.1.1           py27h14c3975_1000    conda-forge
vcftools                  0.1.16               he860b03_3    bioconda
wheel                     0.34.2                     py_1    conda-forge
xorg-libxau               1.0.9                h14c3975_0    conda-forge
xorg-libxdmcp             1.1.3                h516909a_0    conda-forge
xz                        5.2.5                h516909a_0    conda-forge
zlib                      1.2.11            h516909a_1006    conda-forge

My test data contain 100 samples and 100 markers in the following format:

head -5 hap1.txt 
0	1	1	1	1	0	1	1	0	1	0	1	0	1	00
1	1	0	0	1	1	1	0	1	0	0	0	0	1	00
1	1	1	1	1	1	0	1	1	0	1	1	0	1	11
1	1	0	1	0	0	0	1	1	0	1	1	0	1	11
1	0	0	1	1	1	0	0	1	1	1	0
head -5 exp1.txt 
1	4	3	5	5	1	0	0	1	2	3	2	3	5	20
1	0	4	4	0	4	5	5	1	0	0	0	3	0	04
4	0	4	2	4	2	1	5	5	4	3	3	2	3	25
3	3	1	2	5	3	2	5	5	3	3	5	1	5	44
2	0	4	4	1	5	1	4	0	0	0	3
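Because the script expects every file to share the same sample order, and every genotype row to cover the same markers, a quick shape check before running can rule out malformed input as the source of an error. The helper below is a hypothetical sketch, not part of PLASMA, and assumes whitespace-delimited text files.

```python
def check_matrix_file(path):
    """Return (num_rows, num_cols) for a whitespace-delimited text file.

    Raises ValueError if the non-empty rows do not all have the same number
    of fields, which would indicate a ragged or truncated matrix.
    """
    widths = []
    with open(path) as handle:
        for line in handle:
            if line.strip():
                widths.append(len(line.split()))
    if len(set(widths)) > 1:
        raise ValueError(
            "{}: rows have differing numbers of columns: {}".format(
                path, sorted(set(widths))))
    return len(widths), (widths[0] if widths else 0)
```

Comparing the returned shapes for hap1.txt and hap2.txt, for instance, confirms that both haplotype files describe the same number of samples and markers.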

Many Thanks.
