Giter Site home page Giter Site logo

albi's Introduction

FIESTA (Fast confidence IntErvals using STochastic Approximation)

FIESTA is a method for the construction of accurate confidence intervals (CIs) for heritability. FIESTA can be used as an add-on to existing methods for heritability and variance components estimation. FIESTA is described in the following paper.

Please cite this paper as: Schweiger, R., Fisher, E., Rahmani, E., Shenhav, L., Rosset, S., & Halperin, E. (2018). Using stochastic approximation techniques to efficiently construct confidence intervals for heritability. Journal of Computational Biology, 25(7), 794-808.

The Supplemental Information for the FIESTA RECOMB 2017 paper is available here.

If you want to actually estimate the distribution of the heritability estimator, or need CIs for a lot of estimates, check out FIESTA's predecessor, ALBI, available under the same package here.

FIESTA's input is the eigenvalues of a kinship matrix (possibly, also its eigenvectors, and general covariates), and it produces accurate confidence intervals for a set of heritability estimates.

Code written by Regev Schweiger, Eyal Fisher and Reut Yedidim. For questions or comments, mail [email protected].

Simple Example

The following reads the eigenvalues from a file, calculates 95% CIs over a grid of heritability estimates, and outputs it:

   python fiesta.py --kinship_eigenvalues ./data/eigenvalues.txt

Note: Under Windows, replace fiesta.py with albi_lib/fiesta.py.

The default output is the CIs for heritability estimates of 0, 0.1, ..., 0.9, 1:

Estimate   CI_lower_bound  CI_upper_bound
0.00000	0.00000	0.69230
0.10000	0.00000	0.81158
0.20000	0.00000	0.90069
0.30000	0.00000	1.00000
0.40000	0.00000	1.00000
0.50000	0.03170	1.00000
0.60000	0.07461	1.00000
0.70000	0.17200	1.00000
0.80000	0.21739	1.00000
0.90000	0.33654	1.00000
1.00000	0.42484	1.00000

For example, the CI for an estimate of 0.1 is [0, 0.81158].

If you use covariates or you generated your matrix in a non-standard way, you would need additional flags. For more information and options, see below.

Installation

There are several ways to install and use FIESTA.

1. Download ZIP

You can simply download the code and run it as a standard program:

  1. Download the ZIP file of the latest release here: https://github.com/cozygene/albi/releases.
  2. Extract the ZIP file to a folder
  3. Run FIESTA from the folder:
   python fiesta.py [flags]

Note: The NumPy and SciPy packages are required for FIESTA.

2. Install using pip or setuptools

If you have pip installed, run this on command line (from anywhere):

   sudo pip install git+https://github.com/cozygene/albi

Alternatively, you can also clone the repository and do a manual install:

   git clone https://github.com/cozygene/albi
   sudo python setup.py install

Note: If you are using anaconda, you could replace sudo pip and sudo python with the path to the binaries in the bin directory in your anaconda installation.

Uninstalling

To uninstall, run:

    sudo -H pip uninstall albi

Running FIESTA

The following section describes all of FIESTA's options and flags.

There are two use cases, and it is important to distinguish between them: The case where all covariates are eigenvectors of the kinship matrix (this often includes the intercept!); and the case of general covariates. The first case allows for simpler and faster computation, while the second is more general.

1. All covariates are eigenvectors

Where all covariates are eigenvectors of the kinship matrix, there is a simple closed-form formula that produces faster computations. Another nice side effect is that, in this case, the eigenvectors themselves are not needed; only the eigenvalues, and the identity of the eigenvectors that are used as covariates, is needed. In particular, this includes the following special cases:

  1. Only an intercept term (a vector of ones). This is the default in e.g., GCTA. Since the genotype matrix is mean-centered, it follows that the vector of ones is an eigenvector, corresponding to the last (smallest) eigenvector of the kinship matrix (see Supplemental Note S4.5 in the ALBI paper). If you built your kinship matrix with default settings, this is probably the case relevant to you.
  2. No fixed effects at all (including an intercept).
  3. In addition to an intercept term, a common practice is adding as fixed effects the q largest principal components, corresponding to the first q eigenvectors of the kinship matrix.

The following command builds CIs and saves them to a file:

   python fiesta.py --kinship_eigenvalues filename
                 [--use_eigenvectors_as_covariates <list of #>]
                 (--estimates_filename filename
                    or
                  --estimate_grid <# of grid points>)
                 [--confidence <required confidence level>] 
                 [--iterations <# of iterations>] 
                 [--output_filename filename]

Note: Under Windows, replace fiesta.py with albi_lib/fiesta.py.

The flags are as follows:

  • kinship_eigenvalues (Shortcut: -k) - A file containing the eigenvalues of the kinship matrix, one eigenvalue per line, in text format. This could be created, for example, with GCTA's --pca flag.
  • use_eigenvectors_as_covariates (-u) - A list detailing which eigenvectors should be used as covariates. For example: If you only use an intercept term, and your kinship matrix was constructed from a mean-centered SNP matrix, use -1 (the last eigenvector is equal to the constant vector); if in addition, you add the first 3 PCs as covariates, use 0,1,2,-1. If not specified, defaults to -1. If you explicitly want no covariates, use -u ''.
  • estimates_filename (-f) - A filename containing a list of heritability estimates (one per line) in text format. A CI will be calculated for each one.
  • estimate_grid (-g) - Alternatively, one can ask ALBI to calculate CIs for a grid of heritability estimates (e.g., a grid of 100, will calculate CIs for 0, 0.01, ..., 0.99, 1). Default is 10.
  • confidence (-c) - The required confidence level for the CIs. Default is 0.95 (95% CIs).
  • iterations (-n) - The number of iterations used in the algorithm (see paper). Decreasing this number will make the algorithm run faster but will result with less accurate results, and vice versa. Default is 1000.
  • output_filename (-o) - File to which to write the calculated CIs. If not supplied, CIs will be printed.

2. General covariates

When covariates are not eigenvectors of the kinship matrix, you must supply both the eigenvalues, the eigenvectors and the covariates explicitly. Computation will be slower (although still linear in the number of individuals):

The following command estimates the distributions, and saves the results to a file:

   python fiesta.py --kinship_eigenvalues filename
                  --kinship_eigenvectors filename
                 [--covariates filename]
                 [--no_intercept]
                 (--estimates_filename filename
                    or
                  --estimate_grid <# of grid points>)
                 [--confidence <required confidence level>] 
                 [--iterations <# of iterations>] 
                 [--output_filename filename]

Note: Under Windows, replace fiesta.py with albi_lib/fiesta.py.

The flags are as follows:

  • kinship_eigenvalues (-k) - A file containing the eigenvalues of the kinship matrix, one eigenvalue per line, in text format. This could be created, for example, with GCTA's --pca flag, after removing the IDs from the output file.
  • kinship_eigenvectors(-v) - A file containing the eigenvectors of the kinship matrix, one eigenvector per column, in text format. This could be created, for example, with GCTA's --pca flag, after removing the IDs from the output file.
  • covariates (-x) - A file containing the covariates, one covariate per column, in text format.
  • no_intercept (-i) - Add to explicitly not use a constant 1 (intercept) covariate. Can be used without an additional covariates file. Is not enabled by default, i.e. an intercept is added.
  • estimates_filename (-f) - A filename containing a list of heritability estimates (one per line) in text format. A CI will be calculated for each one.
  • estimate_grid (-g) - Alternatively, one can ask ALBI to calculate CIs for a grid of heritability estimates (e.g., a grid of 100, will calculate CIs for 0, 0.01, ..., 0.99, 1). Default is 10.
  • confidence (-c) - The required confidence level for the CIs. Default is 0.95 (95% CIs).
  • iterations (-n) - The number of iterations used in the algorithm (see paper). Decreasing this number will make the algorithm run faster but will result with less accurate results, and vice versa. Default is 1000.
  • output_filename (-o) - File to which to write the calculated CIs. If not supplied, CIs will be printed.

Using in SOLAR-Eclipse

SOLAR-Eclipse is a software package for genetic analysis. SOLAR deals mainly with explicit pedigree files, but it is possible to generate a kinship matrix from it for processing in FIESTA. Whenever you load a pedigree file, solar saves the kinship matrix it computes (see 8.3 here). After loading the pedigree and locating the matrix file, run the following Python code:

import numpy as np
import gzip
import numpy.linalg

filename = "<path_to_file>/phi2.gz"
lines = np.vstack([np.array(line.strip().split()[:3]).astype(float) for line in gzip.open(filename).read().splitlines()[1:]])
n_samples = int(max(lines[:,0]))
kinship_matrix = np.zeros((n_samples, n_samples))
for x,y,c in lines:
    kinship_matrix[int(x)-1, int(y)-1] = c

vals, vecs = numpy.linalg.eigh(kinship_matrix)
savetxt("<path>/eigenvalues.txt", vals[::-1])
savetxt("<path>/eigenvectors.txt", vecs[::-1])

The output txt can now be used in FIESTA.

๐Ÿน

albi's People

Contributors

regevs avatar yedidimr avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

albi's Issues

Error: Line 132, Dangerous Eigenvectors

I am tying to use FIESTA in order to estimate confidence intervals--however, I am running into an error when using the more general case. It works fine if I stick to the simpler case and use only eigenvalues.

Paraphrasing but the error I basically get is: "Error in line 132, dangerous_eigenvectors = kinship_eigenvectors[:, is_dangerous]"

I understand it probably has something to do with the eigenvectors--but I obtained them directly from GCTA so I am not sure what I could do to fix them. I am using the first 20 principal components. Any help is appreciated, thanks!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.