This project is forked from stevenliuyi/admix.

an admixture analysis tool for Python that supports raw data from 23andme, AncestryDNA, etc.

License: GNU General Public License v3.0

Admix

Admix is a simple tool to calculate ancestry composition (admixture proportions) from SNP raw data provided by various DNA testing vendors (such as 23andme and AncestryDNA).

Installation

Install from GitHub

You can use pip to install Admix directly from the GitHub repository:

pip install git+https://github.com/stevenliuyi/admix

Install from PyPI

You can also install Admix from PyPI:

pip install admix

Note that, because of the package size limit, the PyPI package contains only five models (K7b, K12b, globe13, world9 and E11). If you want all models, you can download them separately or install Admix from the GitHub repository as shown above.

Usage

Suppose you have already downloaded your 23andme raw data and placed it in the current directory as my_raw_data.txt. You can then run the admixture calculation by specifying a calculation model (K7b in this example):

admix -f my_raw_data.txt -v 23andme -m K7b

You can also specify multiple models in one run:

admix -f my_raw_data.txt -v 23andme -m K7b K12b

If no model is specified, the program applies all available models:

admix -f my_raw_data.txt -v 23andme

You can choose the raw data format with the -v or --vendor parameter. The supported values are listed in the Raw Data Format section.

You can also set the -o or --output parameter to write the ancestry composition results to a file:

admix -f my_raw_data.txt -v 23andme -o result.txt

If you don't have your raw data yet, you can test the program with the demo 23andme data file that ships with it:

admix -m world9

Chinese users can add the -z flag to display the population names in Chinese:

admix -z -m E11

In addition, the --sort flag sorts the proportions, and the --ignore-zeros flag displays only the non-zero proportions.

For more help, run:

admix -h
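The command-line options described above can also be assembled from a script. The following is a minimal sketch, not part of Admix: `build_admix_command` is a hypothetical helper, and it only uses the flags documented in this section (-f, -v, -m, -o, -z, --sort, --ignore-zeros).

```python
import shlex


def build_admix_command(raw_file=None, vendor=None, models=(), output=None,
                        chinese=False, sort=False, ignore_zeros=False):
    """Build an argument list for the admix CLI (hypothetical helper)."""
    cmd = ["admix"]
    if raw_file:
        cmd += ["-f", raw_file]      # raw data file
    if vendor:
        cmd += ["-v", vendor]        # raw data format, e.g. "23andme"
    if models:
        cmd += ["-m", *models]       # one or more model values
    if output:
        cmd += ["-o", output]        # write results to a file
    if chinese:
        cmd.append("-z")             # population names in Chinese
    if sort:
        cmd.append("--sort")
    if ignore_zeros:
        cmd.append("--ignore-zeros")
    return cmd


if __name__ == "__main__":
    cmd = build_admix_command("my_raw_data.txt", "23andme", ["K7b", "K12b"],
                              sort=True, ignore_zeros=True)
    # Print the equivalent shell command line.
    print(shlex.join(cmd))
```

If Admix is installed, the resulting list can be passed to `subprocess.run(cmd, check=True)` to run the calculation.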

Output Example

  • English

Command: admix -m K12b

Output:

Gedrosia: 0.06%
Siberian: 3.71%
Northwest African: 0.00%
Southeast Asian: 33.43%
Atlantic Med: 0.07%
North European: 0.00%
South Asian: 0.00%
East African: 0.00%
Southwest Asian: 0.01%
East Asian: 62.72%
Caucasus: 0.00%
Sub Saharan: 0.00%
  • Chinese

Command: admix -m K12b -z

Output:

格德罗西亚: 0.06%
西伯利亚: 3.71%
西北非: 0.00%
东南亚: 33.43%
大西洋地中海: 0.07%
北欧: 0.00%
南亚: 0.00%
东非: 0.00%
西南亚: 0.01%
东亚: 62.72%
高加索: 0.00%
撒哈拉以南非洲: 0.00%
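
Output in the `name: value%` form shown above is easy to post-process. The sketch below assumes exactly that line layout and nothing else about the program; `parse_admix_output` and `top_components` are hypothetical helper names, and the sample text is a subset of the English example.

```python
SAMPLE = """\
Siberian: 3.71%
Southeast Asian: 33.43%
North European: 0.00%
East Asian: 62.72%
"""


def parse_admix_output(text):
    """Map population name -> percentage for lines like 'East Asian: 62.72%'."""
    proportions = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        value = value.strip()
        if not sep or not value.endswith("%"):
            continue  # skip lines such as 'Output:' or blank lines
        try:
            proportions[name.strip()] = float(value[:-1])
        except ValueError:
            continue  # not a numeric percentage
    return proportions


def top_components(proportions, ignore_zeros=True):
    """Return (name, percentage) pairs sorted from largest to smallest."""
    items = [(n, v) for n, v in proportions.items() if v > 0 or not ignore_zeros]
    return sorted(items, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, pct in top_components(parse_admix_output(SAMPLE)):
        print(f"{name}: {pct:.2f}%")
```

The same parser handles the Chinese output, since only the population names differ.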

Raw Data Format

Admix supports raw data formats from the following DNA testing vendors, selected with the -v or --vendor parameter:

parameter value    vendor
23andme            23andme
ancestry           AncestryDNA
ftdna              FamilyTreeDNA Family Finder
ftdna2             FamilyTreeDNA Family Finder (new format)
wegene             WeGene
myheritage         MyHeritageDNA
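
For orientation, here is a minimal sketch of reading one of these formats. It assumes the usual 23andme raw data layout (comment lines starting with `#`, then tab-separated rsid, chromosome, position and genotype columns, with `--` for no-calls); it is not Admix's own reader, `read_23andme` is a hypothetical name, and the rsids in the sample are made up.

```python
import io


def read_23andme(handle):
    """Return {rsid: genotype} from a 23andme-style raw data file object."""
    genotypes = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the commented header
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # not an rsid/chromosome/position/genotype row
        rsid, _chromosome, _position, genotype = fields
        if genotype == "--":
            continue  # no-call: no usable genotype at this marker
        genotypes[rsid] = genotype
    return genotypes


if __name__ == "__main__":
    # Made-up rows for illustration only.
    sample = io.StringIO(
        "# rsid\tchromosome\tposition\tgenotype\n"
        "rs0000001\t1\t1000\tAG\n"
        "rs0000002\t1\t2000\t--\n"
        "rs0000003\t2\t3000\tCC\n"
    )
    print(read_23andme(sample))
```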

Models

Admix supports many publicly available admixture models. All calculator files are the property of their respective authors and are not covered by the license of this program. The link for each model points to more information about it.

model value          model name                           source
K7b                  Dodecad K7b                          Link
K12b                 Dodecad K12b                         Link
globe13              Dodecad globe13                      Link
globe10              Dodecad globe10                      Link
world9               Dodecad world9                       Link
Eurasia7             Dodecad Eurasia7                     Link
Africa9              Dodecad Africa9                      Link
weac2                Dodecad weac (West Eurasian cline) 2 Link
E11                  E11                                  Link
K36                  Eurogenes K36                        Link
EUtest13             Eurogenes EUtest K13                 Link
Jtest14              Eurogenes Jtest K14                  Link
HarappaWorld         HarappaWorld                         Link
TurkicK11            Turkic K11                           Link
KurdishK10           Kurdish K10                          Link
AncientNearEast13    Ancient Near East K13                Link
K7AMI                Eurogenes K7 AMI                     Link
K8AMI                Eurogenes K8 AMI                     Link
MDLPK27              MDLP K27                             Link
puntDNAL             puntDNAL K12 Ancient World           Link
K47                  LM Genetics K47                      Link
K7M1                 Tolan K7M1                           Link
K13M2                Tolan K13M2                          Link
K14M1                Tolan K14M1                          Link
K18M4                Tolan K18M4                          Link
K25R1                Tolan K25R1                          Link
MichalK25            Michal World K25                     Link

Implementation

The ancestry composition is calculated by maximum likelihood estimation (MLE), and the implementation is fairly straightforward.

Let F_nk be the minor allele frequency of SNP marker n for population k, l_n^minor and l_n^major be the minor and major alleles for marker n respectively, and G_ni be the allele at marker n of the individual we're interested in (i = 1, 2). Our goal is to find the admixture fractions q_k of the individual that maximize the log-likelihood function

where χ is the indicator function, and J and j are the all-ones matrix and all-ones vector respectively. Note that the Einstein summation convention is implied here. With the constraints 0 ≤ q_k ≤ 1 and Σ q_k = 1, we can obtain the admixture proportions q_k by applying optimization techniques.
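
To make the estimation concrete, here is a minimal sketch under stated assumptions. It takes the log-likelihood to be the sum, over markers n and allele copies i, of log(Σ_k F_nk q_k) when the observed allele is the minor one and log(Σ_k (1 − F_nk) q_k) when it is the major one, which is a standard form consistent with the definitions above. It maximizes that function with an expectation-maximization (EM) update, one common way to fit mixture weights on the simplex; this is not taken from the Admix source, which may use a different optimizer. All function names and the toy frequencies are hypothetical.

```python
import math


def log_likelihood(freqs, minor_counts, q):
    """freqs[n][k]: minor allele frequency of marker n in population k.
    minor_counts[n]: copies (0, 1 or 2) of the minor allele the individual carries.
    Assumes the mixed allele probabilities are strictly between 0 and 1."""
    total = 0.0
    for f_n, c in zip(freqs, minor_counts):
        p_minor = sum(qk * f for qk, f in zip(q, f_n))
        p_major = sum(qk * (1.0 - f) for qk, f in zip(q, f_n))
        if c > 0:
            total += c * math.log(p_minor)
        if c < 2:
            total += (2 - c) * math.log(p_major)
    return total


def estimate_admixture(freqs, minor_counts, iterations=200):
    """EM estimate of admixture fractions q (non-negative, summing to 1)."""
    k = len(freqs[0])
    q = [1.0 / k] * k  # start from equal fractions
    for _ in range(iterations):
        acc = [0.0] * k
        n_alleles = 0
        for f_n, c in zip(freqs, minor_counts):
            major = [1.0 - f for f in f_n]
            for count, probs in ((c, f_n), (2 - c, major)):
                if count == 0:
                    continue
                weighted = [qk * p for qk, p in zip(q, probs)]
                denom = sum(weighted)
                if denom == 0.0:
                    continue  # allele impossible under the current q; skip it
                # E-step: share of each allele copy attributed to population j.
                for j in range(k):
                    acc[j] += count * weighted[j] / denom
                n_alleles += count
        # M-step: new fractions are the average attributed shares.
        q = [a / n_alleles for a in acc]
    return q


if __name__ == "__main__":
    # Toy data: two populations, ten markers, individual homozygous for the
    # minor allele everywhere, which is far more common in population 0.
    freqs = [[0.9, 0.1]] * 10
    counts = [2] * 10
    q = estimate_admixture(freqs, counts)
    print([round(x, 4) for x in q], log_likelihood(freqs, counts, q))
```

Each EM step keeps q on the simplex automatically, so the constraints 0 ≤ q_k ≤ 1 and Σ q_k = 1 need no separate handling; a general-purpose constrained optimizer would be an equally valid choice.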

Contributors

stevenliuyi
