pontussk / popstats
Population genetic summary statistics
Hey,
I have converted my VCF file using the vcf2tped.py script. After that, I wanted to estimate FAB, but when I run it with the command:
python popstats.py --file input.tped --pops INDV1,INDV2,INDV3,INDV4 --ancestor ANC_INDV --FAB
I get the error:
Traceback (most recent call last):
File "/home/uramakri/ryanr/softwares/popstats/popstats.py", line 710, in
chromosome = int(col[0].lstrip('chr'))
ValueError: invalid literal for int() with base 10: 'CM000001.4'
Could you please let me know how to resolve this? The reference genome is dog, not human.
Anubhab
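One possible workaround for the ValueError above, sketched under the assumption that popstats only needs the first tped column to parse as an integer: rewrite the contig accessions (such as CM000001.4) to integers before running the script. The helper name `renumber_chromosomes` is hypothetical and not part of popstats.

```python
def renumber_chromosomes(tped_lines):
    """Replace the chromosome name in column 1 with an integer.

    Integers are assigned in order of first appearance, so the result only
    matches the true chromosome numbers if the file is sorted that way.
    Returns the rewritten lines and the name-to-integer mapping.
    """
    mapping = {}
    rewritten = []
    for line in tped_lines:
        fields = line.split()  # tped files are whitespace-delimited
        if not fields:
            continue  # skip blank lines
        name = fields[0]
        if name not in mapping:
            mapping[name] = len(mapping) + 1
        fields[0] = str(mapping[name])
        rewritten.append(" ".join(fields))
    return rewritten, mapping


if __name__ == "__main__":
    example = [
        "CM000001.4 snp1 0 100 A A G G",
        "CM000002.4 snp2 0 200 C C C T",
    ]
    lines, mapping = renumber_chromosomes(example)
    print("\n".join(lines))
    print(mapping)
```

Keeping the returned mapping makes it possible to translate results back to the original contig names. Sex chromosomes or unplaced scaffolds would also receive integers here, so they may need to be removed first.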
Hey Pontus!
I tried running popstats with the --f3vanilla option and it seems to require four populations to be specified, even though --f3 correctly asks for three.
$ python2.7 popstats.py --f3vanilla --pops 1,2,3 --file F3_popstats.merged
Traceback (most recent call last):
File "/Users/lamnidis/Software/popstats/popstats.py", line 185, in <module>
poplabel4=poplist[3].split('+')
IndexError: list index out of range
$ python2.7 popstats.py --f3vanilla --pops 1,2,3,3 --file F3_popstats.merged
#Runs without raising an error.
while --f3 runs normally with 3 populations:
$ python2.7 popstats.py --f3 --pops 1,2,3 --file F3_popstats.merged
#Runs without raising an error
Thank you!
Thiseas
Hi Pontus,
I have a tped file with all autosomes and a tfam file with the individuals (286) from 7 different populations. Each individual has an identifier that consists of a 3-letter population code and an individual number, e.g. KPA_562. How would I use the Python script to select the populations I want to study? The example file linked in one of the issues I read is no longer available.
Thank you.
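A minimal sketch of one way to prepare such a tfam file, assuming (as another report on this page suggests) that popstats reads the population name from the first tfam column, and that the individual ID sits in the second column in the form CODE_number. The function name `label_tfam_by_prefix` is hypothetical.

```python
def label_tfam_by_prefix(tfam_lines, sep="_"):
    """Set column 1 of each tfam line to the population code taken from
    the individual ID in column 2 (for example KPA_562 gives KPA)."""
    relabelled = []
    for line in tfam_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        fields[0] = fields[1].split(sep)[0]
        relabelled.append(" ".join(fields))
    return relabelled


if __name__ == "__main__":
    for row in label_tfam_by_prefix(["fam1 KPA_562 0 0 0 -9"]):
        print(row)
```

With the first column rewritten this way, the population codes could then be passed to --pops, for example `--pops KPA,...`, provided that assumption about the first column holds.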
Hi,
I recently started using your code to estimate f2, since I couldn't get AdmixTools to compile correctly. I have got most of the tests listed in the README to work, although --pi doesn't seem to be recognized as an option. I also can't find it listed as an option in the Python file. Any plans to update this?
Also, I couldn't find the vcf2tped.py mentioned in the README, so I just used the VCFtools --plink-tped option.
Any chance you might implement something where the --pops option could accept a file listing which individuals belong to which populations? Currently it is a bit cumbersome to list all individuals that belong to a population on the command line (e.g., ind1+ind2+ind3,ind4+ind5+ind6), and for calculating f2 the list needs to be rearranged, since f2 only looks at the populations in positions 1 and 2. What about a pairwise calculation for anything involving only two populations?
thank you,
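Until something like that exists, the --pops string could be generated outside popstats. The sketch below assumes a hypothetical two-column mapping file (individual, population) and uses only the `+` and `,` syntax quoted above; the function names are illustrative and not part of popstats.

```python
from itertools import combinations


def read_pop_map(lines):
    """Group individuals by population from 'individual population' rows."""
    groups = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue  # skip blank lines, short lines and comments
        groups.setdefault(fields[1], []).append(fields[0])
    return groups


def pops_argument(groups, order):
    """Build a string such as ind1+ind2,ind3 for the given population order."""
    return ",".join("+".join(groups[pop]) for pop in order)


def pairwise_arguments(groups):
    """Return one --pops string per unordered pair of populations."""
    return {
        (a, b): pops_argument(groups, [a, b])
        for a, b in combinations(sorted(groups), 2)
    }


if __name__ == "__main__":
    groups = read_pop_map(["ind1 A", "ind2 A", "ind3 B", "ind4 C"])
    for pair, arg in pairwise_arguments(groups).items():
        print(pair, "--pops", arg)
```

Each generated string could then be passed to a separate popstats run, which gives the pairwise f2 calculation without rearranging the list by hand.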
Hi, I am not clear about the input files. For example, I have Pop1, Pop2, Pop3 and Outgroup, and I want to calculate D-statistics (Pop1, Pop2; Pop3, Outgroup). Do I need to put these four populations in one tped file, or in four separate tped files? Thanks.
Hi again Pontusk,
I was wondering: if I use an input that has not been filtered for bi-allelic SNPs, will popstats automatically ignore SNPs with more than two alleles, or do I have to provide an input that is already filtered for bi-allelic SNPs only?
Best,
Alex
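Whatever popstats does internally, the input can be filtered beforehand to be safe. This is a minimal sketch, assuming the standard tped layout (four leading columns followed by allele calls) and treating `0`, `N` and `.` as missing; the function names are hypothetical.

```python
def has_at_most_two_alleles(tped_line, missing=("0", "N", ".")):
    """True if the site carries no more than two distinct non-missing alleles."""
    fields = tped_line.split()
    alleles = {allele for allele in fields[4:] if allele not in missing}
    return len(alleles) <= 2


def filter_biallelic(tped_lines):
    """Keep non-blank lines with at most two observed alleles."""
    return [
        line for line in tped_lines
        if line.strip() and has_at_most_two_alleles(line)
    ]


if __name__ == "__main__":
    example = [
        "1 snp1 0 100 A A G G",
        "1 snp2 0 200 A C G 0",  # three alleles, dropped
    ]
    print(filter_biallelic(example))
```

Monomorphic sites pass this filter as well, since they have fewer than two alleles; tightening the check to exactly two alleles would remove them.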
Hello,
I am using DArTseq data in my analysis. The data are binary and contain missing values. I was able to load my data file into PLINK and then convert it for use in popstats. However, I get errors when I try to load it into popstats. Do you think this is due to my data type and missing values?
Thank you :)
Best regards
Buddhika
Hi Pontus
I am trying to run the D-statistics. I followed your suggestion to put the population name in the first column. When I then feed popstats.py my .tped and .tfam files, it shows
########################################
Traceback (most recent call last):
File "popstats.py", line 2105, in
Dp_main = sum(t_list) / sum(n_list)
ZeroDivisionError: integer division or modulo by zero
########################################
Command I used was
python2 popstats.py -p good.tped -f good.tfam --not23 -b 1000000 --pops pop1,pop2,pop3,pop4 --informative
What kind of error in my data lead to this?
Thank you!
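The traceback shows that `sum(n_list)` is zero, but not why. One cause worth ruling out is that a label given to --pops matches no rows in the tfam file. The check below is only a sketch: it assumes the population name is in the first tfam column, as described above, and `samples_per_label` is a hypothetical helper.

```python
from collections import Counter


def samples_per_label(tfam_lines, pops_arg):
    """Count tfam rows for every label named in a --pops style string."""
    counts = Counter(line.split()[0] for line in tfam_lines if line.strip())
    labels = [
        label
        for group in pops_arg.split(",")  # populations are comma-separated
        for label in group.split("+")     # merged labels are joined with +
    ]
    return {label: counts[label] for label in labels}


if __name__ == "__main__":
    tfam = ["pop1 a 0 0 0 -9", "pop2 b 0 0 0 -9", "pop3 c 0 0 0 -9"]
    for label, n in samples_per_label(tfam, "pop1,pop2,pop3,pop4").items():
        print(label, n)
```

A count of zero for any label would point to a naming mismatch. If all labels are present, the cause lies elsewhere, for example in the data left after filtering.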
Hi,
I get a sample size error when I try to execute:
python popstats.py --file input --pops POP_CI,POP_SI,POP_AR,POP_TH --ancestor POP_ANC --FAB
I have only one ancestral individual. Does the program need more than one outgroup individual? Do all outgroup individuals have to be from the same species?
Thanks
Anubhab
Dear Pontus,
I was trying to understand the parameters for the h4 test (LD option) and I have some doubts.
The --LD option is the classic D of LD, but what about the LD window? What is the unit for this option? I thought it was cM, but the default is 5000 (in the code).
Thank you for your time