arundurvasula / angsd-wrapper
A project for extending ANGSD
License: MIT License
From Jeff:
"An awesome feature would be to have a radio button that throws on a default lowess or other smoothing function to plot trends in the current window."
"If these files are not present, the script will not work correctly"
The script should throw an error if the required files are missing or incorrect.
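One way to fail early is sketched below. The function name `require_files` is hypothetical and not part of angsd-wrapper; it only checks that each file exists and is non-empty, not that its contents are valid.

```shell
#!/bin/bash
# Hypothetical helper: verify that every required input exists and is non-empty.
# Prints one message per problem to stderr and returns non-zero if any check fails.
require_files() {
    local missing=0
    local f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "ERROR: required file '$f' is missing or empty" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A script could then call something like `require_files "$BAM_LIST" "$REGIONS_FILE" || exit 1` before invoking ANGSD, where both variable names are placeholders for whatever the config file defines.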
When all analyzed samples have a similar inbreeding coefficient, autogenerate the inbreeding coefficient file by repeating a single value once per sample (one per input BAM file).
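A minimal sketch of that autogeneration, assuming the BAM files are listed one path per line in a text file. The function name and argument order are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: write one inbreeding coefficient per sample.
# $1 = list of BAM files (one path per line), $2 = coefficient, $3 = output file.
make_inbreeding_file() {
    local bam_list="$1" coeff="$2" out="$3"
    local n i
    # Count non-empty lines so a trailing blank line is not counted as a sample.
    n=$(grep -c . "$bam_list" || true)
    : > "$out"
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "$coeff" >> "$out"
        i=$((i + 1))
    done
}
```

For ten inbred samples, `make_inbreeding_file bams.txt 0.99 F.txt` would write ten lines of `0.99`; the file names here are placeholders.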
angsd-wrapper can be run directly from the command line (as in the wiki), but it can also be submitted to a cluster queueing system. Write an example file to show how this is done:
#!/bin/bash
#SBATCH -D /home/adurvasu/angsd-wrapper
#SBATCH -J Slurm-Thetas
#SBATCH -o /home/adurvasu/angsd-wrapper/results/out-%j.txt
#SBATCH -e /home/adurvasu/angsd-wrapper/results/error-%j.txt
bash scripts/ANGSD_Thetas.sh scripts/thetas_example.conf
Also, this can be added automagically to the configuration script, i.e., set SLURM=true and then use the info in the conf file to point slurm to the project directory ($PROJECT_DIR). Can also add more cluster support later.
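The SLURM=true idea could be sketched roughly as follows. Both function names are hypothetical; the `#SBATCH` options mirror the example file above, and the sketch relies on `sbatch` reading the batch script from standard input when no file is given.

```shell
#!/bin/bash
# Hypothetical helper: print a SLURM batch script for one wrapper run.
# $1 = project directory, $2 = job name, $3 = wrapper script, $4 = config file.
write_slurm_job() {
    local project_dir="$1" job_name="$2" script="$3" conf="$4"
    cat <<EOF
#!/bin/bash
#SBATCH -D ${project_dir}
#SBATCH -J ${job_name}
#SBATCH -o ${project_dir}/results/out-%j.txt
#SBATCH -e ${project_dir}/results/error-%j.txt
bash ${script} ${conf}
EOF
}

# Submit through sbatch only when the config sets SLURM=true; otherwise run directly.
run_or_submit() {
    local project_dir="$1" job_name="$2" script="$3" conf="$4"
    if [ "${SLURM:-false}" = "true" ]; then
        write_slurm_job "$project_dir" "$job_name" "$script" "$conf" | sbatch
    else
        (cd "$project_dir" && bash "$script" "$conf")
    fi
}
```

Keeping the script generation separate from the submission step would make it easier to add other queueing systems later.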
Current version of the code ignores thetas files if present. This is a good default, but I think there should be an option to override, perhaps in config file?
https://github.com/arundurvasula/angsd-wrapper/blob/master/scripts/common.conf#L15
It's different because it shouldn't exist! The only folder that's in scripts is shiny. There shouldn't be any other folders in there. The results and data folders should be in the top-level directory ($PROJECT_DIR).
Some steps in ANGSD require the same initial analyses to be done. There should be a function that checks whether these analyses have already been done and skips them if they have.
`REGIONS="1:1-1000"` is not supported by the script, so it shouldn't be in the example.
Write makefile to manage installation of necessary software initially.
Will accomplish the following (and other steps as necessary):
make angsd
make ngstools
This will make installation much easier and faster.
Right now, the shiny script will only graph the thetas that are included in the folder.
Script to run ngsF from ngsTools. TVK working on initial version. Should ideally output a file of values for use in the thetas etc. scripts.
Latest commit: 445b861 moved a lot of variable declarations to a common.conf file, but that file needs to be sourced in the script files in order to be used, right? Running ANGSD_SFS with the default conf gives the following error:
scripts/ANGSD_SFS.sh: line 95: ANGSD_DIR: unbound variable
I think that is because it can't find that declaration. Adding source common.conf before loading the user-supplied configuration (here) should fix this.
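A sketch of the proposed fix, assuming the shared defaults and the user configuration are both plain shell files. The function name `load_config` is hypothetical, and `.` is used as the portable spelling of `source`.

```shell
#!/bin/bash
# Hypothetical helper: load config files in order, so that later files
# (the user's config) override earlier ones (the shared defaults).
load_config() {
    local conf
    for conf in "$@"; do
        if [ ! -f "$conf" ]; then
            echo "ERROR: cannot find config file '$conf'" >&2
            return 1
        fi
        # Variables assigned in the sourced file become global to the script.
        . "$conf"
    done
}
```

Called as `load_config scripts/common.conf "$1"` near the top of each script, this would define ANGSD_DIR before it is first used; the relative path to common.conf is an assumption about where the script is run from.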
I've updated some of the scripts to make them more generalizable. In particular, I've substituted environment variables for hard coded user names. I'm not finished, but you can see changes on my fork: https://github.com/MorrellLAB/angsd-wrapper
These values have defaults but can be overridden, which means they should be included in the script instead of the conf files.
Script to automatically graph SFS and thetas along the chromosome.
Add ability to select tracks/features from a GFF file and display multiple GFFs.
Add option to plot SFS using the ANGSD file.saf.sfs. May be useful to have in the same place as the thetas. ANGSD has their plot code here: http://popgen.dk/angsd/index.php/SFS_Estimation
Add documentation as to which parameters have defaults and note that those can be changed in the config file (i.e. that user doesn't ever have to change the bash script)!
For debugging purposes.
All theta estimates displayed on shiny graphs should always be the value divided by window size to get a per bp value.
Shiny graphs should probably use dots with alpha to make graphs easier to read for larger chromosome pieces.
An awesome feature would be to have a radio button that throws on a default lowess or other smoothing function to plot trends in the current window.
The ANGSD_SFS.sh script should check whether the init script needs to be run, and run it if so. This can be done with:
if [ ! -d "$DIRECTORY" ]; then
# Control will enter here if $DIRECTORY doesn't exist.
fi
That should also be in a function in a utils.sh script so that it can be easily run in other scripts.
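As a rough sketch, the check above could be wrapped in a reusable function like this. The name `run_if_missing_dir` is hypothetical; utils.sh is the file proposed in this issue.

```shell
#!/bin/bash
# Hypothetical utils.sh function: run a command only if the given directory
# does not exist yet.
# $1 = directory to check, remaining arguments = command to run when it is missing.
run_if_missing_dir() {
    local dir="$1"
    shift
    if [ ! -d "$dir" ]; then
        "$@"
    fi
}
```

A script that has sourced utils.sh could then write `run_if_missing_dir "$PROJECT_DIR/results" bash scripts/init.sh`, assuming that is where the init script lives.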
Refactor it as soon as possible to avoid immense sadness later.
Add list of papers and which methods come from each to wiki.
http://popgen.dk/angsd/index.php/RealSFS
* The program was called emOptim2 in earlier versions, this has now been changed to the more appropriate: 'realSFS'
I have a regions file that looks like this:
9:52173261-52173990
5:12995975-12997206
9:113061855-113063368
9:15043238-15044776
2:14957924-14959456
After running ANGSD_Theta.sh I got a message: Problem with chr: 9, key already exists. 9:113061855-113063368 is not being written either. Is that because this region has less than 10 reads?
Could add a checkbox if user wants to plot a lowess fit to the data using lowess(). Could just use lowess defaults, which work pretty well, or allow user to change the f parameter of the function.
default to maize chromosome 10 annotation
Make GFFs polygons instead of rug. Use pentagon with pointy end point in direction for strand.
Theta statistics need to be divided by number of bp in a window. Windows with 0bp should not have points plotted.
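The calculation itself is simple; a sketch with awk is shown here for illustration, even though the plotting is done in shiny. The three-column, tab-separated input (window position, theta sum, number of sites) is an assumed layout and not the actual ANGSD output format, and the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: convert per-window theta sums to per-bp values.
# Input: tab-separated lines of "window_position<TAB>theta_sum<TAB>n_sites".
# Windows with 0 sites are dropped, so no point is produced for them.
per_bp_theta() {
    awk -F'\t' -v OFS='\t' '$3 > 0 { print $1, $2 / $3 }' "$@"
}
```

With this layout, a window with a theta sum of 5 over 10 sites gives 0.5 per bp, and a window with 0 sites produces no output line.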
Going to need to have a wrapper script that calls SNPs. I will make a mock script in angsd that you could wrapperize for me?
Gene annotation might be better shown using rug() or some other way to just plot genes along bottom rather than taking up the whole plot.
Init.sh does not create the proper directories when it's called from 2DSFS, SFS, or Thetas resulting in a loss of work because ANGSD can't save the results anywhere. Possibly need to tell users to run init.sh separately before starting scripts.
The original ANGSD directory should be added as a git submodule.
https://github.com/mfumagalli/ngsTools
First priority is ngsPopGen.
Later, can change to include entire ngsTools repository.
Remove redundant and old information from the README and point a link to the wiki. Add basic, overall information to README:
default value for uniqueonly option should be 0
Maybe instead of unix user, you ask them to give path to the project home dir. For example I don't want things in ~/data I want them in ~/projects/bigd/angsdbigd/ etc. Should make a new directory called "angsdwrapper" in whatever dir they give, and then generate subdirectories.
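A minimal sketch of that behaviour, assuming the data and results subdirectories are the ones needed. The function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: create the wrapper's working tree under a user-supplied path.
# $1 = project home directory chosen by the user.
# Prints the path of the new "angsdwrapper" directory.
init_project_dir() {
    local base="$1"
    # Strip one trailing slash so the joined path has no double slash.
    local root="${base%/}/angsdwrapper"
    mkdir -p "$root/data" "$root/results"
    echo "$root"
}
```

For example, `init_project_dir ~/projects/bigd/angsdbigd` would create `~/projects/bigd/angsdbigd/angsdwrapper/data` and `.../results`.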
It's unclear from the documentation that variables like uniqueonly can be set in the config files to override the default values.
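One common bash pattern for this, shown as a sketch: the script assigns a default only if the config file has not already set the variable. `UNIQUE_ONLY` is an illustrative variable name for the uniqueonly option, and `apply_defaults` is a hypothetical function.

```shell
#!/bin/bash
# Hypothetical helper: fill in defaults for anything the config file left unset.
apply_defaults() {
    # ":=" assigns only when the variable is unset or empty, so a value
    # from the user's config file takes precedence over the default.
    : "${UNIQUE_ONLY:=0}"
}
```

If the config file contains `UNIQUE_ONLY=1`, the value stays 1 after `apply_defaults`; if the line is absent, the variable becomes 0.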
If you run the thetas script without first running the SFS script, it will crash. Should call SFS before thetas if SFS results don't already exist. Also need to make sure that the regions are the same between both files.
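Both checks could be sketched as below. The function names and the idea of passing the expected SFS output path explicitly are assumptions, not the wrapper's current behaviour.

```shell
#!/bin/bash
# Hypothetical helper: make sure SFS results exist before the thetas step runs.
# $1 = expected SFS output file, $2 = SFS script, $3 = config file.
ensure_sfs() {
    local sfs_out="$1" sfs_script="$2" conf="$3"
    if [ ! -s "$sfs_out" ]; then
        echo "SFS output not found, running $sfs_script first" >&2
        bash "$sfs_script" "$conf" || return 1
    fi
}

# Hypothetical helper: succeed only if two regions files list the same regions,
# ignoring the order of the lines.
same_regions() {
    [ "$(sort "$1")" = "$(sort "$2")" ]
}
```

The thetas script could call `ensure_sfs` first and then refuse to continue when `same_regions` fails for the two regions files.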
I have 10 highly inbred samples. I set the inbreeding coefficient to 0.99 for each individual in the respective file (x_F.txt). However, the X_DerivedSFS file gives me 21 values when I am expecting a maximum of 10 or 11 (if adjusted for differences in sample size). Probably the inbreeding coefficients are being ignored, so the chromosome number is doubled.
Use pch=19 for points.
Comparison between several window sizes could be useful. Add a way to select between them (just use the file chooser?) or be able to display multiple at a time.
Is the considerable overlap we see on the rug rectangles due to a weird subsetting of the GFF file? Shouldn't be lots of gene annotations within a single 1-2kb window.