Giter Site home page Giter Site logo

sa-lee / starmie Goto Github PK

View Code? Open in Web Editor NEW
12.0 5.0 6.0 6.46 MB

starmie: plotting and inference for population structure models :star2:

License: Other

R 100.00%
population-genomics population-genetics population-structure r data-visualization

starmie's Issues

Bug: clummp greedyLargeK broken

on line 43 of clummp.R index is wrong.

clummp.R throws an error on greedy when
Error in Q_1 - Q_2 : non-conformable arrays

Cannot load structure output

Hi We have just generate a few structure outputs using command line structure:

structure -K 5 -D 9 -i o pop_K5_D9
The output is found as pop_K5_D9.out_f

However, when I try to load it to starmie:

library(starmie)
K5<-loadStructure("pop_K5_D9.out_f")

I get the following error

Error in dimnames(x) <- dn
     length of 'dimnames' [2] not equal to array extent

Is there anything wrong with the parsing? I have not changed anything inside the output of structure. I am using structure v2.3.4 with standard parameters from SNP data (VCF converted to structure).

I look forward to hearing from you.

Kind regards

simulate Q matrix with repeated runs from dirichlet distribution

see http://www.cs.cmu.edu/~epxing/Class/10701-08s/recitation/dirichlet.pdf and
https://en.wikipedia.org/wiki/Dirichlet_distribution

k <- 20
r <- 20
n_samples <- 100
# generate Q matrix by sampling from diricihlet
# keep alpha hyper param constant (same mean and variance for each K)
Q_1 <- MCMCpack::rdirichlet(n_samples, rep(1, k))
# permute labels by shuffling and adding gaussian noise noise over number of runs
permute_and_add_noise <- function(Q_mat, r) {
  logit <- function(x) { log(x / (1-x)) }
  perturb <- function(Q_mat) { 
    # add gaussian noise to each row of matrix
    q_random <- apply(Q_mat, 1, 
                      function(y) 1 / (1 + exp(-rnorm(length(y), 
                                                      mean = logit(y), 
                                                      sd = 0.01))))
    # normalise rows 
    q_normalised <- q_random / rowSums(q_random)
    # shuffle labels 
    q_normalised[, sample(seq_len(ncol(q_normalised)))]
  }

  replicate(r - 1, perturb(Q_mat))
}

permute_and_add_noise(Q_1, r)

Will have to fix labelling at some point but it is a start.

local shiny interface

i.e. something like

makeReport(struct_objects)

will produce an interface with the tests, MCMC plots etc.

cleaner example datasets

i.e. prestored as struct lists in a function call, makes running examples look cleaner.
i.e.

# Run example
microsat <- starmieGet("structure_mulitple_runs")

diagnostic plots

For ADMIXTURE:

  • plot CV error against K
  • plot final loglikelihood against K

For STRUCTURE:

  • plot MCMC reps against loglikelihood/alpha values over each K values

Vignette preparation

  • how to run structure so it actually works with our package
  • how to run admixture so it actually works with our package
  • about the starmie object
  • autoplot and fortify methods

Load Admixture Data

A function to load multiple Admixture output files into a useful data frame.

Input:
Multiple Admixture .P and .Q files
The initial input file fed to Admixture

Output:
A sensible data frame that make for easy downstream analysis

simulate from STRUCTURE model

simulation methods for testing evanno etc.
sample from posterior distrubition with known K, and admixture coeffecients

Estimate the best K

A function that runs the method of Evanno and plots ln(Pr(X|K) values in order to identify the k for which Pr(K=k) is highest (as described in STRUCTURE's manual, section 5.1)

Loading multiple files at once

Is it possible to load multiple STRUCTURE output files for the CLUMPPING together at once? At the moment it looks like that you have to load each output file separately
k6_msat <- loadStructure(k6_file),.... and combine them with k6_all <- structList(k6_msat, k6_run2) which can then be used to Q_list <- lapply(k6_all, getQ). It would be much easier if it was possible to load for example all 10 repeat files of a certain K at once with something like allfiles <- choose.files() and then Q_list <- lapply(allfiles, getQ). Or is that possible already?

test datasets

if you have any good example datasets we could use from the output of each of these programs let me know.

Bug: loadStructure deletes sample/pop labels if they are non-numeric

Incriminating code is here for ancestral_df

   ances_lines <- s_f[(which(grepl("^Inferred ancestry of.*", 
        s_f)) + 1):(which(grepl("^Estimated Allele Frequencies .*", 
        s_f)) - 1)]
    ances_lines <- str_trim(ances_lines)
    ances_lines <- gsub("[(:)]", "", ances_lines)
    ances_lines <- str_split_fixed(ances_lines, "\\s+", n = 4 + 
        pops)
    header <- ances_lines[1, ][1:3]
    ances_lines <- ances_lines[2:nrow(ances_lines), 2:ncol(ances_lines)]
# if Label (column 1 contains a string value the next line will produce NAs
    class(ances_lines) <- "numeric"
    ancest_df <- data.frame(ances_lines)
    colnames(ancest_df)[1:3] <- header
    colnames(ancest_df)[4:ncol(ancest_df)] <- paste("Cluster", 
        seq(1, ncol(ancest_df) - 3))

starmieList object?

Considering most of the utility of our package comes from combining multiple lists of struct objects - is it worth creating an S3 object that stores multiple Ks/runs?

Load Structure Data

A function to load multiple STRUCTURE output files.

Perhaps we should split this into multiple functions that iteratively add the output of the post processing programs (clump, harvester etc).

Inputs:
Output of STRUCTURE, clump, clumpak? and harvester.

Output:
A useful data frame for further downstream analysis.

loadStructure introducing NAs

Hi, when I go to add the data generated (from runStructure) via loadStructure, it keeps introducing NA's into the last column of the file! Therefore when I generate clumpp files, the final cluster is NA and I can't generate plots etc. !

k3_1 <- loadStructure("/R_STRUCTURE/run01_K_03.out_f")
k3_2 <- loadStructure("/R_STRUCTURE/run02_K_03.out_f")
k3_3 <- loadStructure("/R_STRUCTURE/run03_K_03.out_f")

NAs introduced by coercion

k3_struct <- structList(k3_1, k3_2, k3_3) #works
Q_k3 <- lapply(k3_struct, getQ) #works but then introduces NA's in last column e.g.:
Screen Shot 2019-09-29 at 11 18 30 am

Can I please have some help!

reduce package size

  • run admixture on smaller dataset
  • delete logs unless they are used by an example function

Generate barplot

Take the output of the load admixture and load structure function and generate a stacked barplot split by K.

Input:
The output of load structure or load admixture

Output:
A ggplot2 object

clumpak

implement clumpak algorithm

populations parameter should be a character vector

The current populations parameter requires a data.frame with isolate then population in two seperate columns. This is clunky to use in practice as we usually just want to give it a vector. If you agree with this change @sa-lee I can implement it in the DEV although it will mean we have to update the docs.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.