popgen.jl's Introduction

Population Genetics in Julia.

How to install:

In the Julia REPL, press ] on an empty line to enter the package manager, then run add PopGen.

Cite As

Pavel V. Dimens, & Jason Selwyn. (2022). BioJulia/PopGen.jl: v0.8.0 (v0.8.0). Zenodo. https://doi.org/10.5281/zenodo.6450254

Authors

Pavel Dimens

Jason Selwyn

popgen.jl's People

Contributors

github-actions[bot], jdselwyn, pdimens

popgen.jl's Issues

Request: Explain populations!(::Vector) better

It's quite unclear what it actually does. Currently, it says:

Vector of new unique population names in the order that they appear in the PopData.meta

All of my students took this to mean that the input should be a vector of the same length as the dataframe, such that the n-th entry in the vector becomes the name of the n-th sample in the dataframe.

Two suggestions for making it better:

  • Do not rename missing values implicitly
  • Be more explicit about precisely what it does

Thanks for otherwise great docs!
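
One possible reading of the quoted docstring, sketched on plain vectors rather than on a PopData object: the new names are matched to the unique existing population names in order of first appearance, not to individual samples. The helper `rename_by_first_appearance` is hypothetical and is not part of PopGen.jl; as an assumption following the first suggestion above, it leaves `missing` entries untouched.

```julia
# Hypothetical helper, not PopGen.jl API: rename populations by order of first appearance.
function rename_by_first_appearance(current::AbstractVector, newnames::Vector{String})
    old = unique(skipmissing(current))   # unique names, in order of first appearance
    length(old) == length(newnames) ||
        throw(ArgumentError("expected $(length(old)) new names, got $(length(newnames))"))
    lookup = Dict(zip(old, newnames))
    # missing entries are passed through unchanged
    return [ismissing(p) ? missing : lookup[p] for p in current]
end

pops = ["north", "north", missing, "south", "north", "south"]
renamed = rename_by_first_appearance(pops, ["N", "S"])
```

Here the input vector has two entries (one per unique population), while the result has six (one per sample).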

add recode flag to V/BCF importer

Following the design of PGDSpider2 output, the V/BCF importer should have an optional flag recode::Bool = false (name pending) that renames all loci to simple generic names such as SNP_1...SNP_n.
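
The renaming itself could look roughly like the sketch below. The helper `generic_locus_names` and the example locus names are made up for illustration; how the flag would be wired into the importer is not shown.

```julia
# Hypothetical helper: map original locus names to generic SNP_1 ... SNP_n names.
function generic_locus_names(loci::AbstractVector{<:AbstractString})
    original = unique(loci)   # keeps the order of first appearance
    return Dict(name => "SNP_$i" for (i, name) in enumerate(original))
end

loci = ["chr1_1042", "chr1_2210", "chr2_87", "chr1_1042"]
mapping = generic_locus_names(loci)
recoded = [mapping[l] for l in loci]
```

Returning the mapping rather than only the new names would make it possible to trace a generic name back to the original locus.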

[bug] PopData.meta.name incorrectly typed from vcf import

description
The type of PopData.meta.name should be Vector{String}, but it is incorrectly interpreted as PooledArray when importing vcf data.

minimal example to reproduce

using GeneticVariation

x = vcf("some_file.vcf");

x.meta.name |> typeof
PooledArrays.PooledVector{String, UInt32, Vector{UInt32}}

expected behavior

x.meta.name |> typeof
Vector{String}

[feature] consolidate file import info text

Is your feature request related to a problem and which?
Consolidate the printing of file information a bit. The idea is to have fewer lines and be more succinct overall.

Describe the solution/feature you'd like (with examples)

julia> @info "\n path_to_filename.gen\n formatting: delimiter = tab , loci = horizontal\n data: samples = xxx, populations = yy, loci = zzzz"
┌ Info: 
│  path_to_filename.gen
│  formatting: delimiter = tab , loci = horizontal
└  data: samples = xxx, populations = yy, loci = zzzz
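
A minimal sketch of how such a condensed message might be assembled before being passed to `@info`. The helper `import_summary`, its keyword names, and the numbers used are placeholders for illustration, not existing PopGen.jl code.

```julia
# Hypothetical helper: build the condensed multi-line summary as a single string.
function import_summary(path::AbstractString; delimiter::AbstractString,
                        layout::AbstractString, samples::Integer,
                        populations::Integer, loci::Integer)
    return join([
        "",
        " $path",
        " formatting: delimiter = $delimiter, loci = $layout",
        " data: samples = $samples, populations = $populations, loci = $loci",
    ], "\n")
end

msg = import_summary("path_to_filename.gen"; delimiter = "tab", layout = "horizontal",
                     samples = 10, populations = 2, loci = 500)
@info msg
```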

[feature] add NaturalSort.jl as dep

Is your feature request related to a problem and which?
Not a problem per se, but it would make sense to sort the loci dataframe using NaturalSort.jl, so this doesn't happen:

julia> x.loci
3488310×4 DataFrame
     Row │ name      population  locus     genotype 
         │ String    String      String    Tuple…?  
─────────┼──────────────────────────────────────────
       1 │ ATL_1988  missing     snp_1     missing  
       2 │ ATL_1988  missing     snp_10    (4, 4)
       3 │ ATL_1988  missing     snp_100   (3, 3)
       4 │ ATL_1988  missing     snp_1000  (3, 3)
       5 │ ATL_1988  missing     snp_1001  (4, 4)
       6 │ ATL_1988  missing     snp_1002  (1, 4)
       7 │ ATL_1988  missing     snp_1003  (4, 4)
       8 │ ATL_1988  missing     snp_1004  (4, 4)
       9 │ ATL_1988  missing     snp_1005  (4, 4)
      10 │ ATL_1988  missing     snp_1006  (3, 3)
      11 │ ATL_1988  missing     snp_1007  (3, 3)

Describe the solution/feature you'd like (with examples)
Add NaturalSort.jl as a dependency and configure the read_xxx functions to use sort(__, [:name, :locus], lt = natural) on the loci dataframe before returning the PopData object. It would also make writing to files consistent with how the SNPs are likely arranged in the source data (and congruent with output from e.g. PGDSpider2).

julia> tst = sort(x.loci, [:name, :locus], lt = natural)
3488310×4 DataFrame
     Row │ name      population  locus     genotype 
         │ String    String      String    Tuple…?  
─────────┼──────────────────────────────────────────
       1 │ ATL_1988  missing     snp_1     missing  
       2 │ ATL_1988  missing     snp_2     missing  
       3 │ ATL_1988  missing     snp_3     missing  
       4 │ ATL_1988  missing     snp_4     (4, 4)
       5 │ ATL_1988  missing     snp_5     (4, 4)
       6 │ ATL_1988  missing     snp_6     (3, 3)
       7 │ ATL_1988  missing     snp_7     (2, 3)
       8 │ ATL_1988  missing     snp_8     (4, 4)
       9 │ ATL_1988  missing     snp_9     (3, 3)
      10 │ ATL_1988  missing     snp_10    (4, 4)
      11 │ ATL_1988  missing     snp_11    (1, 1)
      12 │ ATL_1988  missing     snp_12    (3, 3)
      13 │ ATL_1988  missing     snp_13    (4, 4)
      14 │ ATL_1988  missing     snp_14    (4, 4)
      15 │ ATL_1988  missing     snp_15    (1, 1)
      16 │ ATL_1988  missing     snp_16    (3, 3)
      17 │ ATL_1988  missing     snp_17    (2, 3)
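
To illustrate the difference between the two orderings without any extra dependency, the sketch below uses a hand-written sort key in place of NaturalSort.jl's `natural`. The function `natural_key` is a stand-in written for this example and only handles names that end in a run of digits.

```julia
# Stand-in for natural ordering: split "snp_10" into ("snp_", 10).
function natural_key(name::AbstractString)
    m = match(r"^(.*?)(\d+)$", name)
    m === nothing && return (String(name), -1)   # no numeric suffix
    return (String(m.captures[1]), parse(Int, m.captures[2]))
end

loci = ["snp_1", "snp_10", "snp_100", "snp_2", "snp_11", "snp_3"]
lexicographic = sort(loci)                 # default string ordering
natural_order = sort(loci, by = natural_key)
```

The default sort places snp_10 and snp_100 before snp_2, which is the behavior shown in the first table; the keyed sort restores the numeric order.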

FST permutations: shuffle indices and return views

Rather than having _permute_FST take a matrix and two sizes, shuffle the row indices of the vcat'd merged matrix and index it twice (pop1, pop2) with views.

Back-of-the-envelope example:

using Random  # provides shuffle

new_idx = shuffle(1:size(merged, 1))
pop1 = @views merged[new_idx[1:npop1], :]
pop2 = @views merged[new_idx[npop1+1:end], :]
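
A self-contained version of the idea above, run on a toy matrix. The function name `permuted_views` and the `rng` keyword are assumptions made for this sketch; it is not the package's _permute_FST.

```julia
using Random

# Hypothetical helper: shuffle row indices once and return two views into `merged`.
function permuted_views(merged::AbstractMatrix, npop1::Integer; rng = Random.default_rng())
    new_idx = shuffle(rng, 1:size(merged, 1))
    pop1 = @view merged[new_idx[1:npop1], :]
    pop2 = @view merged[new_idx[npop1+1:end], :]
    return pop1, pop2
end

merged = reshape(collect(1.0:20.0), 10, 2)   # 10 samples x 2 loci of toy values
pop1, pop2 = permuted_views(merged, 4; rng = MersenneTwister(1))
```

Both results are SubArrays that reference `merged`, so only the shuffled index vector is allocated per permutation, not a copy of the data.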

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

[feature] Split file IO into separate package

Is your feature request related to a problem and which?
Not a problem, but it would be easier to treat file IO as a separate package that is required by and re-exported by PopGen.jl

Benefits

  • Simple maintenance because it will only require basic PopData functions
  • PopGen.jl codebase will be smaller
  • IO development can be independent from other package components
  • Contributions will be independent from main PopGen.jl codebase because it will only feature IO-specific things
  • cool new logo
  • precompile read functions with test data

are there alternatives?
Keep the package monolithic as it is now

[feature] remove `release` branch

As I learn more about GitHub, CI, and the Julia TagBot and Registrator, I'm finding that the release branch is redundant and makes the entire workflow cumbersome. It will be deleted with the 0.7.0 release, which will address #82.

[feature] Compatibility with DataFrames v1

Hi there,
Great to see this project! Thanks for implementing this!

Is your feature request related to a problem and which?
At the moment, PopGen does not work with (fast and snazzy) DataFrames v1:

(popgen) pkg> add PopGen DataFrames@1
    Updating registry at `~/.julia/registries/General`
    Updating git-repo `https://github.com/JuliaRegistries/General.git`
   Resolving package versions...
ERROR: Unsatisfiable requirements detected for package DataFrames [a93c6f00]:
 DataFrames [a93c6f00] log:
 ├─possible versions are: 0.11.7-1.1.0 or uninstalled
 ├─restricted to versions 1 by an explicit requirement, leaving only versions 1.0.0-1.1.0
 └─restricted by compatibility requirements with PopGen [af524d12] to versions: 0.11.7-0.22.7 — no versions left
   └─PopGen [af524d12] log:
     ├─possible versions are: 0.0.3-0.6.3 or uninstalled
     └─restricted to versions * by an explicit requirement, leaving only versions 0.0.3-0.6.3

Describe the solution/feature you'd like (with examples)
Would it be possible (within reasonable effort) to make them work together?

Many thanks!
Hannes

inconsistent VCF importing

Testing with some data, the VCF importer is not working 100% correctly. Some individuals are imported with 100% missing genotypes. This needs to be investigated to make it at least consistent with the data produced from VCF => Genepop conversion using PGDSpider2.

[feature] standardize function names

Is your feature request related to a problem and which?

  1. internal functions that will never be used by users should start with _
  2. user-facing functions should not have underscores separating words.
  • e.g. missing_data() => missingdata()

consolidate file io APIs to use multiple dispatch

Rather than having genepop and popdata2genepop, consolidate each io function to have an input and output method, i.e.:

# file reading
function genepop(infile::String; kwargs...)
    # ...
end

# file writing
function genepop(data::PopData; kwargs...)
    # ...
end
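
The pattern can be tried out on a toy format. Everything in this sketch (`ToyData`, `toyformat`, the one-sample-per-line layout) is invented for illustration and is unrelated to the real genepop reader or writer.

```julia
# Toy stand-in for PopData; the real type is richer.
struct ToyData
    samples::Vector{String}
end

# Reading: the method selected when the argument is a file path.
function toyformat(infile::String)
    return ToyData(readlines(infile))
end

# Writing: the method selected when the argument is the data object.
function toyformat(data::ToyData; filename::String)
    write(filename, join(data.samples, "\n"))
    return filename
end

path = tempname()
toyformat(ToyData(["ind_1", "ind_2"]); filename = path)   # write
roundtrip = toyformat(path)                               # read
```

One function name then covers both directions, and Julia picks the method from the type of the first argument.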

[bug] isbiallelic(::PopData) returns incorrect answer

description
The function isbiallelic(::PopData) returns false even if all isbiallelic(::GenoArray) for the PopData are true

minimal example to reproduce

julia> x = vcf("some_file.vcf", rename_loci = true)
PopData Object
  Markers: SNP
  Ploidy: 2
  Samples: 441
  Loci: 7910
  Populations: 1
  Coordinates: absent

julia> isbiallelic(x)
false

julia> tmp = DataFrames.combine(
    groupby(x.loci, :locus),
    :genotype => isbiallelic => :bial
) ;

julia> all(tmp.bial)
true

expected behavior

julia> isbiallelic(x)
true

julia> tmp = DataFrames.combine(
    groupby(x.loci, :locus),
    :genotype => isbiallelic => :bial
) ;

julia> all(tmp.bial)
true
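
The expected relationship between the per-locus and dataset-level answers can be stated on toy data. The function `toy_isbiallelic` is not PopGen.jl's implementation; in particular, treating a locus as biallelic only when exactly two distinct alleles are observed is an assumption of this sketch.

```julia
# Toy per-locus check: exactly two distinct alleles among non-missing genotypes.
function toy_isbiallelic(genotypes::AbstractVector)
    alleles = Set{Int}()
    for g in skipmissing(genotypes)
        union!(alleles, g)   # add both alleles of the genotype tuple
    end
    return length(alleles) == 2
end

# Dataset-level answer derived directly from the per-locus answers.
toy_isbiallelic(loci::AbstractDict) = all(toy_isbiallelic, values(loci))

loci = Dict(
    "snp_1" => [(1, 1), (1, 2), missing, (2, 2)],
    "snp_2" => [(3, 4), (3, 3), (4, 4), missing],
)
dataset_answer = toy_isbiallelic(loci)
```

Defining the dataset-level method in terms of the per-locus one makes the two answers agree by construction, which is the behavior the report expects.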

[bug] export keep and keep!

export add_meta!, locations, locations!, loci, genotypes, get_genotypes, get_genotype, populations, population, populations!, population!, exclude, remove, omit, exclude!, remove!, omit!, samples

keep and keep! need to be exported

[feature] locus-by-locus pairwise FST

Is your feature request related to a problem and which?
pairwise FST only returns an average across loci, but not the values for each locus

Describe the solution/feature you'd like (with examples)

pairwise_fst(data::PopData; method::String, by::String = "global", iterations::Int)  # by: "locus" or "global" (default)
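
A rough sketch of how a `by` keyword could switch between the two outputs. The function `summarize_fst` is hypothetical and takes already-computed per-locus values as input; the "global" branch uses a plain arithmetic mean, which may not match how PopGen.jl aggregates across loci.

```julia
using Statistics

# Hypothetical sketch of the proposed `by` keyword; per-locus values are supplied directly.
function summarize_fst(per_locus::AbstractDict{String, Float64}; by::String = "global")
    by == "locus"  && return per_locus                  # one value per locus
    by == "global" && return mean(values(per_locus))    # single summary value
    throw(ArgumentError("by must be \"locus\" or \"global\""))
end

toy = Dict("snp_1" => 0.02, "snp_2" => 0.10, "snp_3" => 0.06)
global_value = summarize_fst(toy)
locus_values = summarize_fst(toy; by = "locus")
```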

[feature] Merge all PopGen_.jl packages under PopGen.jl monorepo

The goal is to make PopGen.jl a monorepo like Makie

Benefits:

  1. One repository, obviously. Currently, there is PopGen, PopGenCore and PopGenSims, the last of which lives as a repo under my personal account.
  2. It might make CI easier, since everything depends on PopGenSims, and upstream <-> downstream testing would be super helpful.

[feature] PCA and DAPC

Is your feature request related to a problem and which?
n/a

Describe the solution/feature you'd like (with examples)

  • PCA
  • DAPC a la adegenet
  • rLDA (regularized LDA)
  • cross validation on DAPC

[feature] speed up fst permutations

Is your feature request related to a problem and which?
Not a problem, but it might be cheaper to shuffle all the indices once, partition them into two vectors of sizes [np1, np2], and return those indices. The indices would then be used in the main loop of the FST calculation to index the matrices.

Describe the solution/feature you'd like (with examples)

are there alternatives?
keep it as it is
