Giter Site home page Giter Site logo

Comments (35)

mkborregaard avatar mkborregaard commented on May 18, 2024 2

👍 for MultivariateStats and then dependence on that from ecology packages but that's of course up to Richard in this case.

from diversity.jl.

mkborregaard avatar mkborregaard commented on May 18, 2024 2

Great, I'll go and rewrite every single matrix operation in SpatialEcology to work on a transposed matrix then 😱

from diversity.jl.

richardreeve avatar richardreeve commented on May 18, 2024 1

I think this kind of thing is why we need to meet up of course - useful though this is!

from diversity.jl.

mkborregaard avatar mkborregaard commented on May 18, 2024 1

Hmm it looks like the column names are part of the type signature of axisarrays:

typeof(ab)
#AxisArrays.AxisArray{Int64,2,Array{Int64,2},Tuple{AxisArrays.Axis{:species,Array{Symbol,1}},AxisArrays.Axis{:sites,Array{Symbol,1}}}}

So it would be doable to specialise ComMatrix as a parametric typealias of an AxisArray, and then dispatch our functions on ComMatrix. We could do the same for SubComMatrix, and then define AbstractComMatrix (which all the dispatching occurs on) as a Union rather than as an abstract type. Something like

const ComMatrix = AxisArrays.AxisArray{T,2,Array{T,2},Tuple{AxisArrays.Axis{:Species,Array{<:Union{String, Symbol},1}},AxisArrays.Axis{:Sites,Array{<:Union{String, Symbol},1}}}} where T<:Union{Float64, Int, Bool}

const SubComMatrix =AxisArrays.AxisArray{T,2,SubArray{T,2,Array{Float64,2},I,true},Tuple{AxisArrays.Axis{:Species,Array{<:Union{String, Symbol},1}},AxisArrays.Axis{:Sites,Array{<:Union{String, Symbol},1}}}} where T<:Union{Float64, Int, Bool} where I

const AbstractComMatrix = Union{SubComMatrix4, ComMatrix}

That would still allow us to dispatch freely on ComMatrix (and do syntactic sugar as species(x::AbstractComMatrix, sp) = x[Axis{:Species}(sp)], while allowing ComMatrix full AxisArray capabilities.

from diversity.jl.

richardreeve avatar richardreeve commented on May 18, 2024 1

We're slowly getting there! @mkborregaard and I finally had a meeting last week to discuss how to merge things, and we've created a common interface and type hierarchy - currently in alpha at mkborregaard/EcoBase.jl - that we will hopefully both transition to shortly for Diversity and SpatialEcology. After that, the idea is to see how many actual concrete types we can put into there, and how many will need to stay separate - we're hoping that we can move as much as possible into EcoBase, and that other packages can then also use it as it develops.

from diversity.jl.

kescobo avatar kescobo commented on May 18, 2024 1

Lol, I love that all of the supertypes are just things - that nicely gets around the issue of what to call it :-P. Presumably all of those will just be overloaded with application-specific function and type aliases?

Seems great - please let me know if I can help! And @mkborregaard, the organization of that package at this point looks straightforward enough that, once you think it's in a usable state, I'd be happy to attempt to make the transition on Microbiome.jl and just have you review my changes rather than you doing the PR (if you think that would be easier... of course I don't mind if you just want to tackle it).

from diversity.jl.

richardreeve avatar richardreeve commented on May 18, 2024 1

This is now fully implemented in Diversity v0.4.5, and you can plot Diversity Metacommunity objects, and calculate diversity of SpatialEcology Assemblage objects!

from diversity.jl.

mkborregaard avatar mkborregaard commented on May 18, 2024

Nice. I don't know if it necessarily needs to be consolidated into one package, though I definitely think it makes sense to put things that are relevant to more than just microbiology with other general ecology functions. But what would be cool is to agree on a common set of types, so that the same data object can be used with all the different packages that do ecology.

For instance Diversity's MetaCommunity essentially tries to do the same thing as SpatialEcology's Assemblage, and I'd really like to consolidate them - I will visit Richard as soon as we find a time where both are available :-)
SpatialEcology tries to only be a package that defines basic types, constructors, and view-/indexing operations on them - the idea is that other ecology packages should be able to depend on it cheaply and define operations easily.

I think your AbundanceTable is essentially the same as SpatialEcology's ComMatrix{Int} type. If you'd be willing to consider using that type instead I'd be happy to make a PR with the change for your review. It would give a nice synergy I think.

from diversity.jl.

kescobo avatar kescobo commented on May 18, 2024

@mkborregaard I agree 100% with interoperability, and am happy to consider a PR. One wrinkle is that I need abundance tables to be able to take Floats as well - we often express these things as relative abundance (eg between 0 and 1).

from diversity.jl.

kescobo avatar kescobo commented on May 18, 2024

Nice. I don't know if it necessarily needs to be consolidated into one package, though I definitely think it makes sense to put things that are relevant to more than just microbiology with other general ecology functions.

I didn't mean to say that everything I'm developing for Microbiome.jl should be ported here, it's just that PCoA seems to make sense with Diversity.jl. Though another option is to put it in MultivariateStats.jl, since they already have PCA and LDA....

from diversity.jl.

mkborregaard avatar mkborregaard commented on May 18, 2024

Interesting, I never considered that. Should be easy to add. Would it make sense when implementing that to enforce 0 <= x <= 1?

from diversity.jl.

kescobo avatar kescobo commented on May 18, 2024

Would it make sense when implementing that to enforce 0 <= x <= 1

Not sure - I often import tables that are expressed as percent relative abundance (eg between 0.0 and 100.0), but I think that's kind of silly and it's trivial to transform that to make it a fraction instead on import (which is what I do anyway). I've already got a function that takes such a table and mutates it from counts or whatever to relative abundance as well.

👍 for MultivariateStats and then dependence on that from ecology packages

The more I think about it - the more I think this makes the most sense, presuming the folks at MultivariateStats are game. I'll leave this open pending discussion there and feedback from @richardreeve

from diversity.jl.

kescobo avatar kescobo commented on May 18, 2024

JuliaStats/MultivariateStats.jl#58

from diversity.jl.

richardreeve avatar richardreeve commented on May 18, 2024

Sorry for the delay - I'm on a ropy connection in Tanzania at the moment! I agree with you both that JuliaStats/MultivariateStats might be a better home for this as it's a much more general thing than just a diversity issue. I've been able to incorporate more basic things I needed into their packages in the past without any great difficulty by creating PRs in the style of their code... it can be harder to start a discussion without so code to refer to though.

from diversity.jl.

richardreeve avatar richardreeve commented on May 18, 2024

I think the more fundamental issue is the one that you've been discussing second though - how to get consistent structures for data that allow us to easily use datasets in different packages - @mkborregaard and I have been trying to meet to discuss this for a few months, without any obvious success! In particular, I would like to attach abundances and relative abundances (i.e. ints and floats) and presence/absence (bits/booleans) to a whole series of different objects, from the simplest - just a list of names representing, say, species to phylogenies (where one dimension of the [relative] abundance matrix corresponds to the tips), sequence data (where one dimension corresponds to the sequences), taxonomies, and a variety of other things. The other dimension(s) also need to be able to represent samples, sites or some other category, or some one or more dimensional spatial layout. Some of this is already covered by this package, some by SpatialEcology, the sequence stuff presumably by Microbiome. I am looking hard at AxisArrays as a potential solution to many of these problems, and have been playing around with it a lot recently, but things like ComMatrix and AbundanceTable do something similar, though I'm not necessarily sure what their advantages are at this point (I know ComMatrix uses sparse arrays, but AxisArray can too, for instance). In any event, when I get back next week, I would very much like to try a lot harder to sort this out...

from diversity.jl.

kescobo avatar kescobo commented on May 18, 2024

💯 on interoperability in general - I agree that a lot that we do with the microbiome is ultimately ust applications of ecology. Anything I can do to help or provide feedback, let me know!

from diversity.jl.

mkborregaard avatar mkborregaard commented on May 18, 2024

@richardreeve yes, AxisArrays may be what we need in some form. ComMatrix was first a typealias for NamedMatrix{<:Union{Int, Bool}}, then a thin wrapper around a NamedMatrix, to give more control. But in the end the method forwarding had some issues (e.g. NamedArrays don't deal well with views into sparse Matrices, which SpatialEcology uses all the time), and I figured it was easier to keep track of the margin names myself.
AxisArrays is probably better, so def worth looking into basing ComMatrix on that. There is no inherent advantage to using a ComMatrix rather than a basic AxisArray, except for dispatch - programming functions with sensible intuitive names that do intuitive things to the ComMatrix because we know what it is. Having a type allows e.g. enforcing species on columns and sites on rows (or vice versa), reducing the amount of mental fuel expended to keep track on such things and hopefully reducing bugs.

from diversity.jl.

mkborregaard avatar mkborregaard commented on May 18, 2024

Richard, we're debating the relative merits of

    species
 ------------
s|
i|
t|
e|
s|

vs

     sites
 ------------
s|
p|
e|
c|
i|
e|
s|

The first is consistent with R and the tidy data convention, the second may be faster if most operations work on a by-community scale (rather than by-species scale), given the column-major nature of julia. What are your thoughts on this?

from diversity.jl.

richardreeve avatar richardreeve commented on May 18, 2024

I definitely go with the latter, and do in R as well incidentally, because I often have data in dataframes, which as as column-major as they get. However, using AxisArrays would potentially allow us to not enforce this:

julia> ab = AxisArray([1 2; 3 4; 5 6], Axis{:species}([:sp1, :sp2, :sp3]), Axis{:sites}([:site1, :site2]))
2-dimensional AxisArray{Int64,2,...} with axes:
    :species, Symbol[:sp1, :sp2, :sp3]
    :sites, Symbol[:site1, :site2]
And data, a 3×2 Array{Int64,2}:
 1  2
 3  4
 5  6

julia> ab[Axis{:species}(:sp2)]
1-dimensional AxisArray{Int64,1,...} with axes:
    :sites, Symbol[:site1, :site2]
And data, a 2-element Array{Int64,1}:
 3
 4

julia> ab[Axis{:sites}(:site2)]
1-dimensional AxisArray{Int64,1,...} with axes:
    :species, Symbol[:sp1, :sp2, :sp3]
And data, a 3-element Array{Int64,1}:
 2
 4
 6

Although we'd have to agree on some kind of standard for the axis naming, or store that information somewhere. We could of course strongly recommend that people used one ordering, and indeed not even mention that the other works, but it would give us flexiibility.

from diversity.jl.

mkborregaard avatar mkborregaard commented on May 18, 2024

No, I'm willing to do that. But, though I do see where both of you are coming from, there are many operation we do on community matrices all the time which summarize over species, i.e. the other way. You could be right that it is generally less common, though.

from diversity.jl.

richardreeve avatar richardreeve commented on May 18, 2024

Yes, absolutely - that was the kind of thing I was thinking of. I was actually wondering whether you could do something like:

ab = AxisArray([1 2; 3 4; 5 6], Axis{Species}([:sp1, :sp2, :sp3]), Axis{Subcommunities}([:site1, :site2]))

where Species and Subcommunities are the types of thing in the communities <: AbstractTypes, and the community structure themselves <:AbstractPartition, and then do a parametric typealias on that. Unfortunately Axis{X} requires X to be a symbol, and doesn't seem to allow implicit conversion (with convert()) which was the only fix I could think of.

from diversity.jl.

mkborregaard avatar mkborregaard commented on May 18, 2024

I've got to admit I'm not completely sharp on your types to the degree that I could see what that would entail - I guess youre thinking of running subdiv on a MetaCommunity that is/has a ComMatrix as defined above, and use a slightly different AxisArray typealias to store the result?

from diversity.jl.

richardreeve avatar richardreeve commented on May 18, 2024

All I was thinking was that in your case you fix the ComMatrix to have an axis called :Species, and I was hoping to allow it to be any appropriate category. We’ve had to call this a “Type” (somewhat confusingly in Julia), because there was objection to something like a sequence being referred to as a species.

from diversity.jl.

mkborregaard avatar mkborregaard commented on May 18, 2024

Ah, I get it now! @kescobo calls the same thing a "feature". This is the interesting thing regarding generality - e.g. you'd both like to work with strictly abundances, but the features can be anything. Whereas I tend to want to make sure things are really species, because I use the objects to connect to a phylogeny and an ecological network etc.

We can leave the Symbol unspecified for one of the axes to get a more open type, and possibly check the symbol against a Set of allowed symbols in the constructors, but if it's only for syntax rather than because we need to discern dispatch we could also just call it "Feature" and then use getters with the names we want (species(x, y) = feature(x,y) and so forth)

from diversity.jl.

kescobo avatar kescobo commented on May 18, 2024

I rather like this plan (to the extent I understand it). In the end we're talking about operations on tables where the names of rows and columns are useful. The rest is "just" context and readability (though of course context and readability are really important).

from diversity.jl.

mkborregaard avatar mkborregaard commented on May 18, 2024

Both, I've been thinking a lot about how general to make the type (i.e. ":Feature", vs ":Species", vs "Union{:Feature, :Species}"). Many things are a lot more intuitive conceptually when the term is specific (e.g. "richness", "rangesize" strictly pertain to a species), but there are also advantages of generality (e.g. OTU instead of species in microbiology). I am not sure generality is really the best here, though; often operations don't extend as easily conceptually as they operationally (think about the current terrible conflation of niche space with trait space in ecology, which is driven by computational similarities rather than theoretical ones).
So, could you (both) give me some examples where the axes of a CommunityMatrix are not species and sites/samples , but it would be obvious to apply all the same operations? It seems like you've both done such analyses.

from diversity.jl.

richardreeve avatar richardreeve commented on May 18, 2024

Well, the latter is easy. Using the AxisArray type, we're thinking of allowing both categorical sites, and quantitative sites based on quadrats, so the "site" bit would be Axis{:latitude}(0m:100m:100km) and (possibly additionally) Axis{:longitude}(0m:100m:100km) as a third dimension. Or alternatively Axis{:time}(0s:5*365*24*60*60s:20*365*24*60*60s) for 5-yearly samples to calculate temporal changes.

from diversity.jl.

richardreeve avatar richardreeve commented on May 18, 2024

As far as the former is concerned, we use similarity to connect individuals or groups, so our groups can be taxonomic (e.g. sub-species, species, genera, families) or individuals (e.g. tips on a tree, sequences in an alignment), or groups with the same phenotype, etc. We have plenty of examples where we need this.

from diversity.jl.

mkborregaard avatar mkborregaard commented on May 18, 2024

OK, that use of AxisArray is very different from how I approach it in SpatialEcology - where I attach specific spatial objects to keep track of coordinates and always have a 2-d matrix of occurrences. But - essentially - if you plan to define all your diversity functions to dispatch on an AxisMatrix then they should be able to work on SpatialEcology.ComMatrix objects out of the box anyway.

from diversity.jl.

kescobo avatar kescobo commented on May 18, 2024

Hmm - the comment I wrote apparently didn't get posted, sorry for the delay 😞

In my lab, we have several outputs from our standard metagenomics analysis, most of which produce a table where columns are individual samples (often one sample per individual subject, but sometimes multiple longitudinal samples per subject, technical replicates or environmental samples), and rows are any of a number of features:

  • Taxa (this might be a mix of any taxon level from kingdom to strain)
  • Gene names/gene families (GO terms, Unirefs etc)
  • Enzyme pathways
  • Metabolites

For all of these, we have a number of normalization, transformation, distance calculation and plotting steps, some of which are specific to data type, but many of which cross multiple datatypes

from diversity.jl.

richardreeve avatar richardreeve commented on May 18, 2024

@mkborregaard We don't have any definite plans for the quantitative axes at this point. We're still speculating - at the moment, my main concern is to get everything working with 1.0 without disrupting our ongoing use of the code. We've now got Unitful and Phylo fixed, and some preliminary work with AxisArrays and JuliaDB, but we aren't going to fully commit to anything until those are all working with 0.7-pre / 1.0. I can't see that we wouldn't use the variety of different categorical types that @kescobo and I have discussed above, because it's so easy to do it that way, and so essential to our work. As far as the quantitative stuff is concerned though - I am still waiting for that meeting with you about how you see it all working :).

from diversity.jl.

mkborregaard avatar mkborregaard commented on May 18, 2024

You'll not be at the IBS in Evora in March by any chance?

from diversity.jl.

richardreeve avatar richardreeve commented on May 18, 2024

Sadly I didn't know about it, so no...

from diversity.jl.

kescobo avatar kescobo commented on May 18, 2024

Hi guys - I was wondering if a consensus has been achieved at least w/r/t the backend underlying data matrices. Ultimately, I think @mkborregaard's notion of having functions to access data that make sense for underlying data types, even if they're just aliases of other functions, is probably a good one.

But right now, I sort of assume everything starts out in the form of a DataFrame because that's my workflow. I'd like to move towards something a bit more generic (like IterableTables?), at least in terms of getting data in and out of my types. I'm not sure if that's covered by the dependency on Spatial Ecology that M is putting together, or if it's something I can work towards independently.

Thoughts?

from diversity.jl.

mkborregaard avatar mkborregaard commented on May 18, 2024

🎉 💖

from diversity.jl.

Related Issues (19)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.