Currently we have
```julia
Dists.logpdf(d::AbstractMeasure, x) = logdensity(d, x)
```
This is not correct. Say we have `d = Normal()`. As @mschauer points out in #93 (comment), what we really want is
```julia
Dists.logpdf(d::Normal{()}, x) = logdensity(d, Lebesgue(ℝ), x)
```
The `Lebesgue(ℝ)` will vary with the measure `d`. Note that it is not the base measure used to define `Normal`; using our new notation, that will be
```julia
@parameterized Normal(μ,σ) ≪ (1/sqrt2π) * Lebesgue(ℝ)
```
which means we'll have
```julia
basemeasure(::Normal) = (1/sqrt2π) * Lebesgue(ℝ)
```
So we need a new concept, some `foo` with
```julia
foo(::Normal) = Lebesgue(ℝ)
```
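As a rough sketch of what `foo` could do generically (everything below is a stand-in for illustration, not the package's actual types or API), it just peels scalar weights off until a bare measure remains:

```julia
# Hypothetical sketch: `WeightedMeasure` and `foo` are stand-in names,
# not MeasureTheory's actual API.
struct Lebesgue end                   # stand-in for Lebesgue(ℝ)
struct WeightedMeasure{M}
    logweight::Float64
    base::M
end

foo(μ::WeightedMeasure) = foo(μ.base)   # strip the scalar weight
foo(μ) = μ                              # anything else is already bare

foo(WeightedMeasure(-log(2π)/2, Lebesgue()))   # Lebesgue()
```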
Under `≪`, measures form a preorder: the relation is transitive and reflexive, but not antisymmetric.
But we can create equivalence classes to fix this. Define `a ≃ b` to mean that `a ≪ b` and `b ≪ a`. Then we can choose a representative measure for each class.
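A toy sketch of that equivalence, where `≪` is a stand-in support-containment check on sets rather than real absolute continuity:

```julia
# Toy stand-in: `a ≪ b` as "support of a is contained in support of b".
≪(a::Set, b::Set) = a ⊆ b

# Mutual absolute continuity gives the equivalence.
≃(a, b) = (a ≪ b) && (b ≪ a)

≃(Set([1, 2]), Set([2, 1]))   # true
≃(Set([1]), Set([1, 2]))      # false: ≪ holds in one direction only
```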
There are some problems I've come up against in trying this approach:
- Treating `≪` as a global property gets very tricky, because we have to worry about matching the support.
- If the representative is defined directly, we can't be sure there will be an automatable way to get a `logdensity` with respect to it.
- I'm not sure `representative` is the word we want. We might also consider `foundation` or `ground`, if either is more easily understood as an extension of `basemeasure`.
Also relevant here are "primitive" measures. These are things like `Lebesgue` and `Counting`, measures that don't refer to any combinator or other measure in their definition.
So, here's an idea. We already have this function in utils.jl:
```julia
function fix(f, x)
    y = f(x)
    while x ≠ y
        (x, y) = (y, f(y))
    end
    return y
end
```
So maybe we should have `representative(d) = fix(basemeasure, d)`. This would make it easy to compute log-densities, but would basically be giving up on "global" analysis. The result of computing at a point `x` would tell us something about absolute continuity in some neighborhood of `x`, where "neighborhood" is in terms of a topology consistent with the measure (so, e.g., a neighborhood for `Counting` measure would be a singleton).
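Here's a toy model of how that would bottom out (all types below are stand-ins for illustration, not MeasureTheory's real definitions): iterating `basemeasure` terminates once we hit a primitive measure, since a primitive measure is its own base measure.

```julia
# Toy stand-ins, not the package's real types.
struct Leb end                        # a primitive measure, like Lebesgue(ℝ)
struct Weighted{M}; base::M; end      # like (1/sqrt2π) * Lebesgue(ℝ)
struct Norm end                       # like Normal()

basemeasure(::Norm) = Weighted(Leb())
basemeasure(w::Weighted) = w.base
basemeasure(ℓ::Leb) = ℓ               # primitive: its own base measure

# Same `fix` as in utils.jl:
function fix(f, x)
    y = f(x)
    while x ≠ y
        (x, y) = (y, f(y))
    end
    return y
end

representative(d) = fix(basemeasure, d)

representative(Norm())   # Leb()
```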
So if we start with `ℓ = logdensity(a, b, x)`, then
- `ℓ == Inf` means `a` is singular wrt `b` on some neighborhood of `x`
- `ℓ == -Inf` means the reverse
- `ℓ == NaN` means all bets are off (and this will sometimes happen, since we can add terms and get `ℓ == Inf - Inf`)
I should point out another important use case. Say `mu ≪ a` and `nu ≪ b` are two measures with their respective base measures, and we want to compute `logdensity(mu, nu, x)`. We need to find a "chain" of log-densities taking us from `mu` to `nu`, and we need to limit the number of `logdensity` methods (or it will grow quadratically). So we can just compute
```julia
logdensity(mu, nu, x) = logdensity(mu, a, x) + logdensity(a, b, x) - logdensity(nu, b, x)
```
The `logdensity(a, b, x)` term might be directly computable, or might have to further delegate to the base measures.
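As a numeric sanity check of that chain (plain arithmetic, not the package's API), take `mu = Normal(0,1)` with base `a = (1/sqrt2π) * Lebesgue(ℝ)` and `nu = Normal(1,2)` with base `b = Lebesgue(ℝ)`:

```julia
x = 0.7

# Log-densities along the chain (standard Gaussian formulas):
logdensity_mu_a = -x^2 / 2                               # d(mu)/d(a): the 1/sqrt2π is already in a
logdensity_a_b  = -log(2π) / 2                           # d(a)/d(b): the constant weight 1/sqrt2π
logdensity_nu_b = -(x - 1)^2 / 8 - log(2) - log(2π)/2    # d(nu)/d(b): Normal(1,2) wrt Lebesgue

ℓ = logdensity_mu_a + logdensity_a_b - logdensity_nu_b

# Direct check: the difference of the two log-densities wrt Lebesgue.
direct = (-x^2/2 - log(2π)/2) - logdensity_nu_b
ℓ ≈ direct   # true
```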