Currently we have
```julia
Dists.logpdf(d::AbstractMeasure, x) = logdensity(d, x)
```
This is not correct. Say we have `d = Normal()`. As @mschauer points out in #93 (comment), what we really want is
```julia
Dists.logpdf(d::Normal{()}, x) = logdensity(d, Lebesgue(ℝ), x)
```
The `Lebesgue(ℝ)` will vary with the measure `d`. Note that it is not the base measure used to define `Normal`; using our new notation, that will be
```julia
@parameterized Normal(μ,σ) ≪ (1/sqrt2π) * Lebesgue(ℝ)
```
which means we'll have
```julia
basemeasure(::Normal) = (1/sqrt2π) * Lebesgue(ℝ)
```
So we need a new concept, some `foo` with
```julia
foo(::Normal) = Lebesgue(ℝ)
```
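As a rough sketch of what `foo` could do generically (everything below is a stand-in for illustration, not the package's actual types or API), it just peels scalar weights off until a bare measure remains:

```julia
# Hypothetical sketch: `WeightedMeasure` and `foo` are stand-in names,
# not MeasureTheory's actual API.
struct Lebesgue end                   # stand-in for Lebesgue(ℝ)
struct WeightedMeasure{M}
    logweight::Float64
    base::M
end

foo(μ::WeightedMeasure) = foo(μ.base)   # strip the scalar weight
foo(μ) = μ                              # anything else is already bare

foo(WeightedMeasure(-log(2π)/2, Lebesgue()))   # Lebesgue()
```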
Under `≪`, measures form a preorder: the relation is transitive and reflexive, but not antisymmetric.
But we can create equivalence classes to fix this. Define `a ≃ b` to mean that `a ≪ b` and `b ≪ a`. Then we can choose a representative measure for each class.
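A toy sketch of that equivalence, where `≪` is a stand-in support-containment check on sets rather than real absolute continuity:

```julia
# Toy stand-in: `a ≪ b` as "support of a is contained in support of b".
≪(a::Set, b::Set) = a ⊆ b

# Mutual absolute continuity gives the equivalence.
≃(a, b) = (a ≪ b) && (b ≪ a)

≃(Set([1, 2]), Set([2, 1]))   # true
≃(Set([1]), Set([1, 2]))      # false: ≪ holds in one direction only
```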
There are some problems I've come up against in trying this approach:
- Treating `≪` as a global property gets very tricky, because we have to worry about matching the support.
- If the representative is defined directly, we can't be sure there will be an automatable way to get a `logdensity` with respect to it.
- I'm not sure `representative` is the word we want. We might also consider `foundation` or `ground`, if either is more easily understood as an extension of `basemeasure`.
Also relevant here are "primitive" measures. These are things like `Lebesgue` and `Counting`, measures that don't refer to any combinator or other measure in their definition.
So, here's an idea. We already have this function in utils.jl:
```julia
function fix(f, x)
    y = f(x)
    while x ≠ y
        (x, y) = (y, f(y))
    end
    return y
end
```
So maybe we should have `representative(d) = fix(basemeasure, d)`. This would make it easy to compute log-densities, but would basically be giving up on "global" analysis. The result of computing at a point `x` would tell us something about absolute continuity in some neighborhood of `x`, where "neighborhood" is in terms of a topology consistent with the measure (so, e.g., a neighborhood for `Counting` measure would be a singleton).
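Here's a toy model of how that would bottom out (all types below are stand-ins for illustration, not MeasureTheory's real definitions): iterating `basemeasure` terminates once we hit a primitive measure, since a primitive measure is its own base measure.

```julia
# Toy stand-ins, not the package's real types.
struct Leb end                        # a primitive measure, like Lebesgue(ℝ)
struct Weighted{M}; base::M; end      # like (1/sqrt2π) * Lebesgue(ℝ)
struct Norm end                       # like Normal()

basemeasure(::Norm) = Weighted(Leb())
basemeasure(w::Weighted) = w.base
basemeasure(ℓ::Leb) = ℓ               # primitive: its own base measure

# Same `fix` as in utils.jl:
function fix(f, x)
    y = f(x)
    while x ≠ y
        (x, y) = (y, f(y))
    end
    return y
end

representative(d) = fix(basemeasure, d)

representative(Norm())   # Leb()
```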
So if we start with `ℓ = logdensity(a, b, x)`, then
- `ℓ == Inf` means `a` is singular wrt `b` on some neighborhood of `x`
- `ℓ == -Inf` means the reverse
- `ℓ == NaN` means all bets are off (and this will sometimes happen, since we can add terms and get `ℓ == Inf - Inf`)
I should point out another important use case. Say `mu ≪ a` and `nu ≪ b` are two measures with their respective base measures, and we want to compute `logdensity(mu, nu, x)`. We need to find a "chain" of log-densities taking us from `mu` to `nu`, and we need to limit the number of `logdensity` methods (or it will grow quadratically). So we can just compute
```julia
logdensity(mu, nu, x) = logdensity(mu, a, x) + logdensity(a, b, x) - logdensity(nu, b, x)
```
The `logdensity(a, b, x)` term might be directly computable, or might have to further delegate to the base measures.
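As a numeric sanity check of that chain (plain arithmetic, not the package's API), take `mu = Normal(0,1)` with base `a = (1/sqrt2π) * Lebesgue(ℝ)` and `nu = Normal(1,2)` with base `b = Lebesgue(ℝ)`:

```julia
x = 0.7

# Log-densities along the chain (standard Gaussian formulas):
logdensity_mu_a = -x^2 / 2                               # d(mu)/d(a): the 1/sqrt2π is already in a
logdensity_a_b  = -log(2π) / 2                           # d(a)/d(b): the constant weight 1/sqrt2π
logdensity_nu_b = -(x - 1)^2 / 8 - log(2) - log(2π)/2    # d(nu)/d(b): Normal(1,2) wrt Lebesgue

ℓ = logdensity_mu_a + logdensity_a_b - logdensity_nu_b

# Direct check: the difference of the two log-densities wrt Lebesgue.
direct = (-x^2/2 - log(2π)/2) - logdensity_nu_b
ℓ ≈ direct   # true
```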