Giter Site home page Giter Site logo

bkamins / categoricaldistributions.jl Goto Github PK

View Code? Open in Web Editor NEW

This project forked from juliaai/categoricaldistributions.jl

0.0 1.0 0.0 98 KB

Providing probability distributions and non-negative measures over finite sets, whose elements are labelled.

License: MIT License

Julia 100.00%

categoricaldistributions.jl's Introduction

CategoricalDistributions.jl

Probability distributions and measures for finite sample spaces whose elements are labeled (consist of the class pool of a CategoricalArray).

Designed for performance in machine learning applications. For example, probabilistic classifiers in MLJ typically predict the UnivariateFiniteVector objects defined in this package.

For probability distributions over integers see the Distributions.jl package, whose methods the current package extends.

Linux Coverage
Build Status Coverage

Installation

using Pkg
Pkg.add("CategoricalDistributions")

Basic usage

The sample space of the UnivariateFinite distributions provided by this package is the class pool of a CategoricalArray:

using CategoricalDistributions
using CategoricalArrays
import Distributions
data = ["no", "yes", "no", "maybe", "maybe", "no",
       "maybe", "no", "maybe"] |> categorical
julia> d = Distributions.fit(UnivariateFinite, data)
               UnivariateFinite{Multiclass{3}}
         ┌                                        ┐
   maybe ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.4
      no ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.5
     yes ┤■■■■■■■ 0.1
         └                                        ┘
julia> pdf(d, "no")
0.5

julia> mode(d)
CategoricalValue{String, UInt32} "no"

A UnivariateFinite distribution can also be constructed directly from a probability vector:

julia> d2 = UnivariateFinite(["no", "yes"], [0.15, 0.85], pool=data)
             UnivariateFinite{Multiclass{3}}
       ┌                                        ┐
    no ┤■■■■■■ 0.15
   yes ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.85
       └                                        ┘

A UnivariateFinite distribution tracks all classes in the pool:

levels(d2)
3-element Vector{String}:
 "maybe"
 "no"
 "yes"

julia> pdf(d2, "maybe")
0.0

julia> pdf(d2, "okay")
ERROR: DomainError with Value okay not in pool. :

Arrays of UnivariateFinite distributions are defined using the same constructor. Broadcasting methods, such as pdf, are optimized for such arrays:

julia> v = UnivariateFinite(["no", "yes"], [0.1, 0.2, 0.3, 0.4], augment=true, pool=data)
4-element UnivariateFiniteArray{Multiclass{3}, String, UInt32, Float64, 1}:
 UnivariateFinite{Multiclass{3}}(no=>0.9, yes=>0.1)
 UnivariateFinite{Multiclass{3}}(no=>0.8, yes=>0.2)
 UnivariateFinite{Multiclass{3}}(no=>0.7, yes=>0.3)
 UnivariateFinite{Multiclass{3}}(no=>0.6, yes=>0.4)

julia> pdf.(v, "no")
4-element Vector{Float64}:
 0.9
 0.8
 0.7
 0.6

Query the UnivariateFinite doc-string for advanced constructor options.

A (non-standard) implementation of pdf allows for extraction of the full probability array:

julia> L = levels(data)
3-element Vector{String}:
 "maybe"
 "no"
 "yes"

julia> pdf(v, L)
4×3 Matrix{Float64}:
 0.0  0.9  0.1
 0.0  0.8  0.2
 0.0  0.7  0.3
 0.0  0.6  0.4

Measures over finite labeled sets

There is, in fact, no enforcement that probabilities in a UnivariateFinite distribution sum to one, only that they be belong to a type T for which zero(T) is defined. In particular UnivariateFinite objects implement arbitrary non-negative, signed, or complex measures over a finite labeled set.

What does this package provide?

  • A new type UnivariateFinite{S} for representing probability distributions over the pool of a CategoricalArray, that is, over finite labeled sets. Here S is a subtype of OrderedFactor from ScientificTypesBase.jl, if the pool is ordered, or of Multiclass if the pool is unordered.

  • A new array type UnivariateFiniteArray{S} <: AbstractArray{<:UnivariateFinite{S}} for efficiently manipulating arrays of UnivariateFinite distributions.

  • Implementations of rand for generating random samples of a UnivariateFinite distribution.

  • Implementations of the pdf, logpdf and mode methods of Distributions.jl, with efficient broadcasting over the new array type.

  • Implementation of Distributions.fit from Distributions.jl for UnivariateFinite distributions.

  • A single constructor for constructing UnivariateFinite distributions and arrays thereof, from arrays of probabilities.

Acknowledgements

The initial release of this package is based almost entirely on code originally residing in MLJBase.jl with contributions from Anthony Blaom, Thibaut Lienart, Samuel Okon, and Chad Scherrer. These contributions are not reflected in the current repository's commit history.

categoricaldistributions.jl's People

Contributors

ablaom avatar okonsamuel avatar davnn avatar zsz00 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.