This library supports representation, inference, and learning in Bayesian networks.
Please read the documentation.
Bayesian Networks for Julia
License: Other
Hi Bayesians. I just stopped by to check out your BN as it probably is the closest net to the MLN I am getting ready to build http://i.stanford.edu/hazy/tuffy/doc/tuffy-manual.pdf.
Anybody interested in joining is welcome. A BN compactly represents a probability distribution, but it doesn't allow for efficient reasoning: it doesn't return the full table summing the probabilities of all states. Constructing that table from the individual tables is NP-hard, and computing P(effect | cause 1, not cause 2) without the full table is the inference problem in BNs. BNs also in effect have invisible arrows alongside the visible ones. BNs can return better results when their probabilities are adjusted in ways that depart from Bayes' theorem, due to the false independence assumptions that generative models make; e.g., Naive Bayes can make the best predictions even in cases where its independence assumptions are violated. You can continue reading about this in Pedro Domingos' book, page 170, or check out a more complete argument, which also involves founder Minsky of MIT CSAIL, in my julia-users post 'is the master algo on the roadmap?'
Here are some examples of what MLN can do
MLN also improves on MCMC's slow and false convergence problems.
Further work is being done by Domingos on symmetry-based learning to be used with MLNs.
Best, hpoit
Hi all!
I've been working with the library for the past couple days and am finding it very useful!
A feature I would love to see is the ability to convert easily between Bayesian network representations and factor graphs or Markov random fields. The conversions are not computationally intensive and can sometimes allow certain problems to be solved more elegantly (for example, LBP on MRFs has only one type of message rather than separate pi and lambda messages).
Let me know if that is of interest to you, and I would happily start working on a PR.
How do you do inference on continuous variables? The documentation only mentions discrete-variable examples.
Hopefully just a doc update
Running `infer` with a `LikelihoodWeightingInference` when some assignments have zero (or very low) likelihood will cause those assignments to not show up in the samples. In these cases, the returned DataFrame will not have rows for those assignments.
It would be good to have a complete DataFrame factor, with entries for each assignment, even if the probability is zero.
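One workaround is to pad the estimated factor with explicit zeros for every assignment in the Cartesian product of the variable domains. A standalone sketch, using a plain Dict in place of the DataFrame factor (`complete_factor` and its arguments are hypothetical, not BayesNets.jl API):

```julia
# Hypothetical sketch: given per-variable domains and a sparse set of
# assignment => probability pairs, emit a complete factor with explicit zeros.
function complete_factor(domains::Dict{Symbol,Vector{Int}}, sparse::Dict)
    vars = sort(collect(keys(domains)))
    full = Dict{NamedTuple,Float64}()
    for combo in Iterators.product((domains[v] for v in vars)...)
        a = NamedTuple{Tuple(vars)}(combo)
        full[a] = get(sparse, a, 0.0)  # zero for assignments never sampled
    end
    full
end

domains = Dict(:a => [1, 2], :b => [1, 2])
sparse = Dict((a = 1, b = 1) => 0.7, (a = 2, b = 1) => 0.3)
full = complete_factor(domains, sparse)  # 4 entries, two of them explicit zeros
```

The same padding could be done on the real DataFrame factor by joining it against the full assignment grid and filling the missing probabilities with zero.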
We dropped support for Julia 0.4, so the next release will need to increment the major version. Thus, BayesNets v2.0.0.
Before I tag a new release I would like to:
Does anyone have any additional issues to post? Things we need to clear up before moving forward?
At the moment, does the package support learning probabilities given data? Right now I'm using gRain/gR/gRim for R, but it doesn't have support for vertices with normal distributions, which would make my life easier.
I have a mixed net with a known structure with mostly discrete nodes and a single continuous/gaussian distributed one. (Specifically, it's a map of binary features to a utility value estimate). gRain has a compile function which will take a data frame and generate the CPDs given a known graph structure (and, since this package has multiple types of node, presumably priors as well).
It doesn't look like anything of the sort exists yet in BayesNets.jl, but I wanted to make sure and ask. It looks like it's all just network score/structure learning right now? I'm not sure it's an easy addition, especially with mixed nodes. It looks like most existing packages ensure all nodes have the same distribution.
The DataFrames used as factors have a `:p` column. `P` is fairly likely to be a variable name, so it might be better to use `:potential` instead.
Thoughts?
`Base.zero(::Any) = ""` is declared in BayesNets.jl's main file. Nothing ever calls it. Do we still need it?
@hamzaelsaawy is currently rewriting all the inference code, starting with a Factor datatype (https://github.com/hamzaelsaawy/Factors.jl) that uses multidimensional arrays and not DataFrames for (ideally) better lookup and more efficient storage. He plans on making likelihood weighting and Gibbs restartable, so the user can continue or seed the algorithms.
Joins are significantly more costly (is Julia's broadcast! to blame?), but hopefully the API can be standardized overall and the overhead of inference significantly reduced, especially for loopy belief propagation.
It would be great if it would automatically render as a table like a DataFrame does in both the terminal and julia notebooks. Other than that, if it works and is faster it'd be great to have.
Some questions for discussion:
Hey @tawheeler, this was a question from piazza:
If the parents of A are X, Y and Z, then is the ordering for the distributions in CategoricalCPD:
X,Y,Z
0,0,0
1,0,0
0,1,0
1,1,0
0,0,1
1,0,1
⋮
I wasn't 100% sure what the ordering is. Do you think you could update the documentation with notation similar to theirs? Thanks!
I am starting a project using discrete Bayes nets and I am hoping to use this package rather than one of the Python packages (e.g., pomegranate or pgmpy). But I have a few questions...
I managed to get a simple example working (made up numbers), with
bn = DiscreteBayesNet()
push!(bn, DiscreteCPD(:smoke, [0.25,0.75]))
push!(bn, DiscreteCPD(:covid, [0.1,0.9]))
push!(bn, DiscreteCPD(:hospital, [:smoke, :covid], [2,2],
[Categorical([0.9,0.1]),
Categorical([0.2,0.8]),
Categorical([0.7,0.3]),
Categorical([0.01,0.99]),
]))
Could I request a little more information in the documentation about the way in which this CPD is encoded? Eventually I was able to figure it out by asking for the CPD table with `table(bn, :hospital)`, but I'd definitely say more documentation here would help people out.
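For reference, the ordering I eventually inferred appears to be column-major: the first parent varies fastest, exactly like Julia's own linear indexing. A Base-only sketch of that indexing, assuming the convention holds (`parental_index` is a hypothetical helper, not package API):

```julia
# Assumed convention: distributions are stored in column-major order over the
# parents, so for parents (:smoke, :covid) with 2 levels each, the four
# entries correspond to (smoke, covid) = (1,1), (2,1), (1,2), (2,2).
parental_index(vals::Tuple, ncat::Tuple) = LinearIndices(ncat)[vals...]

parental_index((1, 1), (2, 2))  # 1
parental_index((2, 1), (2, 2))  # 2
parental_index((1, 2), (2, 2))  # 3
parental_index((2, 2), (2, 2))  # 4
```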
Secondly, is there a way to name the levels that each node can take? From the docs I attempted a guess at something like this...
bn = DiscreteBayesNet()
push!(bn, DiscreteCPD(:smoke, NamedCategorical([:yes, :no], [0.25, 0.75])))
push!(bn, DiscreteCPD(:covid, NamedCategorical([:yes, :no], [0.1, 0.9])))
push!(bn, DiscreteCPD(:hospital, [:smoke, :covid], [2,2],
[NamedCategorical([:yes, :no], [0.9, 0.1]),
NamedCategorical([:yes, :no], [0.2, 0.8]),
NamedCategorical([:yes, :no], [0.7, 0.3]),
NamedCategorical([:yes, :no], [0.01, 0.99])]))
but no luck. Any pointers or updates to the docs on this would be very much appreciated.
Ben
PackageEvaluator.jl is a script that runs nightly. It attempts to load all Julia packages and run their tests (if available) on both the stable version of Julia (0.3) and the nightly build of the unstable version (0.4). The results of this script are used to generate a package listing enhanced with testing results.
N/A - new package.
Package doesn't load.
"Package doesn't load." means that PackageEvaluator did not find tests for your package. Additionally, trying to load your package with `using` failed.
This issue was filed because your testing status became worse. No additional issues will be filed if your package remains in this state, and no issue will be filed if it improves. If you'd like to opt-out of these status-change messages, reply to this message saying you'd like to and @IainNZ will add an exception. If you'd like to discuss PackageEvaluator.jl please file an issue at the repository. For example, your package may be untestable on the test machine due to a dependency - an exception can be added.
Test log:
>>> 'Pkg.add("BayesNets")' log
INFO: Cloning cache of BayesNets from git://github.com/sisl/BayesNets.jl.git
INFO: Cloning cache of DataArrays from git://github.com/JuliaStats/DataArrays.jl.git
INFO: Cloning cache of DataFrames from git://github.com/JuliaStats/DataFrames.jl.git
INFO: Cloning cache of DataStructures from git://github.com/JuliaLang/DataStructures.jl.git
INFO: Cloning cache of GZip from git://github.com/JuliaLang/GZip.jl.git
INFO: Cloning cache of Graphs from git://github.com/JuliaLang/Graphs.jl.git
INFO: Cloning cache of Reexport from git://github.com/simonster/Reexport.jl.git
INFO: Cloning cache of SortingAlgorithms from git://github.com/JuliaLang/SortingAlgorithms.jl.git
INFO: Cloning cache of StatsBase from git://github.com/JuliaStats/StatsBase.jl.git
INFO: Installing ArrayViews v0.4.6
INFO: Installing BayesNets v0.0.1
INFO: Installing DataArrays v0.2.0
INFO: Installing DataFrames v0.5.7
INFO: Installing DataStructures v0.3.1
INFO: Installing GZip v0.2.13
INFO: Installing Graphs v0.4.3
INFO: Installing Reexport v0.0.1
INFO: Installing SortingAlgorithms v0.0.1
INFO: Installing StatsBase v0.6.3
INFO: Package database updated
>>> 'using BayesNets' log
Julia Version 0.3.0-rc4+4
Commit bab9636 (2014-08-20 18:47 UTC)
Platform Info:
System: Linux (x86_64-unknown-linux-gnu)
CPU: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
LAPACK: libopenblas
LIBM: libopenlibm
LLVM: libLLVM-3.3
ERROR: TikzGraphs not found
in require at loading.jl:47
in include at ./boot.jl:245
in include_from_node1 at ./loading.jl:128
in reload_path at loading.jl:152
in _require at loading.jl:67
in require at loading.jl:51
in include at ./boot.jl:245
in include_from_node1 at loading.jl:128
in process_options at ./client.jl:285
in _start at ./client.jl:354
in _start_3B_1718 at /home/idunning/julia03/usr/bin/../lib/julia/sys.so
while loading /home/idunning/pkgtest/.julia/v0.3/BayesNets/src/BayesNets.jl, in expression starting on line 7
while loading /home/idunning/pkgtest/.julia/v0.3/BayesNets/testusing.jl, in expression starting on line 2
>>> test log
no tests to run
>>> end of log
It is possible for a user to assign an invalid CPD to a BN. For example, in a network `A -> B`, we may want to check for / prevent a CPD being assigned to `A` that depends on `B`.
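A cheap guard, sketched here with plain symbols rather than the package's types (`check_cpd_parents` is a hypothetical name): if CPDs must be pushed in topological order, then requiring every parent to already be present rules out a CPD on `A` that depends on its descendant `B`.

```julia
# Hypothetical validity check for pushing a CPD into a net built in
# topological order: all parents must already have CPDs, and the target
# must not already be defined.
function check_cpd_parents(existing::Vector{Symbol}, target::Symbol, parents::Vector{Symbol})
    bad = setdiff(parents, existing)
    isempty(bad) || error("CPD for $target depends on undefined node(s): $bad")
    target in existing && error("node $target already has a CPD")
    true
end

check_cpd_parents(Symbol[], :a, Symbol[])  # ok: root node
check_cpd_parents([:a], :b, [:a])          # ok: edge a -> b
```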
I am trying to fit the parameters of a not-so-big discrete BN. I have tested it in Netica and the learning happens instantly. But with BayesNets.jl it took 30 hours before running out of memory. Is this a bug or a limitation?
Here is the network structure and 200 samples to replicate the issue.
using BayesNets
using CSV
df = CSV.read("sample.csv");
dbn = fit(DiscreteBayesNet, df,
(
:A => :S,
:B => :S,
:C => :S,
:D => :S,
:E => :S,
:F => :SN,
:G => :SN,
:H => :SN,
:SN => :DCN,
:SN => :CAN,
:SN => :VCN,
:S => :EA,
:A => :EA,
:B => :EA,
:C => :EA,
:D => :EA,
:E => :EA,
:EA => :ARS,
:S => :ARS,
:SN => :ARS,
:A => :ARS,
:B => :ARS,
:C => :ARS,
:D => :ARS,
:E => :ARS,
:F => :ARS,
:G => :ARS,
:H => :ARS,
:EA => :AR,
:S => :AR,
:SN => :AR,
:A => :AR,
:B => :AR,
:C => :AR,
:D => :AR,
:E => :AR,
:F => :AR,
:G => :AR,
:H => :AR,
:AR => :CA,
:S => :CA,
:SN => :CA,
:A => :CA,
:B => :CA,
:C => :CA,
:D => :CA,
:E => :CA,
:F => :CA,
:G => :CA,
:H => :CA,
:CAN => :CA,
:AR => :VC,
:S => :VC,
:SN => :VC,
:A => :VC,
:B => :VC,
:C => :VC,
:D => :VC,
:E => :VC,
:F => :VC,
:G => :VC,
:H => :VC,
:VCN => :VC,
:AR => :ER,
:S => :ER,
:SN => :ER,
:A => :ER,
:B => :ER,
:C => :ER,
:D => :ER,
:E => :ER,
:F => :ER,
:G => :ER,
:H => :ER,
:AR => :IN,
:S => :IN,
:SN => :IN,
:A => :IN,
:B => :IN,
:C => :IN,
:D => :IN,
:E => :IN,
:F => :IN,
:G => :IN,
:H => :IN,
:CA => :DC,
:S => :DC,
:SN => :DC,
:A => :DC,
:B => :DC,
:C => :DC,
:D => :DC,
:E => :DC,
:F => :DC,
:G => :DC,
:H => :DC,
:DCN => :DC,
)
)
It looks like `Base.call` is no longer supported for abstract types in Julia 0.5. See issue 14919.
At the time of this writing we use call for CPDs to allow you to condition on them, i.e., `cpd(assignment) -> pdf`. In 0.5 we may have to create a macro that does this for you, which you call after defining your CPD, or stop using call and define a function with a name, maybe `condition`:
`condition(cpd, assignment)`
or similar.
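A toy sketch of what the named-function API could look like (all names here, including `condition` itself, are proposals, and the stand-in types are not the package's):

```julia
# Stand-in for a concrete distribution (e.g. Distributions.Bernoulli).
struct Bern
    p::Float64
end

# Toy CPD whose success probability is read from its parent :bias.
struct CoinCPD end

# The proposed named function: conditioning a CPD on an assignment
# returns a concrete distribution.
condition(::CoinCPD, a::Dict{Symbol,Float64}) = Bern(a[:bias])

d = condition(CoinCPD(), Dict(:bias => 0.3))
d.p  # 0.3
```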
Documenter.jl is used by POMDPs.jl and many other projects to create their documentation.
BayesNets.jl seems to be outgrowing its one-notebook doc.
The current tag triggers an error:
bn = DiscreteBayesNet()
push!(bn, DiscreteCPD(:B, [0.1,0.9]))
push!(bn, DiscreteCPD(:S, [0.5,0.5]))
push!(bn, rand_cpd(bn, 2, :E, [:B, :S]))
push!(bn, rand_cpd(bn, 2, :D, [:E]))
push!(bn, rand_cpd(bn, 2, :C, [:E]))
LoadError: MethodError: Cannot `convert` an object of type BayesNets.CPDs.CategoricalCPD{Distributions.Categorical{Float64}} to an object of type BayesNets.CPDs.CategoricalCPD{Distributions.Categorical}
This may have arisen from a call to the constructor BayesNets.CPDs.CategoricalCPD{Distributions.Categorical}(...),
since type constructors fall back to convert methods.
while loading In[2], in expression starting on line 2in push!(::Array{BayesNets.CPDs.CategoricalCPD{Distributions.Categorical},1}, ::BayesNets.CPDs.CategoricalCPD{Distributions.Categorical{Float64}}) at ./array.jl:479
in push!(::BayesNets.BayesNet{BayesNets.CPDs.CategoricalCPD{Distributions.Categorical}}, ::BayesNets.CPDs.CategoricalCPD{Distributions.Categorical{Float64}}) at /home/juliohm/.julia/v0.5/BayesNets/src/bayes_nets.jl:138
Hi,
I'm not a Julia expert, so I'm not sure how to do testing in Julia. I added a `removeEdges!` function to my fork, but `runtests.jl` can't find it:
$ julia test/runtests.jl
ERROR: removeEdges! not defined
in include at /usr/local/Cellar/julia/0.3.2/lib/julia/sys.dylib
in include_from_node1 at loading.jl:128
in process_options at /usr/local/Cellar/julia/0.3.2/lib/julia/sys.dylib
in _start at /usr/local/Cellar/julia/0.3.2/lib/julia/sys.dylib (repeats 2 times)
Any suggestions? I would like to add some passing tests before creating PR.
Also, how come there is a travis file but the icon is not in the README.md file?
┌ Warning: `by(d::AbstractDataFrame, cols::Any, f::Base.Callable; sort::Bool=false, skipmissing::Bool=false)` is deprecated, use `combine(f, groupby(d, cols, sort=sort, skipmissing=skipmissing))` instead.
│ caller = sumout(::Table, ::Symbol) at tables.jl:69
└ @ BayesNets ~/build/sisl/BayesNets.jl/src/DiscreteBayesNet/tables.jl:69
┌ Warning: inner joining data frames using join is deprecated, use `innerjoin(df1, df2, on=Symbol[:N5, :N3], makeunique=false, validate=(false, false))` instead
│ caller = ip:0x0
└ @ Core :-1
Should be easy to fix.
Calling `ndgrid` will fail if network variables have different domain types.
It looks like the definition of `CategoricalCPD` was changed in 065a6bc from holding a flat list of distributions to holding a multidimensional array, so that indexing it is more convenient. This is not compatible with the current definition of fit, and the build does not pass.
BayesNets currently implements all CPDs as functions. If you try to save a BN using HDF5/JLD it will fail, as functions contain pointers which cannot be saved to a file.
Recommend creating additional CPD types that internally use dicts or something similar instead.
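A sketch of what such a plain-data CPD could look like (`TabularCPD` and `evaluate` are hypothetical names): the distribution is held as a lookup table keyed on parent values, so there are no function pointers to trip up JLD/HDF5.

```julia
# Hypothetical serializable CPD: a lookup table from parent values to
# categorical probabilities, with no closures anywhere.
struct TabularCPD
    target::Symbol
    parents::Vector{Symbol}
    table::Dict{Tuple,Vector{Float64}}  # parent values => category probs
end

evaluate(cpd::TabularCPD, parent_vals::Tuple) = cpd.table[parent_vals]

cpd = TabularCPD(:c, [:a], Dict((1,) => [0.9, 0.1], (2,) => [0.2, 0.8]))
evaluate(cpd, (2,))  # [0.2, 0.8]
```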
@tawheeler , when I fix a small bug, as in f24c7e9 , should I go ahead and tag and push to metadata, or should I just notify you and have you do it?
This is similar to #53, but focuses on a temporary workaround.
We would like BNs to be rendered as text if you do not have LaTeX. The stack is as follows:
show(f::IO, a::MIME"image/svg+xml", bn::BayesNet)
plot(g::SimpleGraph)
show(f::IO, ::MIME"image/svg+xml", tp::TikzPicture)
Our options are:
1. Modify `show` in BayesNets to produce text. This is a little weird since it is in a MIME"image/svg+xml". We could conditionally generate the `show` method, but that is unintuitive.
2. Modify `plot` in TikzGraphs. This is probably a bad idea because it is supposed to generate a `TikzPicture`.
3. Modify `show` in TikzPictures. It makes the most sense to have TikzPictures reason about whether lualatex is installed, but when passed a TikzPicture its job is just to render it, not to determine whether it is a graph and print it out.
I am leaning towards modifying BayesNets.
Is it possible to add constraints to structure learning, so that the user can pre-exclude some connections based on expert knowledge? That would be functionality similar to `tabu_edges` in CausalNex.
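A minimal sketch of how such a constraint could plug into a search loop (all names hypothetical; this is neither CausalNex's nor BayesNets.jl's API): before scoring candidate parents for a node, drop any edge on the tabu list.

```julia
# Hypothetical tabu-edge filter for structure search.
const Edge = Pair{Symbol,Symbol}

function allowed_parents(node::Symbol, candidates::Vector{Symbol}, tabu::Set{Edge})
    [p for p in candidates if !((p => node) in tabu)]
end

tabu = Set([:smoke => :age])  # expert knowledge: smoking cannot cause age
allowed_parents(:age, [:smoke, :sex], tabu)  # [:sex]
```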
data = DataFrame(c=[1,1,1,1,2,2,2,2,3,3,3,3],
b=[1,1,1,2,2,2,2,1,1,2,1,1],
a=[1,1,1,2,1,1,2,1,1,2,1,1])
fit(DiscreteBayesNet, data, (:a=>:b, :a=>:c, :b=>:c))
ERROR: MethodError: Cannot `convert` an object of type Nothing to an object of type Int64
Closest candidates are:
convert(::Type{T}, ::T) where T<:Number at number.jl:6
convert(::Type{T}, ::Number) where T<:Number at number.jl:7
convert(::Type{T}, ::Ptr) where T<:Integer at pointer.jl:23
...
Stacktrace:
[1] LightGraphs.SimpleGraphs.SimpleEdge{Int64}(::Nothing, ::Nothing) at /home/alirv/.julia/packages/LightGraphs/siFgP/src/SimpleGraphs/simpleedge.jl:7
[2] add_edge!(::LightGraphs.SimpleGraphs.SimpleDiGraph{Int64}, ::Nothing, ::Nothing) at /home/alirv/.julia/packages/LightGraphs/siFgP/src/SimpleGraphs/SimpleGraphs.jl:90
[3] _get_dag(::DataFrame, ::Tuple{Pair{Symbol,Symbol},Pair{Symbol,Symbol},Pair{Symbol,Symbol}}) at /home/alirv/.julia/packages/BayesNets/XwVs4/src/learning.jl:34
[4] fit(::Type{BayesNet{CategoricalCPD{DiscreteNonParametric{Int64,Float64,Base.OneTo{Int64},Array{Float64,1}}}}}, ::DataFrame, ::Tuple{Pair{Symbol,Symbol},Pair{Symbol,Symbol},Pair{Symbol,Symbol}}) at /home/alirv/.julia/packages/BayesNets/XwVs4/src/learning.jl:47
[5] top-level scope at REPL[5]:1
We have changed some of the code to use `Table`, which wraps `DataFrame`. It has the field `potential`. Should tables be interpreted solely as representations of potentials (in this case, probability distributions)? Or should we have one `Table` type that holds all kinds of tables, e.g., samples, counts, weighted samples, conditional probability tables, etc.? And, if so, should we rename `potential` to something else (perhaps `df`)?
I'd especially like to know what @tawheeler and @hamzaelsaawy think. Let me know and I'll implement it over the next few days.
Thanks for developing this great package. As a LightGraphs maintainer, I am happy to see this great application using LightGraphs.
I was building some networks that use the FunctionalCPD because I would like to have nodes similar to the LinearGaussianCPD but where the output is a binary variable.
I made a FunctionalCPD where the output is a Bernoulli variable with a mean dependent on the parent values of the node.
I have a minimal replicating example.
Pkg.checkout("BayesNets")
using BayesNets
using LightGraphs
a = StaticCPD(:a, Bernoulli(0.5))
b = StaticCPD(:b, Bernoulli(0.6))
c = FunctionalCPD{Bernoulli}(:c, [:a,:b], seq->Bernoulli(mean(values(seq))))
bn = BayesNet()
push!(bn, a)
push!(bn, b)
push!(bn, c)
rand(bn, 20, :a=>1, :b=>1)
Which I expect to return a table where every row is (1,1,1); however, I see
a b c
1 1 1 1
2 1 1 1
3 1 1 0
4 1 1 1
5 1 1 1
6 1 1 1
7 1 1 1
8 1 1 1
9 1 1 0
10 1 1 1
11 1 1 0
12 1 1 0
13 1 1 1
14 1 1 1
15 1 1 1
16 1 1 1
17 1 1 1
18 1 1 0
19 1 1 1
20 1 1 1
Is this a misapplication of the FunctionalCPD type? Or is something wrong with the sampling process? Or something else?
b = DiscreteBayesNet()
push!(b, DiscreteCPD(:A, [0.5,0.5]))
push!(b, DiscreteCPD(:B, [:A], [2], [Categorical([0.5,0.5]), Categorical([0.45,0.55])]))
push!(b, CategoricalCPD(:C, Categorical([0.5,0.5])))
d = rand(b, 5)
results in
┌ Warning: Indexing with colon as row will create a copy in the future. Use `df[col_inds]` to get the columns without copying
│ caller = count(::BayesNet{CategoricalCPD{Categorical{Float64}}}, ::Symbol, ::DataFrame) at discrete_bayes_net.jl:109
└ @ BayesNets C:\Users\mykel\.julia\packages\BayesNets\B5P8k\src\DiscreteBayesNet\discrete_bayes_net.jl:109
┌ Warning: Selecting a single row from a `DataFrame` will return a `DataFrameRow` in the future. To get a `DataFrame` use `df[row_ind:row_ind, :]`.
│ caller = count(::BayesNet{CategoricalCPD{Categorical{Float64}}}, ::Symbol, ::DataFrame) at discrete_bayes_net.jl:113
└ @ BayesNets C:\Users\mykel\.julia\packages\BayesNets\B5P8k\src\DiscreteBayesNet\discrete_bayes_net.jl:113
┌ Warning: Selecting a single row from a `DataFrame` will return a `DataFrameRow` in the future. To get a `DataFrame` use `df[row_ind:row_ind, :]`.
│ caller = count(::BayesNet{CategoricalCPD{Categorical{Float64}}}, ::Symbol, ::DataFrame) at discrete_bayes_net.jl:113
└ @ BayesNets C:\Users\mykel\.julia\packages\BayesNets\B5P8k\src\DiscreteBayesNet\discrete_bayes_net.jl:113
Since https://github.com/sisl/BayesNets.jl/blob/5febf756d38194712cfb8c0c2294252931611afa/src/DiscreteBayesNet/tables.jl does a `const Table = DataFrame`, all method extensions on `Table` are global and change the behavior of other, unrelated code that uses DataFrames, just from importing BayesNets, whether or not that code uses this package. This is discouraged.
If `Table` were instead a wrapper struct that delegated the relevant operations to a single DataFrame field, then you could define methods on `BayesNets.Table` however you like without risking changing any other code's behavior.
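A minimal illustration of the wrapper idea (a Dict stands in for the DataFrame so the sketch is self-contained): methods defined on the wrapper no longer touch the backing type itself.

```julia
# Sketch: Table owns its backing store and forwards only what BayesNets needs.
struct Table{D}
    data::D  # would be a DataFrame in the real package
end

Base.getindex(t::Table, k) = t.data[k]

# sumout, joins, etc. could now be defined on Table without leaking
# methods onto DataFrame.
t = Table(Dict(:p => [0.5, 0.5]))
t[:p]  # [0.5, 0.5]
```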
Same issue with … and … as well, and 3cd56dc#commitcomment-22494521 should probably be submitted upstream to DataFrames?
Update .travis.yml similar to this.
Any plans to move to Julia 1.x ?
I'm a Computer Science student and couldn't figure out how to use your library to save my life.
Usually, libraries have a listing of the functions they contain. One helpful listing for your library would include entries like "function setCPD(): takes some parameters, does something with them", but I do not see any documentation for the functions linked to this page whatsoever. The documentation also does not include an example of creating a static Bayesian network (the kind my professor is having me recreate for a project). If you could include this sort of example, it would be very helpful, and your library would be oh so much more useful.
Update: even the source code is uncommented. Is everyone working on this project a self-taught programmer?
Tests pass.
Package doesn't load.
"Tests pass." means that PackageEvaluator found the tests for your package, executed them, and they all passed.
"Package doesn't load." means that PackageEvaluator did not find tests for your package. Additionally, trying to load your package with `using` failed.
This error on Julia 0.4 is possibly due to recently merged pull request JuliaLang/julia#8420.
Test log:
>>> 'Pkg.add("BayesNets")' log
INFO: Installing ArrayViews v0.4.6
INFO: Installing BayesNets v0.0.3
INFO: Installing DataArrays v0.2.1
INFO: Installing DataFrames v0.5.8
INFO: Installing DataStructures v0.3.2
INFO: Installing GZip v0.2.13
INFO: Installing Graphs v0.4.3
INFO: Installing LaTeXStrings v0.1.0
INFO: Installing Reexport v0.0.1
INFO: Installing SortingAlgorithms v0.0.1
INFO: Installing StatsBase v0.6.5
INFO: Installing TikzGraphs v0.0.1
INFO: Installing TikzPictures v0.1.2
INFO: Package database updated
>>> 'using BayesNets' log
ERROR: InexactError()
in uint32_3B_2324 at /home/idunning/julia04/usr/bin/../lib/julia/sys.so
in include at ./boot.jl:245
in include_from_node1 at ./loading.jl:128
in reload_path at loading.jl:152
in _require at loading.jl:67
in require at loading.jl:54
in include at ./boot.jl:245
in include_from_node1 at ./loading.jl:128
in reload_path at loading.jl:152
in _require at loading.jl:67
in require at loading.jl:54
in include at ./boot.jl:245
in include_from_node1 at ./loading.jl:128
in reload_path at loading.jl:152
in _require at loading.jl:67
in require at loading.jl:51
in include at ./boot.jl:245
in include_from_node1 at loading.jl:128
in process_options at ./client.jl:285
in _start at ./client.jl:354
in _start_3B_3625 at /home/idunning/julia04/usr/bin/../lib/julia/sys.so
while loading /home/idunning/pkgtest/.julia/v0.4/TikzPictures/src/TikzPictures.jl, in expression starting on line 186
while loading /home/idunning/pkgtest/.julia/v0.4/TikzGraphs/src/TikzGraphs.jl, in expression starting on line 11
while loading /home/idunning/pkgtest/.julia/v0.4/BayesNets/src/BayesNets.jl, in expression starting on line 7
while loading /home/idunning/pkgtest/.julia/v0.4/BayesNets/testusing.jl, in expression starting on line 2
Julia Version 0.4.0-dev+842
Commit e5d8c1a (2014-09-29 06:50 UTC)
Platform Info:
System: Linux (x86_64-unknown-linux-gnu)
CPU: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
LAPACK: libopenblas
LIBM: libopenlibm
LLVM: libLLVM-3.3
>>> test log
no tests to run
>>> end of log
@ermueller2000 brought this to my attention.
The current implementation of `rand_table` uses rejection sampling, and passing `numSamples` gives the number of attempts, not the number of samples you want to end up with.
Alternatively, we can set known values during the sampling procedure to speed things up. The drawback is that we need a way to ensure that the probability of what we end up with is non-zero. This way we always end up with `numSamples` rows.
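The first behavior could be fixed by resampling until enough consistent rows are collected, with a cap to catch zero-probability evidence. A sketch with hypothetical stand-ins (`sample_one` draws one sample, `consistent` checks it against the evidence; neither is package API):

```julia
# Resample until exactly nsamples consistent samples are collected.
function rejection_sample(sample_one, consistent, nsamples::Int; max_tries = 10^6)
    out = Vector{Any}()
    tries = 0
    while length(out) < nsamples
        (tries += 1) <= max_tries || error("evidence may have zero probability")
        s = sample_one()
        consistent(s) && push!(out, s)
    end
    out
end

samples = rejection_sample(() -> rand(1:2), s -> s == 1, 5)
length(samples)  # exactly 5
```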
`writemime` currently uses TikzGraphs to render to LaTeX. It would be nice to have a rendering method without so many dependencies, maybe something that works with Cairo.jl.
The current `Project.toml` has an error: it says the package supports `julia >= 0.7`. This is incorrect. Could you please update it across versions to reflect the actually supported versions? Version 3.0.0, for example, works on Julia 1.0, but 3.1.0 only works on Julia 1.1.
This compatibility error in the Project.toml causes the registry for Julia 1.0 to pull the wrong version, so BayesNets no longer actually works on Julia 1.0.
Sorry if this is not relevant to your use of LightGraphs, but I wanted to make sure you're aware of a change in the API for the `induced_subgraph()` function.
Starting in LightGraphs 0.7.1, `induced_subgraph()` will, in addition to the subgraph itself, return a mapping of the original vertex indices to the new ones. This will require code changes to ignore the new return value if you're using this function.
If you're using the `getindex` version of `induced_subgraph` (that is, `g[1:5]`), there will be no change.
Feel free to close this out if it’s not applicable to your use of LightGraphs. Thanks!
I introduced a new inference API:
abstract InferenceMethod
"""
Infer p(query|evidence)
- inference on a DiscreteBayesNet will always return a DataFrame factor over the evidence variables
"""
infer(im::InferenceMethod, bn::BayesNet, query::Vector{NodeName}; evidence::Assignment=Assignment())
Likelihood and exact inference work perfectly under this.
Loopy belief propagation, however, only takes a `query::NodeName`, when a `Vector{NodeName}` would be ideal.
Hi all,
LightGraphs is preparing for a fairly large upgrade with Julia 0.6, and I wanted to make you aware of what we're doing. Very few things are breaking, but there will be some potential performance improvements that you might be able to take advantage of.
In a nutshell: we're abstracting `Graph` and `DiGraph`, renaming them to `SimpleGraph` and `SimpleDiGraph`*, and parameterizing vertex indices to `<: Integer`. This will allow you to create, e.g., `SimpleGraph{UInt8}`s that will be much more space-efficient if you have graphs with fewer than 256 vertices.
*While we plan on more abstractions (e.g., for weighted graphs) in the future, for now `Graph` and `DiGraph` will continue to work and will continue to default to `SimpleGraph`, and the default parameterization for both `Simple(Di)Graph`s and `(Di)Graph`s will be `Int`.
Feel free to review the (very long) WIP at sbromberger/LightGraphs.jl#541 - your feedback would be greatly appreciated.
d = BayesNet([:a, :ux, :wx, :xk, :xkp])
works ok, but
d = BayesNet([:α, :ux, :wx, :xk, :xkp])
produces an error:
invalid ASCII sequence
in setindex! at array.jl:307
in writemime at /home/zouhair/.julia/v0.3/BayesNets/src/BayesNets.jl:102
in sprint at iostream.jl:229
in display_dict at /home/zouhair/.julia/v0.3/IJulia/src/execute_request.jl:31
Since Julia supports unicode, one would expect to be able to use unicode symbols as graph node names.
It seems the issue is actually with TikzGraphs, which only handles ASCIIStrings:
function plotHelper(g::GenericGraph, libraryname::String, layoutname::String, options::String, labels::AbstractArray{ASCIIString,1})
Is there a UTF8String-to-LaTeX-symbol conversion?
Hi,
I'm discovering this great package, and while running the examples given in the documentation notebook I hit an unexpected error:
the three notebook cells (In[17], In[18], In[19]), when running
fit(BayesNet, data, (:a=>:b), [StaticCPD{Normal}, LinearGaussianCPD])
or
fit(BayesNet, data, (:a=>:b), LinearGaussianCPD)
or
data = DataFrame(c=[1,1,1,1,2,2,2,2,3,3,3,3],
b=[1,1,1,2,2,2,2,1,1,2,1,1],
a=[1,1,1,2,1,1,2,1,1,2,1,1])
fit(DiscreteBayesNet, data, (:a=>:b, :a=>:c, :b=>:c))
give the same error:
MethodError: Cannot `convert` an object of type Nothing to an object of type Int64
Closest candidates are:
convert(::Type{T}, !Matched::T) where T<:Number at number.jl:6
convert(::Type{T}, !Matched::Number) where T<:Number at number.jl:7
convert(::Type{T}, !Matched::Ptr) where T<:Integer at pointer.jl:23
...
LightGraphs.SimpleGraphs.SimpleEdge{Int64}(::Nothing, ::Nothing) at simpleedge.jl:7
add_edge!(::LightGraphs.SimpleGraphs.SimpleDiGraph{Int64}, ::Nothing, ::Nothing) at SimpleGraphs.jl:90
_get_dag(::DataFrame, ::Tuple{Pair{Symbol,Symbol}}) at learning.jl:34
fit(::Type{BayesNet}, ::DataFrame, ::Pair{Symbol,Symbol}, ::Array{DataType,1}) at learning.jl:59
top-level scope at BayesNets.jl:44
I am with Windows, Atom, Julia 1.4.2.
Is this only happening for me?
Thanks if you can help.
It seems like `StaticCPD` is confusing users.
A StaticCPD is just a CPD which always returns the same distribution, no matter what the parent values are. I did not want to call it `ParentlessCPD` because it can technically have parents; they just don't do anything.
Some possible renamings are `IndependentCPD` or `ParentAgnosticCPD`. Neither of these sounds great.
Does anyone have a better name?
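Whatever the name, the semantics are easy to state in miniature (stand-in types, not the package's):

```julia
# A StaticCPD-like object returns the same distribution regardless of the
# parent assignment.
struct StaticCPDLike{D}
    dist::D
end

evaluate(cpd::StaticCPDLike, assignment) = cpd.dist  # assignment is ignored

s = StaticCPDLike(0.5)
evaluate(s, Dict(:x => 1)) == evaluate(s, Dict(:x => 2))  # true
```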
As a follow-up to #58, I see that the FunctionalCPD is passed the assignment of all the values, including the value of the current node.
type FunctionalCPD{D} <: CPD{D}
target::NodeName
parents::NodeNames
accessor::Function # calling this gives you the distribution from the assignment
FunctionalCPD(target::NodeName, accessor::Function) = new(target, NodeName[], accessor)
FunctionalCPD(target::NodeName, parents::NodeNames, accessor::Function) = new(target, parents, accessor)
end
name(cpd::FunctionalCPD) = cpd.target
parents(cpd::FunctionalCPD) = cpd.parents
@define_call FunctionalCPD
@compat (cpd::FunctionalCPD)(a::Assignment) = cpd.accessor(a)
But the LinearGaussianCPD iterates over just the parents of the node.
type LinearGaussianCPD <: CPD{Normal}
target::NodeName
parents::NodeNames
a::Vector{Float64}
b::Float64
σ::Float64
end
LinearGaussianCPD(target::NodeName, μ::Float64, σ::Float64) = LinearGaussianCPD(target, NodeName[], Float64[], μ, σ)
name(cpd::LinearGaussianCPD) = cpd.target
parents(cpd::LinearGaussianCPD) = cpd.parents
nparams(cpd::LinearGaussianCPD) = length(cpd.a) + 2
@define_call LinearGaussianCPD
@compat function (cpd::LinearGaussianCPD)(a::Assignment)
# compute A⋅v + b
μ = cpd.b
for (i, p) in enumerate(cpd.parents)
μ += a[p]*cpd.a[i]
end
Normal(μ, cpd.σ)
end
Should the FunctionalCPD have a way to distinguish its parents from the rest of the variables in the assignment?
I am trying to have many nodes that all use the same function for their FunctionalCPD, where the function depends on the parents but not on the current value.
I can't encode the names of the parents into the Function passed to the FunctionalCPD, because that would require one Function object per node.
I guess the officially recommended way to do this is to implement my own type <: CPD{D}.
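One way to do that (a sketch of a hypothetical custom type, not an existing BayesNets.jl API) is a CPD that stores its own parent list and forwards it to the accessor, so every node can share a single Function object:

```julia
# Sketch of a custom CPD whose accessor also receives the parent list.
# SharedFnCPD is a hypothetical name; in the real library this would
# subtype CPD{D} and take a BayesNets.Assignment.
struct SharedFnCPD
    target::Symbol
    parents::Vector{Symbol}
    accessor::Function  # called as accessor(assignment, parents)
end

(cpd::SharedFnCPD)(a::Dict{Symbol,Int}) = cpd.accessor(a, cpd.parents)

# One shared function for every node. Here the "distribution" is just
# the sum of the parent values, standing in for a real distribution.
shared = (a, parents) -> sum(a[p] for p in parents)

cpd1 = SharedFnCPD(:u, [:x, :y], shared)
cpd2 = SharedFnCPD(:v, [:y, :z], shared)
a = Dict(:x => 1, :y => 2, :z => 3)
cpd1(a)  # 3, reads only :x and :y
cpd2(a)  # 5, reads only :y and :z
```

Since both CPDs close over the same `shared` object, there is one Function total, and each node's parent list does the per-node specialization.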
Hi all,
In preparing for AA228 Project 1, I have started to realize that BayesNets.jl 1.0 is not very well-suited to structure learning using the Bayesian Score as a metric. This is for two reasons.
Do we want to add some tools to facilitate structure learning? Or should we just recommend that students write their own? How much turnaround time is there to get changes pushed to the official version on METADATA?
The simplest thing we could do is add a more convenient bayesian_score method:
bayesian_score(g::LightGraphs.DiGraph, node_to_name::Vector{Symbol}, data::DataFrame)
Or we could add a type for structure learning
type BayesNetStructure
g::DiGraph
names::Vector{Symbol}
_name_to_node::Nullable{Dict{Symbol, Int}} # this is of course redundant, not sure if it would be useful to have for performance
end
What do you think? Should we just let students create their own tools? Maybe we should just put the bayesian_score method as an example in the course materials.
b = DiscreteBayesNet()
push!(b, DiscreteCPD(:B, [0.1,0.9]))
push!(b, DiscreteCPD(:S, [0.5,0.5]))
push!(b, rand_cpd(b, 3, :E, [:B, :S]))
push!(b, rand_cpd(b, 3, :D, [:E]))
push!(b, rand_cpd(b, 3, :C, [:E]))
a = Assignment(:B=>2, :D=>2, :C=>2)
T = table(b,:B,a)*table(b,:S)*table(b,:E,a)*table(b,:D,a)*table(b,:C,a)
sumout(T, :E)
This produces:
LoadError: MethodError: no method matching join(::DataFrames.SubDataFrame{Array{Int64,1}}, ::DataFrames.SubDataFrame{Array{Int64,1}}, ::DataFrames.SubDataFrame{Array{Int64,1}}; on=Symbol[:B,:S,:D,:C])
Closest candidates are:
join(::Any...) at strings/io.jl:128 got unsupported keyword argument "on"
join(!Matched::IO, ::Any, ::Any) at strings/io.jl:115 got unsupported keyword argument "on"
join(!Matched::IO, ::Any, ::Any, !Matched::Any) at strings/io.jl:99 got unsupported keyword argument "on"
...
while loading In[12], in expression starting on line 11
in sumout(::DataFrames.DataFrame, ::Symbol) at C:\Users\Mykel\.julia\v0.5\BayesNets\src\DiscreteBayesNet\factors.jl:43
If you do sumout(T, :B) instead, it works just fine, since B takes on only two different values while E takes on three. There seems to be some kind of issue with j = join(g..., on=remainingvars) on line 43 here.
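For reference, the operation itself is conceptually simple. The sketch below sums a variable out of a factor stored as a Dict from assignment tuples to probabilities, in plain Julia rather than the DataFrames-based join that is failing above:

```julia
# Sum a variable out of a factor represented as a
# Dict from (tuple of Symbol=>value pairs) to probability.
function sumout_dict(factor::Dict, var::Symbol)
    out = Dict{Any,Float64}()
    for (assignment, p) in factor
        # drop `var` from the assignment; the rest becomes the new key
        key = Tuple(kv for kv in assignment if kv.first != var)
        out[key] = get(out, key, 0.0) + p
    end
    out
end

# A joint P(E, B) with E ∈ {1,2}, B ∈ {1,2}; sum out E to get P(B)
f = Dict((:E=>1, :B=>1) => 0.1, (:E=>2, :B=>1) => 0.2,
         (:E=>1, :B=>2) => 0.3, (:E=>2, :B=>2) => 0.4)
g = sumout_dict(f, :E)
g[(:B=>1,)]  # 0.1 + 0.2 = 0.3 (up to floating point)
```

This grouping-and-accumulating behavior is what the join-based implementation should reduce to regardless of how many values each variable takes on.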
Right now, I'm getting the following:
julia> Pkg.add("BayesNets")
INFO: Installing BayesNets v0.4.1
INFO: Installing Graphs v0.6.0
INFO: Installing LaTeXStrings v0.1.6
INFO: Installing TikzGraphs v0.1.1
INFO: Installing TikzPictures v0.3.2
INFO: Building Homebrew
HEAD is now at 53c5089 CoreTap#install: fix clone target setting
HEAD is now at d45fe22 Merge pull request #91 from staticfloat/staging
INFO: Building Blosc
INFO: Building HDF5
INFO: Building LightXML
INFO: Package database updated
Apparently, BayesNets 0.4.1 still uses Graphs.jl. Can someone make a new release available? Much appreciated.