renhongjia / bayesianpy

This project forked from morganics/bayesianpy

A Python SDK for building, training and querying Bayesian networks (a.k.a. belief networks), from a machine learning perspective. This wraps the BayesServer Java API.

License: Apache License 2.0

Python 98.44% Shell 0.32% Batchfile 0.96% Dockerfile 0.28%

bayesianpy's Introduction

BayesianPy

A Python SDK for performing common operations with the Bayes Server Java API while making use of the best of Python (e.g. dataframes and visualisation). Calls to the Java API are wrapped with JPype1. This wrapper is not released, supported or endorsed by Bayes Server.

Supported functionality (currently contemporal networks only, although Bayes Server also supports temporal networks):

  • Creating network structures (in network.py and template.py; discrete, continuous, discretised and multivariate nodes)
  • Training a model (in model.py)
  • Querying a model with common query types such as log-likelihood and conflict queries, and joint probabilities for both continuous and discrete variables (in model.py; multiprocessing is also supported to speed up query times)
  • AutoInsight (using difference queries to understand variables' significance to the model, in insight.py)
  • Various utility functions for reading dataframes, casting and generally mapping between dataframes -> SQLite -> Bayes Server.

Note: The SDK currently writes data to an SQLite database, which is then read by the Java API.
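
The dataframe -> SQLite step mentioned in the note can be illustrated with plain pandas and the standard sqlite3 module. This is only a sketch of the general idea, not the SDK's own code: the helper name dataframe_to_sqlite, the table name "data" and the sample columns are all assumptions made for the example.

```python
import sqlite3

import pandas as pd


def dataframe_to_sqlite(df, conn, table="data"):
    """Write df to an SQLite table (hypothetical helper) and read it back."""
    # Replace any existing table and skip the pandas index column.
    df.to_sql(table, conn, if_exists="replace", index=False)
    return pd.read_sql_query(f'SELECT * FROM "{table}"', conn)


# Made-up sample data; an in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"age": [22.0, 38.0, 26.0], "survived": ["no", "yes", "yes"]})
round_trip = dataframe_to_sqlite(df, conn)
conn.close()
```

Any process that can open the same database file (here, the Java API) could then read the table without going through Python.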

Motivation

Python makes it quick to put something together, whereas the Bayes Server API is very powerful but, as a consequence, can be time consuming to work with directly. I haven't tried to wrap every single piece of Java code; however, I have tried, in general, to keep Java calls away from the client of the SDK, which allows type hinting and removes the confusion of working through JPype. You can do a lot more with the Java API directly, but the most common usage (creating network structures, training and querying networks) should be mostly accounted for. The Java API is stable (i.e. it doesn't change very much from release to release); this Python wrapper, however, is very much in flux!

Are Bayesian networks Bayesian? (from BayesServer.com)

Yes and no. They do make use of Bayes' theorem during inference, and typically use priors during batch parameter learning. However, they do not typically use a full Bayesian treatment in the Bayesian statistical sense (i.e. hyperparameters and learning case by case). The matter is further confused, as Bayesian networks typically DO use a full Bayesian approach for online learning.
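
To make the inference step concrete, here is a minimal sketch of Bayes' theorem for a single discrete variable. The function name posterior and all of the probabilities are made up for illustration; a real network applies the same rule across many variables at once.

```python
def posterior(prior, likelihood):
    """Return P(state | evidence) from P(state) and P(evidence | state)."""
    # Numerator of Bayes' theorem for each state.
    joint = {state: prior[state] * likelihood[state] for state in prior}
    # Normalising constant P(evidence).
    evidence = sum(joint.values())
    return {state: p / evidence for state, p in joint.items()}


# Hypothetical numbers: a rare fault and an alarm that sometimes fires falsely.
prior = {"faulty": 0.01, "ok": 0.99}
likelihood = {"faulty": 0.9, "ok": 0.05}  # P(alarm | state)
post = posterior(prior, likelihood)
```

With these numbers the prior still weighs heavily: even after the alarm, "ok" remains the more probable state.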

Jupyter examples

Example: training a model from a template

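import logging

import bayesianpy.data
import bayesianpy.model
import bayesianpy.network
import bayesianpy.template

logger = logging.getLogger()

# utility to decide whether each variable is discrete or continuous
# df is a pandas dataframe; db_folder (used below) is assumed to be the folder for the intermediate SQLite database.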
auto = bayesianpy.data.AutoType(df)

# creates a template for a single discrete cluster (latent) node with edges to independent
# child nodes
tpl = bayesianpy.template.MixtureNaiveBayes(
    logger,
    discrete=df[list(auto.get_discrete_variables())],
    continuous=df[list(auto.get_continuous_variables())],
    latent_states=8)

network_factory = bayesianpy.network.NetworkFactory(logger)
with bayesianpy.data.DataSet(df, db_folder, logger) as dataset:
    model = bayesianpy.model.NetworkModel(tpl.create(network_factory), logger)
    model.train(dataset) # or you can use a subset of the data, e.g. dataset.subset(list_of_indices)
    model.save("model.bayes")
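
The structure that the template above describes (one latent cluster node with edges to otherwise independent children) corresponds to the likelihood P(x) = sum over k of P(k) * product over v of P(x_v | k). The sketch below evaluates that formula by hand for discrete children only. It is not how Bayes Server computes it, and the function name mixture_log_likelihood, the two-cluster setup and every probability are assumptions chosen for illustration.

```python
import math


def mixture_log_likelihood(record, cluster_priors, conditionals):
    """log P(record) under a mixture of naive Bayes with discrete children."""
    total = 0.0
    for k, p_k in enumerate(cluster_priors):
        p = p_k
        # Children are independent given the latent cluster k.
        for variable, value in record.items():
            p *= conditionals[variable][k][value]
        total += p
    return math.log(total)


# Hypothetical parameters: two latent states, two discrete child variables.
cluster_priors = [0.5, 0.5]
conditionals = {
    "sex": [{"male": 0.8, "female": 0.2}, {"male": 0.3, "female": 0.7}],
    "class": [{"first": 0.1, "third": 0.9}, {"first": 0.6, "third": 0.4}],
}
ll = mixture_log_likelihood({"sex": "female", "class": "first"},
                            cluster_priors, conditionals)
```

Training fits the cluster priors and the per-cluster conditionals; here they are simply written down by hand.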

Example: querying a model

# specify the filename of the trained model
network_factory = bayesianpy.network.NetworkFactory(logger, network_file_path='model.bayes')
with bayesianpy.data.DataSet(df, db_folder, logger) as dataset:
    model = bayesianpy.model.NetworkModel(network_factory.create(), dataset, logger)    
    # Get the log-likelihood of the model given the evidence in df (here, the same data the model was trained on).
    # Conflict can also be calculated, if required.
    # 'results' is a pandas dataframe, where each variable in df will have an additional column with a suffix of _loglikelihood.
    results = model.batch_query(dataset, [bayesianpy.model.QueryModelStatistics()])
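
Once a results dataframe is available, ordinary pandas can be used to pick out poorly explained rows. The following is a sketch on a stand-in dataframe: the values, the column name age_loglikelihood (following the _loglikelihood suffix described above) and the threshold are all made up, so check the real column names before relying on it.

```python
import pandas as pd

# Stand-in for the 'results' dataframe; the values are invented.
results = pd.DataFrame({
    "age": [22.0, 38.0, 95.0],
    "age_loglikelihood": [-3.1, -2.9, -11.4],  # hypothetical column name
})

# Collect every column that carries the _loglikelihood suffix.
ll_columns = [c for c in results.columns if c.endswith("_loglikelihood")]

# Arbitrary cut-off: rows below it are treated as unusual under the model.
threshold = -10.0
unusual = results[(results[ll_columns] < threshold).any(axis=1)]
```

A low log-likelihood means the model assigns the row little probability, which is the usual basis for anomaly detection with this kind of query.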
        

More examples

Classification and regression examples on the Titanic dataset are included in the examples folder. I'll try and put some more up shortly.
