Comments (7)
BQL SNIPPET
;;;;; load plugins for custom models
.load-custom-model linreg.py "LinearRegression"
.load-custom-model logreg.py "LogisticRegression"
;;;;; set up dependencies
UPDATE SCHEMA FOR t MODEL foo AS CUSTOM MODEL "LogisticRegression" WITH INPUTS bar, baz
UPDATE SCHEMA FOR t MODEL quux AS CUSTOM MODEL "LinearRegression" WITH INPUTS foo, bar, baz
;;;;; illustration of an intentional limitation on model composition: can’t have cycles
UPDATE SCHEMA FOR t MODEL bar AS CUSTOM MODEL "LinearRegression" WITH INPUTS foo, quux
===> ERROR: can’t have cyclic dependencies. Ignoring!
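The cycle check illustrated by the last statement could be sketched roughly as follows. This is a minimal Python sketch, not bayeslite code: `add_custom_model`, `depends_on`, and `CyclicDependencyError` are hypothetical names, and the dependency graph is assumed to be a plain dict from each modeled variable to its inputs.

```python
from typing import Dict, List


class CyclicDependencyError(ValueError):
    """Raised when a new custom model would close a dependency cycle."""


def depends_on(deps: Dict[str, List[str]], start: str, goal: str) -> bool:
    """Return True if `start` depends on `goal`, directly or transitively."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(deps.get(node, []))
    return False


def add_custom_model(deps: Dict[str, List[str]], target: str, inputs: List[str]) -> None:
    """Register `target` as modeled from `inputs`, rejecting cycles."""
    for inp in inputs:
        # If an input already depends on the target, adding this edge closes a cycle.
        if depends_on(deps, inp, target):
            raise CyclicDependencyError("%s already depends on %s" % (inp, target))
    deps[target] = list(inputs)


# Mirror the three schema updates from the snippet above.
deps: Dict[str, List[str]] = {}
add_custom_model(deps, "foo", ["bar", "baz"])
add_custom_model(deps, "quux", ["foo", "bar", "baz"])
try:
    add_custom_model(deps, "bar", ["foo", "quux"])
    rejected = False
except CyclicDependencyError:
    rejected = True  # foo already depends on bar, so this update is ignored
```

The third call is rejected because foo is already modeled from bar, so modeling bar from foo would make the graph cyclic.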
EXPLANATION
- bar & baz: modeled by the default meta-model (crosscat).
- foo | bar, baz: modeled by logistic regression (linear regression run through a logit to stochastically generate a binary outcome).
- quux | foo, bar, baz: modeled by linear regression.
In principle this means that e.g.:
- simulating foo requires simulating the logistic regression AND simulating from crosscat to fill in any missing values of bar and baz.
- simulating/assessing predictive probabilities of bar given foo and quux needs to propagate estimated marginal likelihoods back from the two regression models, to include as factors in crosscat, integrating over both the set of models in the foo/quux meta-models and the possible imputations of all inputs to these models.
- “analyzing” the meta-models for foo needs to be interleavable with analysis transitions in the upstream default meta-model and in the downstream meta-model for quux.
Right now all we need are “good enough” approximations to the optimal Bayesian thing. Ultimately we will also want to support the full Bayesian interface; it will actually be tractable sometimes.
A GOOD-ENOUGH APPROXIMATION FOR A LAUNCH
- each time a custom model is “analyzed”, it gets a single (transient) imputation of all input rows.
- each time an input to a custom model (or DAG of custom models) is simulated or its predictive probability is assessed, the marginal likelihood is approximated using a single sample of the input values.
- the only meta-models we include are “point estimates” that don’t maintain internal posteriors but instead look for internal MAP estimates.
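One way to picture the single-imputation idea when simulating foo is the sketch below. It is only an illustration under stated assumptions: `simulate_base` is a hypothetical stand-in for the default meta-model (here it draws bar and baz independently, whereas a real base model would condition on the observed cells of the row), and the logistic regression weights are made-up point estimates.

```python
import math
import random


def simulate_base(rng):
    """Hypothetical stand-in for the default meta-model: one joint draw of (bar, baz)."""
    return {"bar": rng.gauss(0.0, 1.0), "baz": rng.gauss(0.0, 1.0)}


def simulate_foo(row, rng, weights=(1.5, -0.5), bias=0.0):
    """Simulate foo for one row, filling missing inputs with a single base-model draw.

    `weights` and `bias` are assumed point estimates of the logistic regression.
    """
    imputed = dict(row)
    draw = simulate_base(rng)  # one transient imputation, not an average over many
    for name in ("bar", "baz"):
        if imputed.get(name) is None:
            imputed[name] = draw[name]
    z = bias + weights[0] * imputed["bar"] + weights[1] * imputed["baz"]
    p = 1.0 / (1.0 + math.exp(-z))  # logit link gives P(foo = 1 | bar, baz)
    foo = 1 if rng.random() < p else 0
    return foo, imputed


# Usage: bar is observed, baz is missing and gets imputed once.
rng = random.Random(0)
foo, imputed = simulate_foo({"bar": 2.0, "baz": None}, rng)
```

Using a single draw keeps each operation cheap; the full Bayesian version would integrate over many imputations instead.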
IMPLEMENTATION IDEAS
Ultimately it would be great to treat custom models as custom meta-models. I currently think this requires:
- custom generators
- a DAG-of-generators composition operator for generators
- syntax transformers to handle the “flattened” USING SCHEMA syntax
I had naively assumed this was too much work for a first launch, but it would be great if it was the right strategy from the beginning. Until we have an example plugin meta-model, nobody will believe that meta-models really are an open set, and people will think BayesDB is just “crosscat + some plugins”.
For reference, here’s my internal model for generators (and my starting point for the relevant math in the paper):
creating a generator
- initialize(X)
- analyze(...): "improves" the quality of the generator (for generators that are meta-models, by reducing the KL between the posterior on models and the distribution from which the internal models are sampled)
using a generator
- simulate w/ GIVEN ... = ... [AND WHERE rowid = …]: ~P( { target_i } | { given_j }, [opt: rowid or “hypothetical”], X)
- predictive probability …: P( { target_i } | { given_j }, [opt: rowid or “hypothetical”], X)
- “context-independent dependence probability (*)”: Pr[ { var_i } are mutually independent, structurally | X]
(*) if we want all independencies, including those that are not implied by the model structure but happen to be true because of the model parameters, we need to estimate MI (which reduces to simulate and predictive) and check for 0s.
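To make the shape of this interface concrete, here is a rough Python sketch. It is an assumption-laden illustration, not the bayeslite API: the `Generator` base class, its method signatures, and the toy `IndependentGaussians` generator (which fits one normal distribution per column as a point estimate) are all hypothetical.

```python
import abc
import math
import random


class Generator(abc.ABC):
    """Hypothetical interface mirroring the operations listed above."""

    @abc.abstractmethod
    def initialize(self, X):
        """Bind the generator to the data X (a list of row dicts here)."""

    @abc.abstractmethod
    def analyze(self, iterations=1):
        """Improve the generator's internal model(s)."""

    @abc.abstractmethod
    def simulate(self, targets, givens=None, rowid=None, rng=random):
        """Draw values for `targets` conditioned on `givens`."""

    @abc.abstractmethod
    def logpdf(self, targets, givens=None, rowid=None):
        """Log predictive density of the `targets` dict conditioned on `givens`."""


class IndependentGaussians(Generator):
    """Toy generator: every column is an independent normal (point estimate)."""

    def initialize(self, X):
        self.X = list(X)
        self.params = {}

    def analyze(self, iterations=1):
        # Closed-form fit, so extra iterations change nothing.
        # Assumes every column has at least one observed (non-None) value.
        for name in {n for row in self.X for n in row}:
            vals = [row[name] for row in self.X if row.get(name) is not None]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            self.params[name] = (mean, math.sqrt(var) or 1.0)  # avoid zero std

    def simulate(self, targets, givens=None, rowid=None, rng=random):
        # Columns are structurally independent, so `givens` are ignored.
        return {t: rng.gauss(*self.params[t]) for t in targets}

    def logpdf(self, targets, givens=None, rowid=None):
        total = 0.0
        for name, x in targets.items():
            mean, std = self.params[name]
            total += (-0.5 * math.log(2 * math.pi) - math.log(std)
                      - 0.5 * ((x - mean) / std) ** 2)
        return total


g = IndependentGaussians()
g.initialize([{"bar": 1.0}, {"bar": 3.0}])
g.analyze()
sample = g.simulate(["bar"], rng=random.Random(0))
density = g.logpdf({"bar": 2.0})
```

In this toy every pair of columns is structurally independent with probability 1, which is the degenerate case of the dependence query above.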
relationship to SPs
There is a mapping between meta-models and higher-order Venture SPs:
(make-my-custom-meta-model)
=> (observe-row simulate-row predictive-row structural-dependence)
where observe-row, etc, are all SPs that share a single latent state that
stores all the parameters of the meta-model.
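As a loose analogy in Python rather than Venture, the same shape can be written as a factory returning closures over one shared state. Everything here is hypothetical: the toy meta-model is just the empirical distribution over observed rows, and `structural_dependence` trivially reports dependence because that toy model assumes no independencies.

```python
import math
import random


def make_my_custom_meta_model():
    """Return four procedures that all close over one shared latent state."""
    state = {"rows": []}  # toy latent state: the observed rows themselves

    def observe_row(row):
        state["rows"].append(dict(row))

    def simulate_row(rng=random):
        # Resample one observed row from the empirical distribution.
        return dict(rng.choice(state["rows"]))

    def predictive_row(row):
        # Log of the empirical frequency of `row`; -inf if never observed.
        matches = sum(1 for r in state["rows"] if r == row)
        if matches == 0:
            return float("-inf")
        return math.log(matches / len(state["rows"]))

    def structural_dependence(var_a, var_b):
        # The empirical joint encodes no structural independence.
        return True

    return observe_row, simulate_row, predictive_row, structural_dependence


observe_row, simulate_row, predictive_row, structural_dependence = (
    make_my_custom_meta_model())
for value in (1, 0, 1):
    observe_row({"foo": value})
logp = predictive_row({"foo": 1})  # log(2/3) for the rows observed above
```

Each call to `make_my_custom_meta_model` creates a fresh state, so two meta-model instances never share observations.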
This could make it easy to prototype & test new meta-models in Venture. If you want to discuss this further I’d be delighted to. It isn’t necessary pre-launch but is likely to be a core feature soon afterwards.
It will also help pin down a real, efficient version of a “foreign SP interface”, which we may decide to settle this summer. If we have it, we can then unify a great deal of testing & profiling infrastructure, and also have a canonical library of “probability distributions” and “fast inference primitives”, etc.
from bayeslite.
(updated comment with formatting for easier readability)
I am going to assume that custom model, custom generator, and foreign predictor all mean the same thing.
Question 1: What is the interface that the CUSTOM MODEL Regression exposes?
The latest metamodels.pdf document refers to this object as CUSTOM GENERATOR Regression rather than CUSTOM MODEL, which suggests that we have two options:
- Expose the simple interface for data generators (simulate and logpdf).
- Expose the extended interface for Markov-chain meta-models (insert, analyze, ...).
Since Regression is a generator that can be learned from data, it makes sense for it to live in the default metamodel. I can imagine a Bayesian regression in which approximate probabilistic inference (hence analyze) makes sense for Regression, but for an OLS regression analyze will converge in one step (assuming no missing regressors, i.e. no imputations) and further analysis will not improve the fit (unless we insert new observations).
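The one-step convergence point can be seen in a small sketch. This is a hypothetical Python class, not the interface from metamodels.pdf: it assumes a single regressor, fully observed data, and method names `insert` and `analyze` borrowed from the list above.

```python
def fit_ols(xs, ys):
    """Closed-form least squares for y = a + b * x with a single regressor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b  # (intercept, slope)


class OLSRegression:
    """Toy generator exposing insert/analyze; assumes no missing regressors."""

    def __init__(self):
        self.xs, self.ys, self.params = [], [], None

    def insert(self, x, y):
        self.xs.append(x)
        self.ys.append(y)

    def analyze(self):
        # The fit is exact, so a single call reaches the optimum.
        self.params = fit_ols(self.xs, self.ys)
        return self.params


model = OLSRegression()
for x, y in [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]:
    model.insert(x, y)
first = model.analyze()
second = model.analyze()   # identical to `first`: nothing left to improve
model.insert(3.0, 10.0)
third = model.analyze()    # changes only because new data was inserted
```

A Bayesian regression with sampled parameters would behave differently, since repeated `analyze` calls would keep moving through the posterior.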
Under development in bdbcontrib/src/foreign.
This wants to come back to trunk post gpm refactor.
@tibbetts the branch has been merged in bdbcontrib; should I close this issue and open a separate one for 'migrate foreign predictor to trunk'?
Just close this.