Comments (7)
I haven't checked whether the structure of your network is fine, but the gist with PyTorch is that each operation has a relatively high fixed startup cost, due to the overhead of managing tensors that might potentially be on a GPU, while the variable cost of larger operations is smaller. Basically, the cost of doing an addition at all is higher, but as you add more numbers together the total time doesn't grow as fast as in other libraries. So I'm not too surprised that sub-second operations might be a bit slower than in CPU-centric implementations. I would expect pomegranate to scale better than the other libraries, though.
I'm not sure how much adding a GPU would help, but it's definitely worth trying, because I'd also be surprised if it didn't help at all. If you have a lot of small operations, the I/O cost of moving data to the GPU may dominate the gains from doing the operations there. As table size grows, a GPU will help a ton, as you can see in the examples in the README.
You're right that there isn't a fixed answer to "which is faster," because it depends on the number of nodes, the number of edges, and, ultimately, table size. The rough way to think about it is whether a run is slow because there are tons of small operations or because a few big operations dominate. pomegranate does big operations much faster than most other libraries but -- for the reasons mentioned above -- does small operations a bit slower.
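The trade-off above can be sketched with a toy cost model: total time is a fixed per-call overhead plus a per-entry cost, and the library with the higher overhead wins once tables are large enough. The constants below are made up for illustration, not measurements of PyTorch or any other library:

```python
def total_time(fixed_overhead, per_element, n):
    """Toy cost model: one operation over a table of n entries."""
    return fixed_overhead + per_element * n

def crossover(fixed_a, per_a, fixed_b, per_b):
    """Smallest power-of-two table size at which library A beats library B.
    A is assumed to have the higher fixed cost but lower per-entry cost."""
    n = 1
    while total_time(fixed_a, per_a, n) >= total_time(fixed_b, per_b, n):
        n *= 2
    return n

# Hypothetical numbers (microseconds): a "torch-like" library pays 100 per
# call but 0.001 per entry; a "CPU-centric" one pays 1 per call but 0.1 per entry.
print(crossover(100.0, 0.001, 1.0, 0.1))  # -> 1024
```

With these invented constants the torch-like library only pulls ahead around a thousand entries per operation, which matches the intuition that many tiny tables favor CPU-centric implementations.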
from pomegranate.
I understand, thank you for the detailed response!
Currently our most time-consuming calculation is this:
- do initial inference for all nodes/variables with no evidence (I use "node" and "variable" interchangeably)
- iterate over roughly 40-60% of our nodes (let's call them "observables"; the share depends on the given network)
- for every observable node, set each of its possible states as evidence in turn (the number of states is usually around 2-4)
- calculate inference for the remaining 40-60% of nodes*
- calculate a value for the observable node that shows how much effect/impact changing its state has on the rest of the variables (roughly, comparing the original no-evidence inference with the just-calculated probabilities)
(Maybe this use case is related to "do-calculus"?)
*currently, with pomegranate, I always calculate for all the nodes; I'm not sure yet how to restrict the "targets"
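The loop described above can be sketched library-agnostically. Here `infer` is a stand-in for whatever inference call the library provides (e.g. a closure around `model.predict_proba`), and the impact score is mean total-variation distance from the no-evidence baseline, which is just one of several reasonable choices:

```python
import numpy as np

def impact_scores(infer, observables, targets, n_states):
    """For each observable node, measure how much fixing its state shifts
    the marginals of the target nodes, relative to the no-evidence baseline.

    infer(evidence) -> dict mapping node -> marginal distribution (ndarray);
    evidence is a dict mapping node -> observed state index.
    """
    baseline = infer({})                          # step 1: no-evidence inference
    scores = {}
    for node in observables:                      # step 2: iterate observables
        shift = 0.0
        for state in range(n_states[node]):       # step 3: clamp each state
            posterior = infer({node: state})      # step 4: re-run inference
            # step 5: mean total-variation distance over the target nodes
            shift += np.mean([0.5 * np.abs(posterior[t] - baseline[t]).sum()
                              for t in targets])
        scores[node] = shift / n_states[node]     # average over the node's states
    return scores
```

This makes the cost structure explicit: the inner `infer` call runs once per (observable, state) pair, which is exactly the part that redoes work shared across iterations.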
Unfortunately, based on your description, PyTorch, and hence pomegranate, is not optimized for our current implementation. This approach is wasteful with any library, since most of what we recalculate should be the same as before. I'm wondering if you could suggest a way to leverage pomegranate's strengths to do this calculation faster?
I will definitely try a GPU as well; it just isn't possible at the moment.
I have two more questions; I hope they fit here.
As mentioned earlier, I don't know how to calculate inference for just a given set of variables. Is there a way to do it?
Often I set one piece of evidence and then want inference for only, say, x% of the variables, but
predict_proba_result = model.predict_proba(X_masked)
calculates for all the nodes. Would it be possible to skip some nodes without setting them as evidence? (Say I have A, B, C, D, E; I set A=1 as evidence and am only interested in the probabilities of C and D.)
The other question relates to this comment in the predict_proba docstring: "..warning:: This inference is exact given a Bayesian network that has a tree-like structure, but is only approximate for other cases. When the network is acyclic, this procedure will converge, but if the graph contains cycles then there is no guarantee on convergence."
How could this be exact if pomegranate uses the sum-product / loopy belief propagation algorithm? I have a tree structure, and the numbers I get are sometimes more than just a bit off, which also suggests approximate inference under the hood. Could you elaborate?
In theory, you can avoid calculating the posterior for some variables if they are not in the Markov blanket of the ones you care about. Basically, if variable A could influence the value of variable B, you'd have to calculate both of them even if you only cared about B. But if A could not influence B then, you're right, you wouldn't need to calculate it. Unfortunately, this functionality is not implemented in pomegranate.
The sum-product algorithm is supposed to produce exact estimates for tree-structured networks. Are you running it until convergence, i.e., have you checked by setting max_iter higher? (https://github.com/jmschrei/pomegranate/blob/master/pomegranate/bayesian_network.py#L68)
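The exactness claim for trees can be checked on a tiny example. On a chain A -> B -> C of binary variables, one forward pass of message passing (a special case of sum-product; this is an illustrative sketch, not pomegranate's implementation) reproduces the brute-force marginal exactly:

```python
import numpy as np

# Chain A -> B -> C, all binary. CPTs: pA[a], pB[a, b] = P(B=b|A=a), pC[b, c] = P(C=c|B=b).
pA = np.array([0.6, 0.4])
pB = np.array([[0.7, 0.3], [0.2, 0.8]])
pC = np.array([[0.9, 0.1], [0.5, 0.5]])

# Brute force: build the full joint and sum out A and B.
joint = pA[:, None, None] * pB[:, :, None] * pC[None, :, :]
brute_C = joint.sum(axis=(0, 1))

# Sum-product: forward messages sum out one variable at a time.
msg_B = pA @ pB          # message into B: the marginal P(B)
sp_C = msg_B @ pC        # message into C: the marginal P(C)

print(np.allclose(brute_C, sp_C))  # -> True: exact on a tree
```

On a tree, messages only ever flow one way between any two nodes, so a bounded number of iterations reaches the fixed point exactly; cycles are what break that guarantee. If exact results on a tree still disagree with expectations, the discrepancy is more likely in max_iter being too small than in the algorithm itself.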
I understand, thank you!
I've tried adjusting max_iter earlier and again just now, but it slows down inference too much, and even doubling it didn't give good enough results in terms of the actual probabilities.
Sorry that it's too slow for your application. For networks with small tables, where not much batching of operations is possible, I agree that other CPU-based implementations may serve you better. If you can point me to implementations of, or papers on, speeding up your specific problem, I'd be happy to read them and think about whether I can incorporate the ideas, but I can't guarantee I'll have time to implement them soon. Best of luck with your work!
I understand, no problem! I don't know of such algorithms yet but will have a look around. Thank you!