Comments (8)
Okay, I have now started working on this. Everything is pretty straightforward in the EM algorithm for the non-mixture case, but I have to think a bit about how to change the numerical optimization of the coefficients in the MHMM case. I haven't looked at the non-EM setting yet, but I think that should be pretty easy. Based on the results of the new EM algorithm, there will most likely be considerable performance boosts, especially in the parallel version. Here is a small benchmark for a simple HMM (the mvad data from the examples of fit_model):
#seqHMM on CRAN:
microbenchmark(fit <- fit_model(init_hmm_mvad), times = 5)
# Unit: seconds
# expr min lq mean median uq max neval
# fit <- fit_model(init_hmm_mvad) 4.948853 4.952317 4.967398 4.957689 4.972136 5.005995 5
microbenchmark(fit <- fit_model(init_hmm_mvad, threads = 4), times = 5)
# Unit: seconds
# expr min lq mean median uq max neval
# fit <- fit_model(init_hmm_mvad, threads = 4) 3.613984 3.615456 3.628413 3.616545 3.646388 3.649692 5
#seqHMM with rewritten EM:
microbenchmark(fit <- fit_model(init_hmm_mvad), times = 5)
# Unit: seconds
# expr min lq mean median uq max neval
# fit <- fit_model(init_hmm_mvad) 2.130184 2.131413 2.139089 2.135004 2.148226 2.150617 5
microbenchmark(fit <- fit_model(init_hmm_mvad, threads = 4), times = 5)
# Unit: milliseconds
# expr min lq mean median uq max neval
# fit <- fit_model(init_hmm_mvad, threads = 4) 619.8046 625.4092 650.0121 630.0043 647.1949 727.6473 5
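The last timing above is reported in milliseconds while the others are in seconds, so the medians need a common unit before they can be compared. A minimal sketch of that arithmetic follows; the helper `speedup` and the constant names are made up for illustration, and only the median values come from the output above.

```cpp
#include <cassert>

// Ratio of two timings given in the same unit; a value above 1 means
// `after` is faster than `before`. Hypothetical helper, not part of seqHMM.
double speedup(double before, double after) {
    assert(before > 0.0 && after > 0.0);
    return before / after;
}

// Median timings copied from the microbenchmark output, all in seconds.
const double cran_1_thread = 4.957689;
const double cran_4_threads = 3.616545;
const double new_1_thread = 2.135004;
// Reported as 630.0043 milliseconds, converted to seconds here.
const double new_4_threads = 630.0043 / 1000.0;
```

With these numbers, `speedup(cran_1_thread, new_1_thread)` is roughly 2.3 and `speedup(cran_4_threads, new_4_threads)` is roughly 5.7. Going from one to four threads gives about 1.4x on the CRAN version and about 3.4x with the rewritten EM, which is where most of the parallel gain shows up.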
from seqhmm.
Yeah, this is probably due to the forward and backward algorithms, where we try to build a three-dimensional array of size n_states * n_time_points * n_sequences, which is far too large to fit into memory in this case.
For the EM algorithm and the log-likelihood computation it should be possible to circumvent this, though. Currently the code is written so that all forward and backward probabilities are computed at once and then used in later computations. But we can also compute the probabilities for each sequence separately, do the necessary computations, and move on to the next sequence, overwriting the previous probabilities. This should be more memory efficient and will probably make the parallelization faster as well.
It should be pretty straightforward to make these changes to the C++ code. I'll look into it next week.
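To illustrate the per-sequence idea, here is a minimal sketch of a log-likelihood computation that reuses the same small buffers for every sequence instead of storing forward probabilities for all sequences at once. It is not the seqHMM implementation: the `Hmm` struct, the function names, and the data layout are assumptions made for this example. It covers only the log-likelihood with a scaled forward recursion; the EM step would also need backward probabilities, which could be held in a per-sequence n_states * n_time_points buffer in the same way.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for a discrete HMM; not seqHMM's internal layout.
struct Hmm {
    std::vector<double> init;                // init[i]     = P(first state = i)
    std::vector<std::vector<double>> trans;  // trans[i][j] = P(next = j | current = i)
    std::vector<std::vector<double>> emiss;  // emiss[i][o] = P(symbol o | state i)
};

// Log-likelihood of one sequence via the scaled forward recursion.
// `alpha` and `next` are caller-owned buffers of length n_states; both are
// overwritten. Zero-probability sequences are not handled in this sketch.
double sequence_loglik(const Hmm& m, const std::vector<int>& obs,
                       std::vector<double>& alpha, std::vector<double>& next) {
    const std::size_t n_states = m.init.size();
    assert(alpha.size() == n_states && next.size() == n_states);
    double loglik = 0.0;
    for (std::size_t t = 0; t < obs.size(); ++t) {
        double scale = 0.0;
        for (std::size_t j = 0; j < n_states; ++j) {
            double prior = 0.0;
            if (t == 0) {
                prior = m.init[j];
            } else {
                for (std::size_t i = 0; i < n_states; ++i) {
                    prior += alpha[i] * m.trans[i][j];
                }
            }
            next[j] = prior * m.emiss[j][obs[t]];
            scale += next[j];
        }
        // Normalize so the values stay in a safe numeric range, and
        // accumulate the log of the scaling factor.
        for (std::size_t j = 0; j < n_states; ++j) {
            alpha[j] = next[j] / scale;
        }
        loglik += std::log(scale);
    }
    return loglik;
}

// Sum over sequences; working memory is two vectors of length n_states,
// independent of the number of sequences and of their lengths.
double total_loglik(const Hmm& m, const std::vector<std::vector<int>>& seqs) {
    std::vector<double> alpha(m.init.size()), next(m.init.size());
    double total = 0.0;
    for (const auto& obs : seqs) {
        total += sequence_loglik(m, obs, alpha, next);
    }
    return total;
}
```

Because each sequence only touches its own buffers, a parallel version could give every thread its own pair of buffers and split the sequences among threads.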
Just pushed some new commits related to this. The functions using the log scale are still unmodified, but the EM and numerical optimization algorithms using the natural scale should now scale better with a large number of sequences. For the mixture case, the EM algorithm's memory use still depends on the number of sequences in the part where the coefficients corresponding to the explanatory variables are estimated, but the temporary array used is now only number_of_states * number_of_sequences, whereas before it was number_of_states * number_of_sequences * number_of_time_points.
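To get a feel for the difference between the two temporary arrays, here is a rough sketch of the size arithmetic, assuming the arrays hold 8-byte doubles. The function names are invented for this example and the dimensions in the note below are purely illustrative.

```cpp
#include <cstdint>

// Bytes needed for an array of doubles with
// number_of_states * number_of_sequences * number_of_time_points elements.
constexpr std::uint64_t full_array_bytes(std::uint64_t n_states,
                                         std::uint64_t n_sequences,
                                         std::uint64_t n_time_points) {
    return n_states * n_sequences * n_time_points * sizeof(double);
}

// Bytes needed for an array of doubles with
// number_of_states * number_of_sequences elements.
constexpr std::uint64_t reduced_array_bytes(std::uint64_t n_states,
                                            std::uint64_t n_sequences) {
    return n_states * n_sequences * sizeof(double);
}
```

For example, with a hypothetical 5000 states, 1000 sequences and 100 time points, `full_array_bytes(5000, 1000, 100)` is 4,000,000,000 bytes (about 4 GB), while `reduced_array_bytes(5000, 1000)` is 40,000,000 bytes (about 40 MB). The saving is exactly a factor of number_of_time_points.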
Seems to be an internal C++ error :( Doesn't look good.
Would it be possible to fit the model iteratively? Start with a base model with 5000 states and 1000 sequences, then use the resulting model as the base model for the next 1000 sequences, and so on (changing the "observations" attribute of the resulting model each time).
I'm not sure whether that makes much sense in theory, but it could be a way to avoid the memory problems.
Thank you both for your answers, and helske, thank you very much for taking a look at it.
Nice work! Thanks a lot!
Thanks a lot for your work and responsiveness!
I assume this helped, so I'm closing this now. Feel free to reopen if you encounter issues.