Hello, I have to build a Markov model with around 5000 states and 32000 sequences,

fit_model error for markov model with many states

helske commented on June 10, 2024

from seqhmm.

Comments (8)

helske commented on June 10, 2024

Okay, I have now started working on this. Everything is pretty straightforward in EM for the non-mixture case, but I have to think a bit about how to change the numerical optimization of the coefficients in the MHMM case of the EM algorithm. I haven't looked at the non-EM setting yet, but I think that should be pretty easy, and based on the results of the new EM algorithm, there will most likely be considerable performance gains, especially in the parallel computation version. Here is a small benchmark for a simple HMM (mvad data in the examples of fit_model):

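# seqHMM on CRAN:
library(seqHMM)
library(microbenchmark)
# init_hmm_mvad is the model built in the examples of ?fit_model
microbenchmark(fit <- fit_model(init_hmm_mvad), times = 5)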
# Unit: seconds
#                            expr      min       lq     mean   median       uq      max neval
# fit <- fit_model(init_hmm_mvad) 4.948853 4.952317 4.967398 4.957689 4.972136 5.005995     5
microbenchmark(fit <- fit_model(init_hmm_mvad, threads = 4), times = 5)
# Unit: seconds
#                                         expr      min       lq     mean   median       uq      max neval
# fit <- fit_model(init_hmm_mvad, threads = 4) 3.613984 3.615456 3.628413 3.616545 3.646388 3.649692     5

# seqHMM with rewritten EM:
microbenchmark(fit <- fit_model(init_hmm_mvad), times = 5)
# Unit: seconds
#                            expr      min       lq     mean   median       uq      max neval
# fit <- fit_model(init_hmm_mvad) 2.130184 2.131413 2.139089 2.135004 2.148226 2.150617     5
microbenchmark(fit <- fit_model(init_hmm_mvad, threads = 4), times = 5)
# Unit: milliseconds
#                                         expr      min       lq     mean   median       uq      max neval
# fit <- fit_model(init_hmm_mvad, threads = 4) 619.8046 625.4092 650.0121 630.0043 647.1949 727.6473     5
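
To make the comparison easier to read, the median timings above can be turned into speedup ratios. This is only a small arithmetic sketch: the variable names are made up for illustration, and the numbers are copied from the benchmark output (note that the last run is reported in milliseconds).

```r
# Median timings copied from the benchmark output above, in seconds.
cran_1_thread  <- 4.957689
cran_4_threads <- 3.616545
new_1_thread   <- 2.135004
new_4_threads  <- 630.0043 / 1000  # reported in milliseconds

# Speedup of the rewritten EM over the CRAN version.
speedup_1_thread  <- cran_1_thread / new_1_thread
speedup_4_threads <- cran_4_threads / new_4_threads

# How much each version gains from going from 1 to 4 threads.
scaling_cran <- cran_1_thread / cran_4_threads
scaling_new  <- new_1_thread / new_4_threads

round(c(speedup_1_thread, speedup_4_threads, scaling_cran, scaling_new), 2)
```

On these medians the rewritten EM is roughly 2.3 times faster on one thread and roughly 5.7 times faster on four threads, and the four-thread gain over one thread rises from about 1.4 to about 3.4.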

helske commented on June 10, 2024

Yeah, this is probably due to the forward and backward algorithms, where we are trying to build a three-dimensional array of size n_states * n_time_points * n_sequences, which is far too large to fit into memory in this case.
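
A rough back-of-the-envelope estimate shows why this array is a problem. The state and sequence counts come from the question; the number of time points is not given in this thread, so the value of 10 below is purely hypothetical, and 8 bytes per value assumes double-precision storage.

```r
# Rough memory estimate for a dense array of forward probabilities.
n_states      <- 5000    # from the question
n_sequences   <- 32000   # from the question
n_time_points <- 10      # hypothetical: sequence length is not stated in the thread
bytes_per_double <- 8    # assumes double-precision storage

# All sequences at once: n_states * n_time_points * n_sequences values.
full_array_gib <- n_states * n_time_points * n_sequences * bytes_per_double / 1024^3

# One sequence at a time: n_states * n_time_points values.
per_sequence_mib <- n_states * n_time_points * bytes_per_double / 1024^2

c(full_array_gib = full_array_gib, per_sequence_mib = per_sequence_mib)
```

Under these assumptions a single array needs about 12 GiB, and the backward probabilities need the same again, whereas the buffer for one sequence is well under 1 MiB.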

For the EM algorithm and the log-likelihood computation it should be possible to circumvent this, though. Currently the code is written so that all forward and backward probabilities are computed at once, and these are then used in later computations. But we can also compute the probabilities for each sequence separately, do the necessary computations, and move on to the next sequence, overwriting the previous probabilities. This should be more memory efficient and will probably make the parallelization faster as well.
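
The per-sequence idea can be sketched in plain R for the log-likelihood part. This is not seqHMM's C++ implementation, just a minimal illustration of a scaled forward pass for a simple HMM with one categorical observation channel; the function names and the toy parameters are hypothetical.

```r
# Scaled forward pass for ONE sequence; keeps only the current alpha vector.
# obs:   integer vector of observed symbols (column indices of emiss)
# init:  initial state probabilities (length n_states)
# trans: n_states x n_states transition matrix (rows sum to 1)
# emiss: n_states x n_symbols emission matrix (rows sum to 1)
forward_loglik <- function(obs, init, trans, emiss) {
  alpha <- init * emiss[, obs[1]]
  scale <- sum(alpha)
  alpha <- alpha / scale
  loglik <- log(scale)
  for (t in seq_along(obs)[-1]) {
    # Propagate one step, weight by the emission, then rescale.
    alpha <- as.vector(alpha %*% trans) * emiss[, obs[t]]
    scale <- sum(alpha)
    alpha <- alpha / scale
    loglik <- loglik + log(scale)
  }
  loglik
}

# Process sequences one by one; nothing of size
# n_states * n_time_points * n_sequences is ever allocated.
total_loglik <- function(sequences, init, trans, emiss) {
  total <- 0
  for (obs in sequences) {
    total <- total + forward_loglik(obs, init, trans, emiss)
  }
  total
}

# Toy example with 2 states and 2 symbols.
init  <- c(0.6, 0.4)
trans <- matrix(c(0.7, 0.3, 0.2, 0.8), nrow = 2, byrow = TRUE)
emiss <- matrix(c(0.9, 0.1, 0.3, 0.7), nrow = 2, byrow = TRUE)
sequences <- list(c(1, 2, 2), c(2, 1))
total_loglik(sequences, init, trans, emiss)
```

An EM step would also need the backward pass and the expected counts for each sequence, accumulated in the same loop; those are left out here to keep the sketch short.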

It should be pretty straightforward to make these changes to the C++ code; I'll look into it next week.

helske commented on June 10, 2024

I just pushed some new commits relating to this. The functions using log-scale are still unmodified, but the EM and numerical optimization algorithms using natural scale should now scale better with a large number of sequences. For the mixture case, the EM algorithm still depends on the number of sequences in the part where the coefficients corresponding to the explanatory variables are estimated, but the temporary array used is now only number_of_states * number_of_sequences, whereas it was number_of_states * number_of_sequences * number_of_time_points before.

vrodriguezf commented on June 10, 2024

Seems to be an internal C++ error :( doesn't look good.

Would it be possible to fit the model iteratively? Start with a base model with 5000 states and 1000 sequences, and use the resulting model as base model for the next 1000 sequences and so on (changing the attribute "observations" of the resulting model).

I'm not sure if that makes too much sense in theory, but it could be a way to avoid the memory troubles.

mikael10j commented on June 10, 2024

Thank you both for your answers, and helske, thank you very much for taking a look at it.

vrodriguezf commented on June 10, 2024

Nice work! Thanks a lot!

mikael10j commented on June 10, 2024

Thanks a lot for your work and responsiveness!

helske commented on June 10, 2024

I assume this helped, so closing now, feel free to reopen if you encounter issues.
