
Comments (8)

helske commented on June 10, 2024

Okay, I have now started to work on this. Everything is pretty straightforward in EM for the non-mixture case, but I have to think a bit about how to change the numerical optimization of the coefficients in the MHMM case of the EM algorithm. I haven't looked at the non-EM setting yet, but I think that should be pretty easy, and based on the results of the new EM algorithm, there will most likely be considerable performance boosts, especially in the parallel version. Here is a small benchmark for a simple HMM model (the mvad data in the examples of fit_model):

#seqHMM on CRAN:
microbenchmark(fit <- fit_model(init_hmm_mvad), times = 5)
# Unit: seconds
#                            expr      min       lq     mean   median       uq      max neval
# fit <- fit_model(init_hmm_mvad) 4.948853 4.952317 4.967398 4.957689 4.972136 5.005995     5
microbenchmark(fit <- fit_model(init_hmm_mvad, threads = 4), times = 5)
# Unit: seconds
#                                         expr      min       lq     mean   median       uq      max neval
# fit <- fit_model(init_hmm_mvad, threads = 4) 3.613984 3.615456 3.628413 3.616545 3.646388 3.649692     5

#seqHMM with rewritten EM:
microbenchmark(fit <- fit_model(init_hmm_mvad), times = 5)
# Unit: seconds
#                            expr      min       lq     mean   median       uq      max neval
# fit <- fit_model(init_hmm_mvad) 2.130184 2.131413 2.139089 2.135004 2.148226 2.150617     5
microbenchmark(fit <- fit_model(init_hmm_mvad, threads = 4), times = 5)
# Unit: milliseconds
#                                         expr      min       lq     mean   median       uq      max neval
# fit <- fit_model(init_hmm_mvad, threads = 4) 619.8046 625.4092 650.0121 630.0043 647.1949 727.6473     5


helske commented on June 10, 2024

Yeah, this is probably due to the forward and backward algorithms, where we are trying to build a three-dimensional array of size n_states * n_time_points * n_sequences, which is way too much to fit into memory in this case.

For the EM algorithm and log-likelihood computation it should be possible to circumvent this, though. Currently the code is written so that all forward and backward probabilities are computed at once and then used in later computations. But we can also compute the probabilities for each sequence separately, do the necessary computations, and move on to the next sequence, overwriting the previous probabilities. This should be more memory efficient and will probably make the parallelization faster as well.

It should be pretty straightforward to make these changes to the C++ code; I'll look into it next week.
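To make the per-sequence idea concrete, here is a minimal R sketch (not the actual seqHMM C++ internals; all names are illustrative): a scaled forward pass that keeps only one n_states vector alive at a time and accumulates the log-likelihood, instead of storing the full n_states * n_time_points * n_sequences array.

```r
sequential_loglik <- function(obs, init, trans, emis) {
  # obs:   list of integer vectors (observed symbol indices, one vector per sequence)
  # init:  vector of initial state probabilities (length n_states)
  # trans: n_states x n_states transition probability matrix
  # emis:  n_states x n_symbols emission probability matrix
  loglik <- 0
  for (y in obs) {
    # forward probabilities for this sequence only; overwritten for the next one
    alpha <- init * emis[, y[1]]
    scale <- sum(alpha)
    alpha <- alpha / scale
    loglik <- loglik + log(scale)
    for (t in seq_along(y)[-1]) {
      alpha <- as.vector(crossprod(trans, alpha)) * emis[, y[t]]
      scale <- sum(alpha)
      alpha <- alpha / scale   # scaling keeps the values from underflowing
      loglik <- loglik + log(scale)
    }
  }
  loglik
}
```

The E-step quantities can be accumulated the same way inside the loop, so nothing per-time-point needs to survive past the end of each sequence.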


helske commented on June 10, 2024

Just pushed some new commits relating to this. The functions using the log scale are still unmodified, but the EM and numerical optimization algorithms using the natural scale should now scale better with a large number of sequences. For the mixture case the EM algorithm still depends on the number of sequences in the part where the coefficients corresponding to the explanatory variables are estimated, but the temporary array used is only number_of_states * number_of_sequences, whereas before it was number_of_states * number_of_sequences * number_of_time_points.
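For a rough sense of scale, a back-of-the-envelope calculation with made-up dimensions (8-byte doubles; these are not the reporter's actual sizes):

```r
n_states <- 10; n_sequences <- 50000; n_time_points <- 100   # made-up sizes
old_bytes <- n_states * n_sequences * n_time_points * 8      # old temporary array
new_bytes <- n_states * n_sequences * 8                      # new temporary array
c(old_GB = old_bytes / 1024^3, new_MB = new_bytes / 1024^2)
#    old_GB    new_MB
# 0.3725290 3.8146973
```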


vrodriguezf commented on June 10, 2024

Seems to be an internal C++ error :( doesn't look good.

Would it be possible to fit the model iteratively? Start with a base model with 5000 states and 1000 sequences, and use the resulting model as the base model for the next 1000 sequences, and so on (changing the "observations" attribute of the resulting model).

I'm not sure whether that makes much sense in theory, but it could be a way to avoid the memory problems.
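A hedged sketch of that chunked idea, assuming seqHMM's build_hmm()/fit_model() interface, that build_hmm() can simulate starting values from n_states alone, and that the sequence object (here the hypothetical obs_seqdata) can be subset by rows; whether the resulting estimates are statistically sensible is a separate question.

```r
library(seqHMM)

# split sequence indices into blocks of 1000 (obs_seqdata is hypothetical)
chunks <- split(seq_len(nrow(obs_seqdata)),
                ceiling(seq_len(nrow(obs_seqdata)) / 1000))

# fit the base model on the first block
fit <- fit_model(build_hmm(observations = obs_seqdata[chunks[[1]], ],
                           n_states = 5))

for (idx in chunks[-1]) {
  # reuse the previous estimates as starting values for the next block
  model <- build_hmm(observations     = obs_seqdata[idx, ],
                     initial_probs    = fit$model$initial_probs,
                     transition_probs = fit$model$transition_probs,
                     emission_probs   = fit$model$emission_probs)
  fit <- fit_model(model)
}
```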


mikael10j commented on June 10, 2024

Thank you both for your answers, and helske, thank you very much for taking a look at it.


vrodriguezf commented on June 10, 2024

Nice work! Thanks a lot!


mikael10j commented on June 10, 2024

Thanks a lot for your work and responsiveness!


helske commented on June 10, 2024

I assume this helped, so I'm closing this now; feel free to reopen if you encounter issues.

