poypoyan / edhsmm

8.0 2.0 3.0 223 KB

An(other) implementation of Explicit Duration Hidden Semi-Markov Models in Python 3

License: MIT License

Python 23.79% Jupyter Notebook 65.51% Cython 10.70%

machine-learning probabilistic-graphical-models em-algorithm python cython hsmm notebooks hidden-markov-model hidden-markov-models


edhsmm's Issues

no safe multi-process/threads

I use your library with joblib, and it does not appear to be safe for parallel execution across multiple processes or threads.

Sojourn time parametric modelisation

Hello poypoyan,
It looks like the sojourn time is modeled as a non-parametric distribution. Would you consider providing parametric modeling of the sojourn time in the future, for example with the Poisson distribution? This would help with problems where the sojourn time in hidden states is large.
Thank you again.
Best
Ronan

Duration Distributions

Currently, the duration probabilities per state are stored in a 2D array of shape (n_states, n_durations). In some situations, instead of these "non-parametric" per-state duration PMFs, the duration needs to be estimated by a parametric distribution.

The 'hsmm' R package offers 4 parametric distributions for duration:

"geom" = Geometric
"nbinom" = Negative Binomial
"log" = Logarithmic
"pois" = Poisson

I need help with the math behind determining the parameters from a non-parametric duration PMF, especially for these 4 distributions. Please suggest some resources. I prefer clear algorithms, but anything relevant is welcome.

Thanks!
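
One simple starting point is moment matching: compute the mean and variance of each state's duration PMF and solve for the parameters. The sketch below is not from the 'hsmm' R package or from edhsmm. It assumes that `pmf[k]` is P(duration = k + 1), shifts the Poisson and negative binomial by 1 so that their support starts at duration 1, and ignores truncation at n_durations. The function name `fit_parametric_durations` is hypothetical, and the logarithmic distribution is left out because its parameter has no closed-form moment estimate.

```python
import numpy as np

def fit_parametric_durations(pmf):
    """Moment-matching fits for one state's duration PMF.

    Assumption: pmf[k] = P(duration = k + 1), so durations start at 1.
    """
    pmf = np.asarray(pmf, dtype=float)
    pmf = pmf / pmf.sum()                     # guard against rounding drift
    durations = np.arange(1, pmf.size + 1)    # support 1..n_durations
    mean = float(np.dot(durations, pmf))
    var = float(np.dot((durations - mean) ** 2, pmf))

    # Geometric on {1, 2, ...}: mean = 1 / p
    geom_p = 1.0 / mean

    # Shifted Poisson (duration - 1 ~ Poisson(lam)): mean = lam + 1
    pois_lam = mean - 1.0

    # Shifted negative binomial (duration - 1 ~ NB(n, p)) with
    # mean = n (1 - p) / p and var = n (1 - p) / p**2.
    # Only solvable when the PMF is overdispersed (var > shifted mean).
    shifted_mean = mean - 1.0
    if var > shifted_mean > 0:
        nbinom_p = shifted_mean / var
        nbinom_n = shifted_mean ** 2 / (var - shifted_mean)
        nbinom = (nbinom_n, nbinom_p)
    else:
        nbinom = None

    return {"mean": mean, "var": var, "geom_p": geom_p,
            "pois_lam": pois_lam, "nbinom": nbinom}

# Example: a bimodal PMF over durations 1..5
params = fit_parametric_durations([0.7, 0.0, 0.0, 0.0, 0.3])
```

For the geometric and Poisson cases, matching the mean coincides with the weighted maximum-likelihood estimate when the PMF entries are treated as expected counts, which is the form the M-step quantities usually take.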

Trying to create HSMM with left-to-right assumption

So I have been trying to fit a Gaussian HSMM to an observation sequence with monotonic behaviour.

This observation sequence is created with the sample method from a pre initialized HSMM with parameters:

import numpy as np
import scipy.stats
from edhsmm.hsmm_base import GaussianHSMM   # import path assumed; the original post does not show it

def init4shortMC(hsmm_class):
    # initial probability: always start in state 0
    hsmm_class.pi = np.zeros(5)
    hsmm_class.pi[0] = 1

    # durations: Normal(30, 10) density evaluated on a grid over [0, 30]
    hsmm_class.dur = np.zeros((5, 30))
    x = np.linspace(0, 30, 30)

    for i in range(len(hsmm_class.dur)):
        for j in range(len(hsmm_class.dur[i])):
            hsmm_class.dur[i, j] = scipy.stats.norm(30, 10).pdf(x[j])

    # add the leftover probability mass to one duration so each row sums to 1
    for i in range(len(hsmm_class.dur)):
        hsmm_class.dur[i, 14] += 1 - hsmm_class.dur[i].sum()

    # transition matrix: left-to-right, each state moves one or two states ahead
    num_of_states = 5
    hsmm_class.tmat = np.zeros((num_of_states, num_of_states))

    for i in range(len(hsmm_class.tmat)):
        for j in range(len(hsmm_class.tmat[i]) - 1):
            if i == j and j < len(hsmm_class.tmat[i]) - 2:
                hsmm_class.tmat[i, j + 1] = 0.6
                hsmm_class.tmat[i, j + 2] = 0.4
            elif i == j and j == len(hsmm_class.tmat[i]) - 2:
                hsmm_class.tmat[i, j + 1] = 1

    # the last state is absorbing
    hsmm_class.tmat[-1, -1] = 1

    hsmm_class.mean = np.array([10, 20, 30, 40, 50])   # shape should be (n_states, n_dim)
    hsmm_class.mean = np.reshape(hsmm_class.mean, (-1, 1))
    hsmm_class.covmat = np.array([   # shape should be (n_states, n_dim, n_dim) -> array of square matrices
        [[10.]],
        [[10.]],
        [[10.]],
        [[10.]],
        [[10.]],
    ])

R1 = GaussianHSMM(n_states=5, n_durations=30, n_iter=100)

init4shortMC(R1)
sample1 = R1.sample(100)

Then I am trying to fit a newly initialized model to the observation sequence. In order for the model to capture the monotonic sequence, I initialize it with the following start prob and tmat:

def init4fit(hsmm_class):
    # initial probability
    hsmm_class.pi = np.zeros(5)
    hsmm_class.pi[0] = 1

    # transition matrix
    num_of_states = 5
    hsmm_class.tmat = np.zeros((num_of_states, num_of_states))

    for i in range(len(hsmm_class.tmat)):
        for j in range(len(hsmm_class.tmat[i]) - 1):
            if i == j and j < len(hsmm_class.tmat[i]) - 2:
                hsmm_class.tmat[i, j + 1] = 0.6
                hsmm_class.tmat[i, j + 2] = 0.4
            elif i == j and j == len(hsmm_class.tmat[i]) - 2:
                hsmm_class.tmat[i, j + 1] = 1

    hsmm_class.tmat[-1, -1] = 1

Then I call the .fit method with the tmat and start probabilities frozen, so that they are not updated during the EM algorithm. However, the model never converges and outputs nan for the means, durations, and covariance matrices.

Any ideas?

What are the selection criteria for D in HSMM?

Archiving this Repo

Hello.

I am unable to maintain this, and it is not a project I want to maintain anymore, so I decided to archive this repository. I already deleted this library from PyPI, so pip install edhsmm no longer works.

That being said, here are the final issues I'll work on:

And I'll "release" the final version 0.2.3.

Apologies and thanks to anyone interested. I hope this project helped you in some way.

poypoyan

Recomment code

The aim of this issue is to recomment sections of code to include specific sections of Yu (2010). This makes the code more readable and helps solve reported bugs (see previous issues).

Using multiple sequences of observation

Hello poypoyan,
I am using the MultinomialHSMM to estimate the parameters of an HSMM. I encounter issues with the number of symbols (i.e. the different possible observations), which is not correctly cast to an integer.

I succeeded when using one sequence only, but with multiple sequences it is not working.
My code is below.
Thanks for your help.
Ronan

import sys

import numpy as np
from edhsmm.hsmm_multinom import MultinomialHSMM

# modified initial parameters for MultinomialHSMM
def init_true_model(hsmm_class):
    hsmm_class.pi = np.array([2 / 3, 1 / 3, 0 / 3])
    hsmm_class.dur = np.array([
        [0.1, 0.005, 0.005, 0.89],
        [0.1, 0.005, 0.89, 0.005],
        [0.1, 0.89, 0.005, 0.005]
    ])
    hsmm_class.tmat = np.array([
        [0.0, 0.5, 0.5],
        [0.3, 0.0, 0.7],
        [0.4, 0.6, 0.0]
    ])
    hsmm_class.emit = np.array([
        [0.8, 0.1, 0.1],
        [0.1, 0.8, 0.1],
        [0.1, 0.1, 0.8]
    ])   # shape should be (n_states, n_symbols)

# modified initial parameters for MultinomialHSMM
def init_middle_model(hsmm_class):
    hsmm_class.pi = np.array([1/3, 1/3, 1/3])
    hsmm_class.dur = np.array([
        [0.25, 0.25, 0.25, 0.25],
        [0.25, 0.25, 0.25, 0.25],
        [0.25, 0.25, 0.25, 0.25]
    ])
    hsmm_class.tmat = np.array([
        [0.0, 0.5, 0.5],
        [0.5, 0.0, 0.5],
        [0.5, 0.5, 0.0]
    ])
    hsmm_class.emit = np.array([
        [1/3, 1/3, 1/3],
        [1/3, 1/3, 1/3],
        [1/3, 1/3, 1/3]
    ])   # shape should be (n_states, n_symbols)


# initialize HSMM
squirrel_true_model = MultinomialHSMM(n_states = 3, n_durations = 4)
init_true_model(squirrel_true_model)

rng_seed = 12345
n_samples = 100
n_squirrel = 10
all_obs = np.empty((n_samples,n_squirrel))
for s in range(n_squirrel):
    states = squirrel_true_model.sample(n_samples=n_samples, random_state=rng_seed)
    obs = states[2]
    all_obs[:, s] = obs
    rng_seed = rng_seed + 1

squirrel_middle_model = MultinomialHSMM(n_states = 3, n_durations = 4)
init_middle_model(squirrel_middle_model)
squirrel_middle_model.fit(all_obs)

# display learned parameters for T
print("Start Probabilities: [T]\n", squirrel_middle_model.pi, "\n")
print("Transition Matrix: [T]\n", squirrel_middle_model.tmat, "\n")
print("Durations: [T]\n", squirrel_middle_model.dur, "\n")
print("Emission Probabilities: [T]\n", squirrel_middle_model.emit, "\n")
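
One detail in the snippet above is certain: `np.empty((n_samples, n_squirrel))` creates a float64 array, so the symbols are stored as floats once they are copied into `all_obs`. The sketch below shows a layout commonly used for multiple sequences in hmmlearn-style libraries: the sequences are stacked end to end in one integer column, with a separate list of sequence lengths. Whether `MultinomialHSMM.fit` accepts a `lengths` argument in this form is an assumption to check against the library source. The data here is a random stand-in, not output from the model.

```python
import numpy as np

rng = np.random.default_rng(12345)
n_samples, n_sequences, n_symbols = 100, 10, 3

# Hypothetical stand-in data: one integer symbol sequence per squirrel.
sequences = [rng.integers(0, n_symbols, size=n_samples)
             for _ in range(n_sequences)]

# Stack the sequences end to end as a single column and keep an integer dtype,
# instead of placing each sequence in its own column of a float array.
X = np.concatenate(sequences).reshape(-1, 1)
lengths = [len(seq) for seq in sequences]

# The symbol count can then be derived without any float-to-int cast.
inferred_n_symbols = int(X.max()) + 1

# Assumed call, not verified here:
# squirrel_middle_model.fit(X, lengths=lengths)
```

With this layout, the lengths must sum to the number of rows in `X`, which is the same condition described for iter_from_X_length() elsewhere on this page.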

Multi-Dimensional Feature Space

Could you please give a hint on how to fit the model parameters using high-dimensional data? For example, my data has shape (n, p), where n is the number of observations and p is the dimension of each observation.

About the Left censoring method

Hello,
First, I would like to thank you for the HSMM code you provided, which works very well.
I am very interested in applying left censoring to my dataset, and as far as I know, your software is the only one with this option.
You mentioned in your notebook that it is an experimental feature. Could you please give me more information (references, etc.)?
Thanks in advance
Ronan

Implement check() function in HSMM class

The function check() in HSMM class checks model parameters for errors before performing the main algorithms.

For the HSMM (base) class, here are some of the properties to be checked:

n_states should be at least 2, while n_durations should be at least 1.
every entry (probability) in HSMM.pi, HSMM.tmat, and HSMM.dur should be greater than 0 (except the zero diagonal of HSMM.tmat, see below), because the EM algorithm cannot update a zero probability.
for HSMM.pi (start probabilities), numpy.shape = (n_states), and the sum should be 1.
for HSMM.tmat (transition matrix), numpy.shape = (n_states, n_states), every diagonal is zero, and every row should add up to 1.
for HSMM.dur (duration probabilities), numpy.shape = (n_states, n_durations), and every row should add up to 1.
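
The base-class properties above can be sketched as a standalone validation function. This is a minimal illustration, not the library's actual check() method: the function name `check_hsmm_params`, its signature, and the tolerance are hypothetical. It also reads the positivity rule as applying only to the off-diagonal entries of tmat, since the diagonal is required to be zero, and that reading is an assumption.

```python
import numpy as np

def check_hsmm_params(pi, tmat, dur, n_states, n_durations, atol=1e-8):
    """Raise ValueError if the parameters violate the listed properties."""
    if n_states < 2:
        raise ValueError("n_states should be at least 2")
    if n_durations < 1:
        raise ValueError("n_durations should be at least 1")

    pi, tmat, dur = (np.asarray(a, dtype=float) for a in (pi, tmat, dur))

    # Shapes
    if pi.shape != (n_states,):
        raise ValueError("pi should have shape (n_states,)")
    if tmat.shape != (n_states, n_states):
        raise ValueError("tmat should have shape (n_states, n_states)")
    if dur.shape != (n_states, n_durations):
        raise ValueError("dur should have shape (n_states, n_durations)")

    # Normalization
    if not np.isclose(pi.sum(), 1.0, atol=atol):
        raise ValueError("pi should sum to 1")
    if not np.allclose(tmat.sum(axis=1), 1.0, atol=atol):
        raise ValueError("every row of tmat should sum to 1")
    if not np.allclose(dur.sum(axis=1), 1.0, atol=atol):
        raise ValueError("every row of dur should sum to 1")

    # No self-transitions
    if not np.allclose(np.diag(tmat), 0.0, atol=atol):
        raise ValueError("the diagonal of tmat should be zero")

    # Strict positivity (off-diagonal only for tmat; see the assumption above)
    off_diag = tmat[~np.eye(n_states, dtype=bool)]
    if (pi <= 0).any() or (dur <= 0).any() or (off_diag <= 0).any():
        raise ValueError("probabilities should be greater than 0")

# Example: uniform parameters for 3 states and 4 durations pass the check.
uniform_pi = np.full(3, 1 / 3)
uniform_tmat = np.array([[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]])
uniform_dur = np.full((3, 4), 0.25)
check_hsmm_params(uniform_pi, uniform_tmat, uniform_dur, 3, 4)
```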

For Gaussian sub-class, here are the properties:

for GaussianHSMM.means, numpy.shape = (n_states).
for GaussianHSMM.sdev, numpy.shape = (n_states), and all entries should be greater than 0.

Just comment if some properties are not listed here.

Left-censoring

I wrote before in the README of the deleted test branch:

Unsure to be implemented

Left-censoring (I have difficulty understanding and implementing it)

I think this is a good feature to have, but I also think lots of refactoring of the hsmm_core is needed.

Pre-allocating instead of resizing ndarray + More

More changes:

Raise error if end[-1] != n_samples in iter_from_X_length(). This is more general than before, which is end[-1] > n_samples.
~~Use f-strings. This implies python >= 3.6 and hence, drop Python 3.5.~~ (Next time)
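
The stricter length check can be illustrated with a small generator modeled on the hmmlearn-style pattern of splitting a stacked observation array by sequence lengths. This is a sketch, not the code of iter_from_X_length() itself: the name `iter_sequences` and its signature are hypothetical.

```python
import numpy as np

def iter_sequences(n_samples, lengths=None):
    """Yield (start, end) index pairs for each sequence in a stacked array."""
    if lengths is None:
        # A single sequence spanning all samples
        yield 0, n_samples
        return

    lengths = np.asarray(lengths, dtype=int)
    end = np.cumsum(lengths)
    start = end - lengths

    # Stricter than end[-1] > n_samples: lengths that cover too few
    # samples are rejected as well.
    if end[-1] != n_samples:
        raise ValueError(
            "lengths sum to {} but there are {} samples".format(end[-1], n_samples)
        )

    for s, e in zip(start, end):
        yield int(s), int(e)

# Example: 5 samples split into sequences of length 2 and 3
pairs = list(iter_sequences(5, [2, 3]))
```

Because this is a generator, the error is raised when iteration starts rather than when the function is called.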

Bug in score() function in HSMM

After some testing, it was found that the score() function returns a different result from the score reported by fit().
