poypoyan / edhsmm
An(other) implementation of Explicit Duration Hidden Semi-Markov Models in Python 3
License: MIT License
I use your library with joblib. It appears that the library is not safe to use in parallel.
Hello poypoyan,
It looks like the sojourn time is modeled as a non-parametric distribution. Would you consider providing parametric modeling of the sojourn time in the future, for example with the Poisson distribution? This would help with problems where hidden states have large sojourn times.
Thank you again.
Best
Ronan
Currently, the duration probabilities per state are stored in a 2D array of shape (n_states, n_durations). In some situations, in addition to these "non-parametric" duration PMFs per state, the durations need to be estimated by parametric distributions.
The 'hsmm' R package offers 4 parametric distributions for duration:
I need help with the math behind determining the parameters from a non-parametric duration PMF, especially for these 4 distributions. Please suggest some resources. I prefer clear algorithms, but anything relevant is welcome.
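As a starting point for the question above, here is a minimal sketch for one candidate distribution only, the Poisson. It assumes durations are numbered 1..n_durations and that the Poisson is shifted to start at duration 1; both are assumptions, and the function name fit_shifted_poisson is hypothetical. Treating a row of the duration array as weights, the weighted maximum-likelihood estimate of the Poisson rate is the weighted mean duration minus the shift.

```python
import numpy as np

def fit_shifted_poisson(dur_pmf):
    """Estimate the rate of a Poisson shifted to start at duration 1.

    dur_pmf is one row of the (n_states, n_durations) duration array.
    Assumption: column k holds the probability of duration k + 1.
    """
    dur_pmf = np.asarray(dur_pmf, dtype=float)
    durations = np.arange(1, dur_pmf.size + 1)
    # Weighted mean duration (normalised in case the row is not exactly 1).
    mean_duration = np.sum(durations * dur_pmf) / dur_pmf.sum()
    # For a Poisson shifted by 1, the weighted MLE of the rate is mean - 1.
    return mean_duration - 1.0

# Example with two non-parametric duration PMFs (one per state).
dur = np.array([
    [0.1, 0.005, 0.005, 0.89],
    [0.1, 0.005, 0.89, 0.005],
])
lambdas = [fit_shifted_poisson(row) for row in dur]
print(lambdas)
```

The same pattern (use the PMF as weights in the usual estimator) carries over to other distributions, although some of them have no closed form and need a numerical optimiser.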
Thanks!
So I have been trying to fit a Gaussian HSMM to an observation sequence with monotonic behaviour.
This observation sequence is created with the sample method from a pre-initialized HSMM with the following parameters:
import numpy as np
import scipy.stats
from edhsmm.hsmm_base import GaussianHSMM  # import path assumed from the package layout

def init4shortMC(hsmm_class):
    # initial probability
    hsmm_class.pi = np.zeros(5)
    hsmm_class.pi[0] = 1
    # durations
    hsmm_class.dur = np.zeros((5, 30))
    x = np.linspace(0, 30, 30)
    y = np.zeros_like(x)
    for i in range(len(hsmm_class.dur)):
        for j in range(len(hsmm_class.dur[i])):
            hsmm_class.dur[i, j] = scipy.stats.norm(30, 10).pdf(x[j,])
    # put the remaining probability mass into one bin so that each row sums to 1
    for i in range(len(hsmm_class.dur)):
        hsmm_class.dur[i, 14] += 1 - hsmm_class.dur[i].sum()
    # transition matrix
    num_of_states = 5
    hsmm_class.tmat = np.zeros((num_of_states, num_of_states))
    for i in range(len(hsmm_class.tmat)):
        for j in range(len(hsmm_class.tmat[i]) - 1):
            if i == j and j < len(hsmm_class.tmat[i]) - 2:
                hsmm_class.tmat[i, j + 1] = 0.6
                hsmm_class.tmat[i, j + 2] = 0.4
            elif i == j and j == len(hsmm_class.tmat[i]) - 2:
                hsmm_class.tmat[i, j + 1] = 1
    hsmm_class.tmat[-1, -1] = 1
    hsmm_class.mean = np.array([10, 20, 30, 40, 50])  # shape should be (n_states, n_dim)
    hsmm_class.mean = np.reshape(hsmm_class.mean, (-1, 1))
    hsmm_class.covmat = np.array([  # shape should be (n_states, n_dim, n_dim) -> array of square matrices
        [[10.]],
        [[10.]],
        [[10.]],
        [[10.]],
        [[10.]],
    ])

R1 = GaussianHSMM(n_states=5, n_durations=30, n_iter=100)
init4shortMC(R1)
sample1=R1.sample(100)
Then I am trying to fit a newly initialized model to the observation sequence. In order for the model to capture the monotonic sequence, I initialize it with the following start prob and tmat:
def init4fit(hsmm_class):
    # initial probability
    hsmm_class.pi = np.zeros(5)
    hsmm_class.pi[0] = 1
    # transition matrix
    num_of_states = 5
    hsmm_class.tmat = np.zeros((num_of_states, num_of_states))
    for i in range(len(hsmm_class.tmat)):
        for j in range(len(hsmm_class.tmat[i]) - 1):
            if i == j and j < len(hsmm_class.tmat[i]) - 2:
                hsmm_class.tmat[i, j + 1] = 0.6
                hsmm_class.tmat[i, j + 2] = 0.4
            elif i == j and j == len(hsmm_class.tmat[i]) - 2:
                hsmm_class.tmat[i, j + 1] = 1
    hsmm_class.tmat[-1, -1] = 1

Then I call the .fit method, with the tmat and start probabilities frozen so that they are not updated during the EM algorithm. However, the model never converges and outputs nan for the means, durations and covariance matrices.
Any ideas?
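One thing that can be inspected independently of the library is the initial transition matrix itself. The sketch below rebuilds the left-to-right matrix from init4fit with plain NumPy (the helper name left_to_right_tmat is hypothetical) and reports the row sums and any self-transitions. Elsewhere on this page, the properties checked by check() include a zero diagonal and strictly positive probabilities; whether that is related to the nan values here is not established, so treat this only as a diagnostic aid.

```python
import numpy as np

def left_to_right_tmat(n_states, p_next=0.6, p_skip=0.4):
    """Rebuild the monotonic transition matrix used in init4fit."""
    tmat = np.zeros((n_states, n_states))
    for i in range(n_states - 2):
        tmat[i, i + 1] = p_next   # move to the next state
        tmat[i, i + 2] = p_skip   # skip one state
    tmat[n_states - 2, n_states - 1] = 1.0  # second-to-last state must go to the last
    tmat[n_states - 1, n_states - 1] = 1.0  # last state loops onto itself
    return tmat

tmat = left_to_right_tmat(5)
row_sums = tmat.sum(axis=1)
self_loops = np.flatnonzero(np.diag(tmat) > 0)   # states with a nonzero diagonal entry
n_zero_entries = int((tmat == 0).sum())          # zero-probability transitions

print("row sums:", row_sums)
print("states with self-transitions:", self_loops)
print("zero entries:", n_zero_entries)
```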
Hello.
I am unable to maintain this, and it is no longer a project I want to maintain, so I decided to archive this repository. I have already deleted this library from PyPI, so pip install edhsmm no longer works.
That being said, here are the final issues I'll work on:
And I'll "release" the final version 0.2.3.
Apologies, and thanks to anyone interested. I hope this project helped you in some way.
poypoyan
The aim of this issue is to re-comment sections of the code so that they reference specific sections of Yu (2010). This makes the code more readable and helps solve reported bugs (see previous issues).
Hello poypoyan,
I am using MultinomialHSMM to estimate the parameters of an HSMM. I encounter an issue with the number of symbols (i.e. the different possible observations), which is not properly cast to an integer.
It works when I use a single sequence, but it does not work with multiple sequences.
My code is below.
Thanks for your help.
Ronan
import sys
import numpy as np
from edhsmm.hsmm_multinom import MultinomialHSMM
# modified initial parameters for MultinomialHSMM
def init_true_model(hsmm_class):
    hsmm_class.pi = np.array([2 / 3, 1 / 3, 0 / 3])
    hsmm_class.dur = np.array([
        [0.1, 0.005, 0.005, 0.89],
        [0.1, 0.005, 0.89, 0.005],
        [0.1, 0.89, 0.005, 0.005]
    ])
    hsmm_class.tmat = np.array([
        [0.0, 0.5, 0.5],
        [0.3, 0.0, 0.7],
        [0.4, 0.6, 0.0]
    ])
    hsmm_class.emit = np.array([
        [0.8, 0.1, 0.1],
        [0.1, 0.8, 0.1],
        [0.1, 0.1, 0.8]
    ])  # shape should be (n_states, n_symbols)

# modified initial parameters for MultinomialHSMM
def init_middle_model(hsmm_class):
    hsmm_class.pi = np.array([1/3, 1/3, 1/3])
    hsmm_class.dur = np.array([
        [0.25, 0.25, 0.25, 0.25],
        [0.25, 0.25, 0.25, 0.25],
        [0.25, 0.25, 0.25, 0.25]
    ])
    hsmm_class.tmat = np.array([
        [0.0, 0.5, 0.5],
        [0.5, 0.0, 0.5],
        [0.5, 0.5, 0.0]
    ])
    hsmm_class.emit = np.array([
        [1/3, 1/3, 1/3],
        [1/3, 1/3, 1/3],
        [1/3, 1/3, 1/3]
    ])  # shape should be (n_states, n_symbols)

# initialize HSMM
squirrel_true_model = MultinomialHSMM(n_states = 3, n_durations = 4)
init_true_model(squirrel_true_model)
rng_seed = 12345
n_samples = 100
n_squirrel = 10
all_obs = np.empty((n_samples,n_squirrel))
for s in range(n_squirrel):
    states = squirrel_true_model.sample(n_samples=n_samples, random_state=rng_seed)
    obs = states[2]
    all_obs[:, s] = obs
    rng_seed = rng_seed + 1
squirrel_middle_model = MultinomialHSMM(n_states = 3, n_durations = 4)
init_middle_model(squirrel_middle_model)
squirrel_middle_model.fit(all_obs)
# display learned parameters for T
print("Start Probabilities: [T]\n", squirrel_middle_model.pi, "\n")
print("Transition Matrix: [T]\n", squirrel_middle_model.tmat, "\n")
print("Durations: [T]\n", squirrel_middle_model.dur, "\n")
print("Emission Probabilities: [T]\n", squirrel_middle_model.emit, "\n")
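For reference, a common convention for several sequences (used by hmmlearn, for example) is to concatenate them into one integer column and to pass the individual sequence lengths alongside. The sketch below only prepares the data in that layout with NumPy. Whether MultinomialHSMM.fit expects exactly this layout is an assumption that should be checked against the library, and the random data here is a stand-in for the sampled observations.

```python
import numpy as np

n_samples, n_sequences, n_symbols = 100, 10, 3

# Stand-in for the (n_samples, n_sequences) float array built with np.empty above.
rng = np.random.default_rng(0)
all_obs = rng.integers(0, n_symbols, size=(n_samples, n_sequences)).astype(float)

# One column of integer symbols: sequence 0 first, then sequence 1, and so on.
X = all_obs.T.reshape(-1, 1).astype(int)

# One entry per sequence, so that sum(lengths) == X.shape[0].
lengths = [n_samples] * n_sequences

# The number of symbols derived from integer data is itself an integer.
n_symbols_seen = int(X.max()) + 1
print(X.shape, X.dtype, sum(lengths), n_symbols_seen)
```

Note that np.empty creates a float array by default, so symbols stored in it become floats unless the array is created with an integer dtype or cast afterwards.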
Could you please give a hint on how to fit the model parameters to high-dimensional data? For example, my data has shape (n, p), where n is the number of observations and p is the dimension of each observation.
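The code comments in an earlier issue on this page state that the Gaussian means should have shape (n_states, n_dim) and the covariance matrices shape (n_states, n_dim, n_dim). Under that assumption, a minimal sketch of preparing initial parameters for p-dimensional data could look as follows. Splitting the data into consecutive chunks is only an illustrative initialisation, not the library's method, and the random data is a placeholder.

```python
import numpy as np

n, p, n_states = 300, 4, 3

# Placeholder data of shape (n, p): n observations, each of dimension p.
rng = np.random.default_rng(0)
X = rng.normal(size=(n, p))

# Illustrative initialisation: one consecutive chunk of the data per state.
chunks = np.array_split(X, n_states)

# Means: shape (n_states, n_dim)
mean = np.stack([chunk.mean(axis=0) for chunk in chunks])

# Covariances: shape (n_states, n_dim, n_dim), one square matrix per state
covmat = np.stack([np.cov(chunk, rowvar=False) for chunk in chunks])

print(mean.shape, covmat.shape)
```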
Hello,
First, I would like to thank you for the HSMM code you provided, which works very well.
I am very interested in applying left-censoring to my dataset and, as far as I know, your software is the only one that has this option.
You mentioned in your notebook that it is an experimental feature. Could you please give me more information (references, ...)?
Thanks in advance
Ronan
The function check() in the HSMM class checks model parameters for errors before performing the main algorithms.
For the HSMM (base) class, here are some of the properties to be checked:
- HSMM.pi, HSMM.tmat, and HSMM.dur should be greater than 0, because the EM algorithm cannot update a probability of 0.
- HSMM.pi (start probabilities), numpy.shape = (n_states), and the sum should be 1.
- HSMM.tmat (transition matrix), numpy.shape = (n_states, n_states), every diagonal is zero, and every row should add up to 1.
- HSMM.dur (duration probabilities), numpy.shape = (n_states, n_durations), and every row should add up to 1.
For the Gaussian sub-class, here are the properties:
- GaussianHSMM.means, numpy.shape = (n_states).
- GaussianHSMM.sdev, numpy.shape = (n_states), and all entries should be greater than 0.
Just comment if some properties are not listed here.
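The base-class properties listed above can be sketched as a standalone NumPy function. This is not the library's check() implementation; the function name check_hsmm_params, the tolerance, and the messages are all hypothetical. The positivity test is applied only to the off-diagonal entries of tmat, which is one reading of the list, since the diagonal is required to be zero.

```python
import numpy as np

def check_hsmm_params(pi, tmat, dur, atol=1e-8):
    """Return a list of problems found in the base HSMM parameters."""
    pi = np.asarray(pi, dtype=float)
    tmat = np.asarray(tmat, dtype=float)
    dur = np.asarray(dur, dtype=float)

    # Shape checks first, because the later checks rely on them.
    if pi.ndim != 1:
        return ["pi must have shape (n_states,)"]
    n_states = pi.shape[0]
    if tmat.shape != (n_states, n_states):
        return ["tmat must have shape (n_states, n_states)"]
    if dur.ndim != 2 or dur.shape[0] != n_states:
        return ["dur must have shape (n_states, n_durations)"]

    problems = []
    if not np.isclose(pi.sum(), 1.0, atol=atol):
        problems.append("pi does not sum to 1")
    if not np.allclose(np.diag(tmat), 0.0, atol=atol):
        problems.append("tmat has a nonzero diagonal entry")
    if not np.allclose(tmat.sum(axis=1), 1.0, atol=atol):
        problems.append("a row of tmat does not sum to 1")
    if not np.allclose(dur.sum(axis=1), 1.0, atol=atol):
        problems.append("a row of dur does not sum to 1")

    # EM cannot move a probability away from exactly 0.
    off_diag = ~np.eye(n_states, dtype=bool)
    if (pi <= 0).any() or (dur <= 0).any() or (tmat[off_diag] <= 0).any():
        problems.append("zero or negative probability found")
    return problems

# Uniform parameters pass; a start probability of 0 is reported.
tmat_ok = np.array([[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]])
dur_ok = np.full((3, 4), 0.25)
print(check_hsmm_params([1/3, 1/3, 1/3], tmat_ok, dur_ok))
print(check_hsmm_params([2/3, 1/3, 0.0], tmat_ok, dur_ok))
```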
I wrote before in the README of the deleted test branch:
Unsure to be implemented
- Left-censoring (I have difficulty understanding and implementing it)
I think this is a good feature to have, but I also think lots of refactoring of the hsmm_core is needed.
More changes:
- end[-1] != n_samples in iter_from_X_length(). This is more general than before, which is end[-1] > n_samples.
- python >= 3.6 and hence, drop Python 3.5.
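To illustrate the difference between the two conditions on end[-1], here is a minimal sketch of a generator that turns sequence lengths into (start, end) index pairs. It is not the library's iter_from_X_length(); the name iter_sequences and the choice to raise ValueError are assumptions. With the != comparison, lengths that add up to fewer samples than the data contains are rejected as well, whereas a > comparison would only catch lengths that add up to too many.

```python
import numpy as np

def iter_sequences(n_samples, lengths=None):
    """Yield (start, end) index pairs, one per sequence."""
    if lengths is None:
        # A single sequence covering all samples.
        yield 0, n_samples
        return
    lengths = np.asarray(lengths, dtype=int)
    end = np.cumsum(lengths)
    # Reject both too many and too few samples.
    if end[-1] != n_samples:
        raise ValueError(
            f"lengths sum to {end[-1]}, but there are {n_samples} samples"
        )
    start = end - lengths
    for s, e in zip(start, end):
        yield int(s), int(e)

print(list(iter_sequences(5, [2, 3])))
print(list(iter_sequences(5)))
```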