NaNs in EM after a few iterations (Cholesky decomposition) #404

@umeshksingla

Description

I am not fully sure this issue belongs here, but I wanted to make a note of it in case other people run into a similar problem. I am fitting a LinearRegressionHMM to 3-D velocity time-series data (so emission_dim = 3). This model essentially learns a set of weights, biases, and a covariance matrix for each state.

When I try to fit a large number of states, I run into an issue where all the parameters and log likelihoods returned are NaN after a few em_steps. It looks like, at some iteration, the covariance matrix returned from m_step is not positive-definite. Such a covariance matrix causes tfd.MultivariateNormalFullCovariance to return NaN samples and NaN log_prob values for the emissions in the following e_step.
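A minimal numpy sketch of the failure mode, using a hypothetical covariance like one m_step might return (the matrix itself is made up for illustration): symmetry alone does not guarantee positive-definiteness, so a symmetric matrix with a negative eigenvalue is still an invalid covariance.

```python
import numpy as np

# Hypothetical m_step output: symmetric, but NOT positive-definite
# (the lower 2x2 block [[1, 2], [2, 1]] has eigenvalues 3 and -1).
cov = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 2.0],
                [0.0, 2.0, 1.0]])

assert np.allclose(cov, cov.T)        # symmetric, as m_step enforces

eigvals = np.linalg.eigvalsh(eigvalsh_input := cov)
print(eigvals)                        # [-1.  1.  3.] -- one negative eigenvalue
is_pd = bool(np.all(eigvals > 0))
print(is_pd)                          # False -> Cholesky-based sampling/log_prob breaks
```

An eigenvalue check like this (or an attempted `np.linalg.cholesky`) is a cheap way to catch the bad matrix at the m_step rather than seeing NaNs one e_step later.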

Inside tfd.MultivariateNormalFullCovariance, I am having difficulty locating which routine it uses for the Cholesky decomposition. It is most likely tf.linalg.cholesky(), which doesn't raise an error on non-positive-definite inputs but instead returns a lower-triangular matrix containing NaN values. This is unlike numpy or torch, as reproduced in the screenshot below on Google Colab. Maybe it makes sense to switch the dynamax LRHMM class to tfd.MultivariateNormalTriL, which requires passing the Cholesky factor explicitly; such errors could then be spotted easily by dynamax users.
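For reference, a small snippet reproducing the numpy side of that comparison (the silent-NaN behavior of tf.linalg.cholesky is described in a comment rather than executed here, since it is what the screenshot shows):

```python
import numpy as np

non_pd = np.array([[1.0, 2.0],
                   [2.0, 1.0]])   # symmetric; eigenvalues are 3 and -1

# numpy fails loudly on a non-positive-definite input:
try:
    np.linalg.cholesky(non_pd)
    raised = False
except np.linalg.LinAlgError:
    raised = True
print(raised)  # True

# By contrast (per the Colab screenshot), tf.linalg.cholesky returns a
# lower-triangular factor containing NaNs instead of raising, so the
# failure only surfaces later as NaN samples and NaN log_probs.
```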

Also, any insights would be appreciated on how I could keep the LRHMM's m_step from returning a covariance matrix that is not positive-definite. It is already enforced to be symmetric, at least.
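One common workaround (an illustrative sketch, not a dynamax API) is to project the estimated covariance onto the positive-definite cone by symmetrizing and clipping its eigenvalues from below, which is equivalent to adding just enough diagonal jitter:

```python
import numpy as np

def nearest_pd(cov, min_eig=1e-6):
    """Clip eigenvalues from below so the matrix becomes positive-definite.
    Illustrative helper, not part of dynamax."""
    sym = 0.5 * (cov + cov.T)                    # enforce exact symmetry
    eigvals, eigvecs = np.linalg.eigh(sym)
    eigvals = np.clip(eigvals, min_eig, None)    # floor the spectrum
    return (eigvecs * eigvals) @ eigvecs.T       # V diag(clipped) V^T

cov = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 2.0],
                [0.0, 2.0, 1.0]])                # symmetric, not PD

fixed = nearest_pd(cov)
print(np.all(np.linalg.eigvalsh(fixed) > 0))     # True
np.linalg.cholesky(fixed)                        # now succeeds, no NaNs
```

Applying this after each m_step trades a small bias in the covariance estimate for numerical stability in the subsequent e_step.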

[Screenshot: Google Colab comparison showing tf.linalg.cholesky returning NaNs on a non-positive-definite input, while numpy and torch raise errors]
