NaNs in EM after a few iterations (Cholesky decomposition) #404

@umeshksingla

Description

I am not fully sure this issue belongs here, but I wanted to make a note of it in case other people run into a similar problem. I am fitting a LinearRegressionHMM to 3-D velocity time-series data (so emission_dim = 3). This model essentially learns a set of weights, biases, and a covariance matrix for each state.

When I try to fit a large number of states, I run into an issue where all the parameters and log likelihoods returned are NaN after a few em_steps. It looks like, at some iteration, the covariance matrix returned from m_step is not positive-definite. Such a covariance matrix causes tfd.MultivariateNormalFullCovariance to return NaN samples and NaN log_prob values for the emissions in the following e_step.
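A minimal numpy sketch of the failure mode, using a hypothetical covariance like one m_step might return (the matrix itself is made up for illustration): symmetry alone does not guarantee positive-definiteness, so a symmetric matrix with a negative eigenvalue is still an invalid covariance.

```python
import numpy as np

# Hypothetical m_step output: symmetric, but NOT positive-definite
# (the lower 2x2 block [[1, 2], [2, 1]] has eigenvalues 3 and -1).
cov = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 2.0],
                [0.0, 2.0, 1.0]])

assert np.allclose(cov, cov.T)        # symmetric, as m_step enforces

eigvals = np.linalg.eigvalsh(eigvalsh_input := cov)
print(eigvals)                        # [-1.  1.  3.] -- one negative eigenvalue
is_pd = bool(np.all(eigvals > 0))
print(is_pd)                          # False -> Cholesky-based sampling/log_prob breaks
```

An eigenvalue check like this (or an attempted `np.linalg.cholesky`) is a cheap way to catch the bad matrix at the m_step rather than seeing NaNs one e_step later.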

Inside tfd.MultivariateNormalFullCovariance, I am having difficulty locating which routine it uses for the Cholesky decomposition. It is most likely tf.linalg.cholesky(), which doesn't raise an error on non-positive-definite inputs but instead returns a lower-triangular matrix containing NaN values. This is unlike numpy or torch, as reproduced in the screenshot below on Google Colab. Maybe it makes sense to switch the dynamax LRHMM class to tfd.MultivariateNormalTriL, which requires passing the Cholesky factor explicitly; such errors could then be spotted easily by dynamax users.
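For reference, a small snippet reproducing the numpy side of that comparison (the silent-NaN behavior of tf.linalg.cholesky is described in a comment rather than executed here, since it is what the screenshot shows):

```python
import numpy as np

non_pd = np.array([[1.0, 2.0],
                   [2.0, 1.0]])   # symmetric; eigenvalues are 3 and -1

# numpy fails loudly on a non-positive-definite input:
try:
    np.linalg.cholesky(non_pd)
    raised = False
except np.linalg.LinAlgError:
    raised = True
print(raised)  # True

# By contrast (per the Colab screenshot), tf.linalg.cholesky returns a
# lower-triangular factor containing NaNs instead of raising, so the
# failure only surfaces later as NaN samples and NaN log_probs.
```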

Also, any insights would be appreciated on how I could keep the LRHMM's m_step from returning a covariance matrix that is not positive-definite. It is already enforced to be symmetric, at least.
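One common workaround (an illustrative sketch, not a dynamax API) is to project the estimated covariance onto the positive-definite cone by symmetrizing and clipping its eigenvalues from below, which is equivalent to adding just enough diagonal jitter:

```python
import numpy as np

def nearest_pd(cov, min_eig=1e-6):
    """Clip eigenvalues from below so the matrix becomes positive-definite.
    Illustrative helper, not part of dynamax."""
    sym = 0.5 * (cov + cov.T)                    # enforce exact symmetry
    eigvals, eigvecs = np.linalg.eigh(sym)
    eigvals = np.clip(eigvals, min_eig, None)    # floor the spectrum
    return (eigvecs * eigvals) @ eigvecs.T       # V diag(clipped) V^T

cov = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 2.0],
                [0.0, 2.0, 1.0]])                # symmetric, not PD

fixed = nearest_pd(cov)
print(np.all(np.linalg.eigvalsh(fixed) > 0))     # True
np.linalg.cholesky(fixed)                        # now succeeds, no NaNs
```

Applying this after each m_step trades a small bias in the covariance estimate for numerical stability in the subsequent e_step.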

[Screenshot: Google Colab comparison showing tf.linalg.cholesky returning NaNs on a non-positive-definite input, while numpy and torch raise errors]
