Description
Dear Team,
MLE is one of the imputation options; it calls the em.norm and imp.norm functions from the norm package and is implemented with Margin == 2.
I think Margin == 2 is a reasonable setting, since the p*n original data matrix (features in rows and samples in columns) is transposed before being sent to the EM algorithm. Therefore, during EM, each variable is an actual gene/protein/peptide.
But the issue is that proteomics data is always p >> n. For example, a TMT global proteome data set would have ~20000 proteins and only a dozen samples. With that many features, the EM algorithm is very expensive.
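To put a rough number on the cost, here is a minimal sketch of the arithmetic. It assumes the EM step fits an unstructured multivariate normal over all p features (my understanding of what em.norm estimates), and it uses the ~20000 protein figure from above as a hypothetical p. The variable names are illustrative only.

```r
# Hypothetical size taken from the example above: ~20000 proteins.
p <- 20000

# Free parameters of an unstructured multivariate normal with p variables:
# p means plus p * (p + 1) / 2 distinct covariance entries.
n_params <- p + p * (p + 1) / 2

# Memory for one dense p x p covariance matrix stored as 8-byte doubles.
cov_gb <- p^2 * 8 / 1e9

cat("parameters:", format(n_params, big.mark = ","), "\n")
cat("covariance matrix size (GB):", cov_gb, "\n")
```

Under these assumptions a single dense covariance matrix already takes about 3.2 GB, and with only a dozen samples there are far fewer observations than parameters to estimate.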
I am running the impute_mle function on this data set (10k features * 24 samples) and have not gotten any results yet:
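library(data.table)  # provides fread()
# Read the TMT protein abundance table as a plain data.frame
dtmt = fread("ccRCC_prot_abundance_MD_3plex.tsv",
             stringsAsFactors = FALSE, data.table = FALSE)
# Drop the first five columns and keep the rest as a matrix
dd = as.matrix(dtmt[, -c(1:5)])
# MLE imputation (EM via the norm package); this call has not returned yet
dtmt_res = MsCoreUtils::impute_mle(dd)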
Do you have any insights on this issue?
Thank you very much!