Clustering gene expression time series data embedded in a nonparametric setup

A clustering methodology for time series data is proposed. The idea arose when a subset of a gene expression dataset was used to build a system model by compressing the information through clustering and then tracing out inherent patterns in the data. A linear mixed model that accommodates time-dependent components is considered. The temporal effects are modelled through an autoregressive process that enters through the dispersion of the random component. The joint distribution of the coefficients of the time-dependent quadratic function and the random effects is embedded within a nonparametric prior (a Dirichlet process prior). Such a nonparametric prior induces clustering in the data. A Monte Carlo EM (MCEM) based technique is used to estimate the parameters. The best cluster is selected through heterogeneity measures. A rigorous simulation study was carried out prior to the analysis of a gene expression time series dataset.
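To illustrate how a Dirichlet process prior induces clustering, the following is a minimal simulation sketch and not the model fitted in this work. It assumes a Chinese restaurant process representation of the prior, a standard normal base measure for the quadratic coefficients, and a stationary AR(1) covariance for the within-gene errors. All function names and parameter values (alpha, rho, sigma2) are hypothetical choices made for illustration only.

```python
import numpy as np

def crp_assignments(n, alpha, rng):
    """Draw cluster labels for n items from a Chinese restaurant process."""
    labels = [0]
    counts = [1]
    for _ in range(1, n):
        # An existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = np.array(counts + [alpha], dtype=float)
        k = int(rng.choice(len(weights), p=weights / weights.sum()))
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return np.array(labels)

def ar1_covariance(n_times, rho, sigma2):
    """Stationary AR(1) covariance: sigma2 / (1 - rho^2) * rho^|s - t|."""
    idx = np.arange(n_times)
    lags = np.abs(idx[:, None] - idx[None, :])
    return sigma2 / (1.0 - rho**2) * rho**lags

def simulate(n_genes=30, n_times=10, alpha=1.0, rho=0.5, sigma2=0.1, seed=0):
    """Simulate gene profiles that share quadratic time trends within clusters."""
    rng = np.random.default_rng(seed)
    labels = crp_assignments(n_genes, alpha, rng)
    n_clusters = labels.max() + 1
    t = np.linspace(0.0, 1.0, n_times)
    design = np.column_stack([np.ones_like(t), t, t**2])
    # One set of quadratic coefficients per cluster (assumed standard normal base measure).
    coefs = rng.normal(size=(n_clusters, 3))
    # Temporally correlated errors for each gene (assumed AR(1) structure).
    noise = rng.multivariate_normal(np.zeros(n_times), ar1_covariance(n_times, rho, sigma2), size=n_genes)
    y = coefs[labels] @ design.T + noise
    return labels, y
```

Genes that receive the same label share one quadratic trend, so the number of clusters is not fixed in advance but emerges from the prior. In this sketch, larger values of alpha tend to produce more clusters.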
