Abstract
Motivation: Mixtures of factor analyzers enable model-based clustering to be undertaken for high-dimensional microarray data, where the number of observations n is small relative to the number of genes p. Moreover, when the number of clusters is not small, for example, where there are several different types of cancer, there may be a need to reduce further the number of parameters in the specification of the component-covariance matrices. A further reduction can be achieved by using mixtures of factor analyzers with common component-factor loadings (MCFA), which is a more parsimonious model. However, this approach is sensitive to both non-normality and outliers, which are commonly observed in microarray experiments. This sensitivity arises because the MCFA approach is based on a mixture model in which the multivariate normal family of distributions is assumed for the component-error and factor distributions.
Results: An extension to mixtures of t-factor analyzers with common component-factor loadings is considered, whereby the multivariate t-family is adopted for the component-error and factor distributions. An EM algorithm is developed for the fitting of mixtures of common t-factor analyzers. The model can handle data with tails longer than those of the normal distribution, is robust against outliers and allows the data to be displayed in low-dimensional plots. It is applied here to both synthetic data and some microarray gene expression data for clustering, where it performs better than several existing methods.
Availability: The algorithms were implemented in Matlab. The Matlab code is available at http://blog.naver.com/aggie100.
Contact: jbaek@jnu.ac.kr
Supplementary information: Supplementary data are available at Bioinformatics online.
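The model summarized under Results can be illustrated with a minimal simulation sketch. It is not the authors' Matlab code, and the abstract does not spell out the parameterization, so everything here is an assumption: the function name `sample_mctfa`, the symbols `A` (common p x q loading matrix), `xi` and `Omega` (component factor means and covariances), `D_diag` (diagonal error variances) and `nu` (component degrees of freedom), and the choice to give the factors and errors a joint t-distribution through one shared gamma scale weight per observation.

```python
import numpy as np


def sample_mctfa(n, pi, A, xi, Omega, D_diag, nu, rng):
    """Draw n points from a mixture of t-factor analyzers with common loadings.

    Illustrative only: parameter names and the shared-weight construction
    are assumptions, not taken from the abstract.
    """
    g = len(pi)
    p, q = A.shape
    labels = rng.choice(g, size=n, p=pi)
    Y = np.empty((n, p))
    for j, i in enumerate(labels):
        # Scale-mixture form of the t-distribution:
        # w ~ Gamma(shape=nu/2, rate=nu/2); a small w inflates the variance.
        w = rng.gamma(shape=nu[i] / 2.0, scale=2.0 / nu[i])
        # Component-specific factors in the low-dimensional (q) space.
        u = rng.multivariate_normal(xi[i], Omega[i] / w)
        # Errors with diagonal covariance, scaled by the same weight.
        e = rng.normal(0.0, np.sqrt(D_diag / w))
        # The loading matrix A is shared by all components.
        Y[j] = A @ u + e
    return Y, labels


# Hypothetical settings: n = 200 observations, p = 50 variables,
# q = 2 factors, g = 3 clusters, heavy tails (3 degrees of freedom).
rng = np.random.default_rng(0)
p, q, g = 50, 2, 3
A = rng.normal(size=(p, q))
xi = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
Omega = np.stack([np.eye(q)] * g)
D_diag = np.full(p, 0.5)
pi = np.array([0.3, 0.3, 0.4])
nu = np.array([3.0, 3.0, 3.0])
Y, labels = sample_mctfa(200, pi, A, xi, Omega, D_diag, nu, rng)
```

Because every component shares `A`, the clusters differ only in the q-dimensional factor space, which is what would allow the data to be displayed in low-dimensional plots. Letting `nu` grow large makes the weights concentrate near 1, so the sketch approaches the normal-based MCFA case.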
Related Publications
Mixtures of Factor Analyzers with Common Factor Loadings: Applications to the Clustering and Visualization of High-Dimensional Data
Mixtures of factor analyzers enable model-based density estimation to be undertaken for high-dimensional data, where the number of observations n is not very large relative to t...
Robust Improper Maximum Likelihood: Tuning, Computation, and a Comparison With Other Methods for Robust Gaussian Clustering
The two main topics of this paper are the introduction of the "optimally tuned improper maximum likelihood estimator" (OTRIMLE) for robust clustering based on the multivariate...
A mixture of generalized hyperbolic distributions
We introduce a mixture of generalized hyperbolic distributions as an alternative to the ubiquitous mixture of Gaussian distributions as well as their near relatives wit...
Introduction to Multivariate Analysis
Part One. Multivariate distributions. Preliminary data analysis. Part Two: Finding new underlying variables. Principal component analysis. Factor analysis. Part Three: Procedure...
Combining Mixture Components for Clustering
Model-based clustering consists of fitting a mixture model to data and identifying each cluster with one of its components. Multivariate normal distributions are typically used....
Publication Info
- Year: 2011
- Type: article
- Volume: 27
- Issue: 9
- Pages: 1269-1276
- Citations: 73
- Access: Closed
Identifiers
- DOI: 10.1093/bioinformatics/btr112