Mixtures of common t-factor analyzers for clustering high-dimensional microarray data

Jangsun Baek; Geoffrey J. McLachlan

doi:10.1093/bioinformatics/btr112

Abstract

Motivation: Mixtures of factor analyzers enable model-based clustering to be undertaken for high-dimensional microarray data, where the number of observations n is small relative to the number of genes p. Moreover, when the number of clusters is not small, for example when there are several different types of cancer, the number of parameters in the specification of the component-covariance matrices may need to be reduced further. Such a reduction can be achieved by using mixtures of factor analyzers with common component-factor loadings (MCFA), a more parsimonious model. However, this approach is sensitive to both non-normality and outliers, which are commonly observed in microarray experiments. The sensitivity arises because MCFA is based on a mixture model in which the multivariate normal family of distributions is assumed for the component-error and factor distributions.

Results: An extension to mixtures of t-factor analyzers with common component-factor loadings is considered, whereby the multivariate t-family is adopted for the component-error and factor distributions. An EM algorithm is developed for fitting mixtures of common t-factor analyzers. The model can handle data with tails longer than those of the normal distribution, is robust against outliers and allows the data to be displayed in low-dimensional plots. It is applied here to both synthetic data and some microarray gene expression data sets for clustering, where it performs better than several existing methods.

Availability: The algorithms were implemented in Matlab. The Matlab code is available at http://blog.naver.com/aggie100.

Contact: jbaek@jnu.ac.kr

Supplementary information: Supplementary data are available at Bioinformatics online.
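One way to write down the model the abstract describes is sketched below. The notation is assumed here, following the usual MCFA formulation rather than anything stated on this page: g components, a common p x q loading matrix A, a diagonal p x p error scale matrix D, and per-component factor location ξ_i, factor scale Ω_i and degrees of freedom ν_i. Reading "the multivariate t-family is adopted for the component-error and factor distributions" as a joint t distribution for factors and errors is our interpretation; under it the factors and errors are uncorrelated but not independent, and the marginal distribution of each component is again a t distribution.

```latex
Y_j = A\,U_{ij} + e_{ij} \quad \text{with probability } \pi_i, \qquad i = 1,\dots,g,\; j = 1,\dots,n,

\begin{pmatrix} U_{ij} \\ e_{ij} \end{pmatrix}
  \sim t_{q+p}\!\left( \begin{pmatrix} \xi_i \\ 0 \end{pmatrix},
  \begin{pmatrix} \Omega_i & 0 \\ 0 & D \end{pmatrix}, \nu_i \right),

f(y_j;\Psi) = \sum_{i=1}^{g} \pi_i\, f_t\!\left(y_j;\, A\xi_i,\, \Sigma_i,\, \nu_i\right),
  \qquad \Sigma_i = A\,\Omega_i A^{\top} + D,

f_t(y;\mu,\Sigma,\nu) = \frac{\Gamma\!\left(\frac{\nu+p}{2}\right)\,|\Sigma|^{-1/2}}
  {\Gamma\!\left(\frac{\nu}{2}\right)(\pi\nu)^{p/2}}
  \left\{1 + \frac{\delta(y;\mu,\Sigma)}{\nu}\right\}^{-(\nu+p)/2},
  \qquad \delta(y;\mu,\Sigma) = (y-\mu)^{\top}\Sigma^{-1}(y-\mu).
```

Letting every ν_i tend to infinity recovers the normal MCFA model, which is why the t version can be viewed as a heavy-tailed extension of it.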
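The parsimony argument can be illustrated by counting free parameters. The sketch below is a rough guide under stated assumptions, not the paper's own accounting: the function names are hypothetical, the MFA count uses component-specific loadings B_i and diagonal D_i with rotational indeterminacy removed, and the common-loadings count removes q² parameters because A can be replaced by AC for any nonsingular q x q matrix C (with ξ_i and Ω_i transformed accordingly). One degrees-of-freedom parameter per component is assumed for the t version. The sizes p = 2000 genes, q = 4 factors and g = 5 clusters are illustrative only.

```python
def n_params_full_normal(p: int, g: int) -> int:
    """g-component normal mixture with unrestricted component-covariance matrices."""
    return (g - 1) + g * p + g * p * (p + 1) // 2


def n_params_mfa(p: int, q: int, g: int) -> int:
    """Mixture of factor analyzers: component means (p each), diagonal D_i (p each)
    and p x q loadings B_i, identifiable only up to rotation (q(q - 1)/2 fewer)."""
    return (g - 1) + 2 * g * p + g * (p * q - q * (q - 1) // 2)


def n_params_mctfa(p: int, q: int, g: int) -> int:
    """Mixture of common t-factor analyzers (assumed counting convention)."""
    loadings = p * q - q * q                  # common A, identifiable only up to A -> A C
    per_component = q + q * (q + 1) // 2 + 1  # xi_i, symmetric Omega_i, nu_i
    return (g - 1) + loadings + p + g * per_component  # + p for the common diagonal D


for name, d in [("normal mixture", n_params_full_normal(2000, 5)),
                ("MFA", n_params_mfa(2000, 4, 5)),
                ("MCtFA", n_params_mctfa(2000, 4, 5))]:
    print(f"{name:>15}: {d}")
# normal mixture: 10015004, MFA: 59974, MCtFA: 10063
```

With common loadings, the number of parameters that grows with the number of clusters no longer involves p at all, which is the point of the further reduction mentioned in the abstract when g is not small.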
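The robustness and low-dimensional display claims can be made concrete through the quantities an EM E-step would compute. The following is an illustrative sketch under assumed notation, not the authors' Matlab code, and all function and variable names are hypothetical. For component i with common loadings A, diagonal error scale D, factor location ξ_i, factor scale Ω_i and degrees of freedom ν_i, it evaluates the t-density with location Aξ_i and scale Σ_i = AΩ_iA^T + D, the posterior membership probabilities τ_ij, the t-weights w_ij = (ν_i + p)/(ν_i + δ_ij), where δ_ij is the squared Mahalanobis distance, and the conditional factor means ξ_i + Ω_iA^TΣ_i^{-1}(y_j - Aξ_i), which are averaged over components to give q-dimensional scores for plotting. The M-step is omitted, and the toy data and parameter values are made up.

```python
import numpy as np
from math import lgamma, log, pi


def log_mvt(Y, mu, Sigma, nu):
    """Log density of t_p(mu, Sigma, nu) at each row of Y, plus the squared
    Mahalanobis distances delta = (y - mu)^T Sigma^{-1} (y - mu)."""
    p = Y.shape[1]
    L = np.linalg.cholesky(Sigma)                    # Sigma = L L^T
    z = np.linalg.solve(L, (Y - mu).T)               # whitened residuals, shape (p, n)
    delta = np.sum(z ** 2, axis=0)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    const = lgamma((nu + p) / 2) - lgamma(nu / 2) - 0.5 * p * log(nu * pi) - 0.5 * logdet
    return const - 0.5 * (nu + p) * np.log1p(delta / nu), delta


def mctfa_e_step(Y, weights, A, D, xis, Omegas, nus):
    """E-step quantities for a mixture of common t-factor analyzers (sketch)."""
    n, p = Y.shape
    g, q = len(weights), A.shape[1]
    log_joint = np.empty((n, g))     # log pi_i + log f_t(y_j; A xi_i, Sigma_i, nu_i)
    w = np.empty((n, g))             # E[W_j | y_j, component i], the t-weights
    u_cond = np.empty((g, n, q))     # E[U_ij | y_j, component i]
    for i in range(g):
        Sigma = A @ Omegas[i] @ A.T + np.diag(D)
        mu = A @ xis[i]
        lf, delta = log_mvt(Y, mu, Sigma, nus[i])
        log_joint[:, i] = np.log(weights[i]) + lf
        w[:, i] = (nus[i] + p) / (nus[i] + delta)    # small for points far from component i
        K = np.linalg.solve(Sigma, A @ Omegas[i])    # Sigma^{-1} A Omega_i, shape (p, q)
        u_cond[i] = xis[i] + (Y - mu) @ K            # xi_i + Omega_i A^T Sigma^{-1}(y - A xi_i)
    m = log_joint.max(axis=1, keepdims=True)         # log-sum-exp for numerical stability
    tau = np.exp(log_joint - m)
    tot = tau.sum(axis=1, keepdims=True)
    tau /= tot
    loglik = float(np.sum(m + np.log(tot)))
    u_hat = np.einsum("ni,inq->nq", tau, u_cond)     # q-dimensional scores for plotting
    return tau, w, u_hat, loglik


# Hypothetical toy data: two clusters in p = 5 dimensions plus one gross outlier.
rng = np.random.default_rng(0)
p, q = 5, 2
A = rng.normal(size=(p, q))
D = np.full(p, 0.5)
xis = [np.array([-2.0, 0.0]), np.array([2.0, 0.0])]
Omegas = [np.eye(q), np.eye(q)]
Y = np.vstack([(xi + rng.normal(size=(20, q))) @ A.T
               + rng.normal(scale=np.sqrt(0.5), size=(20, p)) for xi in xis]
              + [np.full((1, p), 30.0)])
tau, w, u_hat, loglik = mctfa_e_step(Y, [0.5, 0.5], A, D, xis, Omegas, [5.0, 5.0])
w_own = w[np.arange(len(Y)), tau.argmax(axis=1)]     # weight under the most likely component
print(w_own[-1], np.median(w_own[:-1]))
```

The outlier receives a t-weight far below the typical value near 1, so it contributes little to the subsequent parameter updates; under a normal model every observation effectively carries weight 1, which is one way to see why MCFA is sensitive to outliers.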

Keywords

Cluster analysis, Outlier, Computer science, Principal component analysis, Data mining, Multivariate statistics, MATLAB, Multivariate normal distribution, Covariance matrix, Statistics, Algorithm, Mathematics, Artificial intelligence

Publication Info

Year: 2011
Type: article
Volume: 27
Issue: 9
Pages: 1269-1276
Citations: 73
Access: Closed

Cite This

APA Style

Baek, J., & McLachlan, G. J. (2011). Mixtures of common t-factor analyzers for clustering high-dimensional microarray data. Bioinformatics, 27(9), 1269-1276. https://doi.org/10.1093/bioinformatics/btr112

Identifiers

DOI: 10.1093/bioinformatics/btr112