Abstract
Motivation: Clustering of individuals into populations on the basis of multilocus genotypes is informative in a variety of settings. In population-genetic clustering algorithms, such as BAPS, STRUCTURE and TESS, individual multilocus genotypes are partitioned over a set of clusters, often using unsupervised approaches that involve stochastic simulation. As a result, replicate cluster analyses of the same data may produce several distinct solutions for estimated cluster membership coefficients, even though the same initial conditions were used. Major differences among clustering solutions have two main sources: (1) 'label switching' of clusters across replicates, caused by the arbitrary way in which clusters in an unsupervised analysis are labeled, and (2) 'genuine multimodality,' truly distinct solutions across replicates.
Results: To facilitate the interpretation of population-genetic clustering results, we describe three algorithms for aligning multiple replicate analyses of the same data set. We have implemented these algorithms in the computer program CLUMPP (CLUster Matching and Permutation Program). We illustrate the use of CLUMPP by aligning the cluster membership coefficients from 100 replicate cluster analyses of 600 chickens from 20 different breeds.
Availability: CLUMPP is freely available at http://rosenberglab.bioinformatics.med.umich.edu/clumpp.html
Contact: mjakob@umich.edu
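To make the label-switching problem concrete, the following minimal Python sketch aligns one replicate's membership matrix to a reference by trying every relabeling of the clusters. It is an illustration only, not CLUMPP's implementation and not necessarily any of the three algorithms described in the paper. The function name `align_labels`, the squared-difference mismatch measure, and the toy matrices are all assumptions introduced here.

```python
from itertools import permutations


def align_labels(reference, replicate):
    """Relabel the clusters of `replicate` to best match `reference`.

    Both arguments are lists of rows, one row per individual, each row
    holding K membership coefficients. The mismatch measure (sum of
    squared differences) is an assumption made for this sketch.
    """
    k = len(reference[0])

    def mismatch(perm):
        # Total squared difference when reference cluster j is paired
        # with replicate cluster perm[j].
        return sum(
            (ref_row[j] - rep_row[perm[j]]) ** 2
            for ref_row, rep_row in zip(reference, replicate)
            for j in range(k)
        )

    # Exhaustive search over all K! relabelings; practical only for small K.
    best = min(permutations(range(k)), key=mismatch)
    aligned = [[row[best[j]] for j in range(k)] for row in replicate]
    return best, aligned


# Hypothetical toy data: run_b is the same solution as run_a,
# but with its cluster labels (columns) switched.
run_a = [[0.9, 0.1, 0.0],
         [0.1, 0.8, 0.1],
         [0.0, 0.2, 0.8]]
run_b = [[0.0, 0.9, 0.1],
         [0.1, 0.1, 0.8],
         [0.8, 0.0, 0.2]]

perm, aligned_b = align_labels(run_a, run_b)
print(perm)
print(aligned_b == run_a)
```

If the two runs differ only by label switching, the aligned matrix matches the reference closely. A large residual mismatch after alignment would instead point to genuinely distinct solutions. Because the search visits all K! permutations, this brute-force approach would not scale to large numbers of clusters.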
Related Publications
Bayesian Clustering Using Hidden Markov Random Fields in Spatial Population Genetics
We introduce a new Bayesian clustering algorithm for studying population structure using individually geo-referenced multilocus data sets. The algorithm is based on the...
Unsupervised K-Means Clustering Algorithm
The k-means algorithm is generally the most known and used clustering method. There are various extensions of k-means to be proposed in the literature. Although it is an unsuper...
A Bayesian approach to the identification of panmictic populations and the assignment of individuals
We present likelihood-based methods for assigning the individuals in a sample to source populations, on the basis of their genotypes at co-dominant marker loci. The source popul...
Software for Population Genetic Analyses of Molecular Marker Data
Molecular genetic markers can be used to examine a group of individuals or populations to estimate various diversity measures and genetic distances, infer population structure a...
An Examination of Procedures for Determining the Number of Clusters in a Data Set
A Monte Carlo evaluation of 30 procedures for determining the number of clusters was conducted on artificial data sets which contained either 2, 3, 4, or 5 distinct nonoverlappi...
Publication Info
- Year: 2007
- Type: article
- Volume: 23
- Issue: 14
- Pages: 1801-1806
- Citations: 6282
- Access: Closed
Identifiers
- DOI: 10.1093/bioinformatics/btm233