Abstract

Motivation: Elucidating the molecular taxonomy of cancers and finding biological and clinical markers from microarray experiments is problematic due to the large number of variables being measured. Feature selection methods that can identify relevant classifiers or that can remove likely false positives prior to supervised analysis are therefore desirable.

Results: We present a novel feature selection procedure based on a mixture model and a non-Gaussianity measure of a gene's expression profile. The method can be used to find genes that define either small outlier subgroups or major subdivisions, depending on the sign of kurtosis. The method can also be used as a filtering step, prior to supervised analysis, in order to reduce the false discovery rate. We validate our methodology using six independent datasets by rediscovering major classifiers in ER negative and ER positive breast cancer and in prostate cancer. Furthermore, our method finds two novel subtypes within the basal subgroup of ER negative breast tumours, associated with apoptotic and immune response functions respectively, and with statistically different clinical outcome.

Availability: An R-function pack that implements the methods used here has been added to vabayelMix, available from ().

Contact: aet21@cam.ac.uk

Supplementary information: Supplementary information is available at Bioinformatics online.
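To make the role of the sign of kurtosis concrete, the following is a minimal sketch in Python. It is not the authors' R implementation and does not reproduce the mixture-model step or vabayelMix. It only computes the standard sample excess kurtosis for two simulated expression profiles. The function name `excess_kurtosis` and both simulated genes are hypothetical and introduced purely for illustration. Under the usual convention, a profile with a small outlier subgroup tends to have positive excess kurtosis, while a profile split into two comparably sized groups tends to have negative excess kurtosis.

```python
import random
from statistics import fmean


def excess_kurtosis(values):
    """Sample excess kurtosis: fourth standardized moment minus 3.

    A Gaussian profile gives a value near 0.
    """
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined for a constant profile")
    return m4 / (m2 * m2) - 3.0


# Simulated (hypothetical) expression profiles over 100 samples.
rng = random.Random(0)

# 95 samples near baseline plus a small outlier subgroup of 5 samples.
outlier_gene = [rng.gauss(0, 1) for _ in range(95)] + \
               [rng.gauss(6, 1) for _ in range(5)]

# Two equally sized groups: a major subdivision of the samples.
bimodal_gene = [rng.gauss(-2, 1) for _ in range(50)] + \
               [rng.gauss(2, 1) for _ in range(50)]

for name, profile in [("outlier_gene", outlier_gene),
                      ("bimodal_gene", bimodal_gene)]:
    k = excess_kurtosis(profile)
    kind = "small outlier subgroup" if k > 0 else "major subdivision"
    print(f"{name}: excess kurtosis = {k:.2f} ({kind})")
```

In a filtering setting, one could rank genes by such a score and pass only the most non-Gaussian profiles to supervised analysis. The choice of threshold is not specified in the abstract and would be an assumption.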

Keywords

Feature selection; False discovery rate; Computer science; Pattern recognition (psychology); Kurtosis; Outlier; False positive paradox; Artificial intelligence; Cluster analysis; Breast cancer; Data mining; Computational biology; Cancer; Mathematics; Biology; Statistics; Gene

MeSH Terms

Algorithms; Artificial Intelligence; Biomarkers, Tumor; Breast Neoplasms; Cluster Analysis; Diagnosis, Computer-Assisted; Female; Gene Expression Profiling; Humans; Male; Neoplasm Proteins; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated; Prostatic Neoplasms; Software

Publication Info

Year: 2006
Type: article
Volume: 22
Issue: 18
Pages: 2269-2275
Citations: 65
Access: Closed

Citation Metrics

OpenAlex: 65
Influential: 5
CrossRef: 56

Cite This

Andrew E. Teschendorff, Ali Naderi, Nuno L. Barbosa‐Morais et al. (2006). PACK: Profile Analysis using Clustering and Kurtosis to find molecular classifiers in cancer. Bioinformatics, 22(18), 2269-2275. https://doi.org/10.1093/bioinformatics/btl174

Identifiers

DOI: 10.1093/bioinformatics/btl174
PMID: 16682424
