Abstract
We propose a new method for comparing learning algorithms on multiple tasks, based on a novel non-parametric test that we call the Poisson binomial test. The key contribution of this work is a formal definition of what it means for one algorithm to be better than another. We also take into account the dependencies induced when classifiers are evaluated on the same test set. Finally, we make optimal use (in the Bayesian sense) of all available test data. We demonstrate empirically that our approach is more reliable than the sign test and the Wilcoxon signed rank test, the current state of the art for algorithm comparisons.
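The abstract does not spell out how the test is computed, so the following is only a minimal sketch of the distribution the test is named after, not the authors' procedure. It assumes we already have, for each task, a probability that algorithm A beats algorithm B (the values in `task_probs` are hypothetical, as are all function names), and it treats tasks as independent. The number of tasks won then follows a Poisson binomial distribution, which can be computed by dynamic programming. A one-sided sign test is included for contrast, since it only sees win/loss counts.

```python
from math import comb


def poisson_binomial_pmf(probs):
    """Return pmf where pmf[k] = P(exactly k successes) for independent
    Bernoulli trials with (possibly different) success probabilities."""
    pmf = [1.0]  # with zero trials, zero successes has probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this trial fails
            nxt[k + 1] += mass * p      # this trial succeeds
        pmf = nxt
    return pmf


def prob_majority(probs):
    """P(strictly more than half of the trials succeed)."""
    pmf = poisson_binomial_pmf(probs)
    n = len(probs)
    return sum(pmf[k] for k in range(n // 2 + 1, n + 1))


def sign_test_p_value(wins, n):
    """One-sided sign test: P(X >= wins) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


if __name__ == "__main__":
    # Hypothetical per-task probabilities that A beats B (assumed inputs).
    task_probs = [0.9, 0.8, 0.6, 0.55, 0.3]
    print("P(A wins a majority of tasks):", prob_majority(task_probs))
    # The sign test discards the probabilities and keeps only the counts.
    wins = sum(p > 0.5 for p in task_probs)
    print("sign test p-value:", sign_test_p_value(wins, len(task_probs)))
```

The design point this illustrates is that per-task probabilities carry more information than hard win/loss outcomes: a narrow win (0.55) and a decisive one (0.9) count the same in the sign test but not in the Poisson binomial calculation.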
Publication Info
- Year: 2012
- Type: article
- Pages: 665-675
- Citations: 29
- Access: Closed