Abstract
Incorporating feature selection into a classification or regression method often carries a number of advantages. In this paper we formalize feature selection specifically from a discriminative perspective of improving classification/regression accuracy. The feature selection method is developed as an extension to the recently proposed maximum entropy discrimination (MED) framework. We describe MED as a flexible (Bayesian) regularization approach that subsumes, e.g., support vector classification, regression, and exponential family models. For brevity, we restrict ourselves primarily to feature selection in the context of linear classification/regression methods and demonstrate that the proposed approach yields substantial improvements in practice. Moreover, we discuss and develop various extensions of feature selection, including the problem of dealing with example-specific but unobserved degrees of freedom, such as alignments or invariants.
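For intuition, the following is a minimal sketch of the MED formulation the abstract refers to, written in the notation commonly used in the MED literature rather than quoted from this paper; the prior P_0, discriminant L, margins gamma_t, switches s_i, and switch prior p_0 are assumed notation introduced here for illustration only.

```latex
% Minimal sketch (assumed notation): MED seeks a distribution over parameters
% and margins that stays close to a prior P_0 while every training example
% (X_t, y_t) is classified with non-negative expected margin.
\min_{P(\Theta,\gamma)} \;
  \mathrm{KL}\!\left( P(\Theta,\gamma) \,\|\, P_0(\Theta,\gamma) \right)
\quad \text{s.t.} \quad
\int P(\Theta,\gamma)\,\big[\, y_t\, \mathcal{L}(X_t \mid \Theta) - \gamma_t \,\big]\,
  d\Theta\, d\gamma \;\ge\; 0,
\qquad t = 1,\ldots,T.

% Feature selection can then be encoded by augmenting a linear discriminant with
% binary switches s_i (one per feature), each kept "on" under the prior with some
% probability p_0, so that uninformative features can be switched off:
\mathcal{L}(X \mid \theta, b, s) \;=\; \sum_i s_i\, \theta_i X_i + b,
\qquad s_i \in \{0,1\}.
```

Under this reading, the posterior probability of each switch s_i acts as a soft feature-relevance score, which is what gives the approach its pruning behavior.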
Related Publications
- Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy
  Feature selection is an important problem for pattern classification systems. We study how to select good features according to the maximal statistical dependency criterion base...
- Feature selection: evaluation, application, and small sample performance
  A large number of algorithms have been proposed for feature subset selection. Our experimental results show that the sequential forward floating selection algorithm, proposed by...
- Feature selection for high-dimensional genomic microarray data
  We report on the successful application of feature selection methods to a classification problem in molecular biology involving only 72 data points in a 7130 dimensional space. ...
- Feature selection for multiclass discrimination via mixed-integer linear programming
  We reformulate branch-and-bound feature selection employing L∞ or particular Lp metrics, as mixed-integer linear programming (MILP) problems, affording con...
- Kernel Logistic Regression and the Import Vector Machine
  The support vector machine (SVM) is known for its good performance in two-class classification, but its extension to multiclass classification is still an ongoing research issue...
Publication Info
- Year: 2013
- Type: article
- Pages: 291-300
- Citations: 76
- Access: Closed
Identifiers
- DOI: 10.48550/arxiv.1301.3865