Abstract
In certain contexts, maximum entropy (ME) modeling can be viewed as maximum likelihood (ML) training for exponential models, and like other ML methods is prone to overfitting of training data. Several smoothing methods for ME models have been proposed to address this problem, but previous results do not make it clear how these smoothing methods compare with smoothing methods for other types of related models. In this work, we survey previous work in ME smoothing and compare the performance of several of these algorithms with conventional techniques for smoothing n-gram language models. Because of the mature body of research in n-gram model smoothing and the close connection between ME and conventional n-gram models, this domain is well-suited to gauge the performance of ME smoothing methods. Over a large number of data sets, we find that fuzzy ME smoothing performs as well as or better than all other algorithms under consideration. We contrast this method with previous n-gram smoothing methods to explain its superior performance.
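For context on the connection the abstract draws, a conditional ME model is an exponential model, and training it by ML maximizes the log-likelihood of the training data; the smoothing methods the paper surveys can be viewed as modifications of that objective. The sketch below uses standard textbook notation (features f_i, weights λ_i), not necessarily the paper's own, and shows a Gaussian-prior penalty only as one common illustration of such a modification.

```latex
% Conditional exponential (maximum entropy) model -- standard notation,
% not necessarily the paper's own:
p_\Lambda(y \mid x) \;=\;
  \frac{\exp\bigl(\sum_i \lambda_i f_i(x,y)\bigr)}
       {\sum_{y'} \exp\bigl(\sum_i \lambda_i f_i(x,y')\bigr)}

% Unsmoothed ME training is ML estimation of the weights \Lambda = \{\lambda_i\}
% on training data D, and like other ML estimates it can overfit:
\Lambda_{\mathrm{ML}} \;=\; \arg\max_{\Lambda} \sum_{(x,y)\in D} \log p_\Lambda(y \mid x)

% Illustrative smoothed objective: a Gaussian prior on the weights adds a
% quadratic penalty with variances \sigma_i^2 (one common ME smoothing method,
% shown here only as an example of the family the paper surveys):
\Lambda_{\mathrm{smooth}} \;=\; \arg\max_{\Lambda}
  \Bigl[\, \sum_{(x,y)\in D} \log p_\Lambda(y \mid x)
        \;-\; \sum_i \frac{\lambda_i^2}{2\sigma_i^2} \Bigr]
```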
Related Publications
A Scalable Hierarchical Distributed Language Model
Neural probabilistic language models (NPLMs) have been shown to be competitive with and occasionally superior to the widely-used n-gram language models. The main drawback of NPL...
Using Maximum Entropy for Text Classification
This paper proposes the use of maximum entropy techniques for text classification. Maximum entropy is a probability distribution estimation technique widely used for a variety o...
Enriching Word Vectors with Subword Information
Continuous word representations, trained on large unlabeled corpora are useful for many natural language processing tasks. Popular models that learn such representations ignore ...
N-gram Counts and Language Models from the Common Crawl
We contribute 5-gram counts and language models trained on the Common Crawl corpus, a collection of over 9 billion web pages. This release improves upon the Google n-gram counts in...
An architecture for parallel topic models
This paper describes a high performance sampling architecture for inference of latent topic models on a cluster of workstations. Our system is faster than previous work by over ...
Publication Info
- Year: 2000
- Type: article
- Volume: 8
- Issue: 1
- Pages: 37-50
- Citations: 199
- Access: Closed
Identifiers
- DOI: 10.1109/89.817452