Probabilistic Latent Semantic Indexing

Abstract

Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized model is able to deal with domain-specific synonymy as well as with polysemous words. In contrast to standard Latent Semantic Indexing (LSI) by Singular Value Decomposition, the probabilistic variant has a solid statistical foundation and defines a proper generative data model. Retrieval experiments on a number of test collections indicate substantial performance gains over direct term matching methods as well as over LSI. In particular, the combination of models with different dimensionalities has proven to be advantageous.
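The abstract does not spell out the update equations, so the following is only a minimal sketch of how a latent class model of this kind can be fitted to count data. It assumes the common parameterization P(w|d) = sum over z of P(z|d) P(w|z) and uses plain EM, not the generalized EM variant the abstract refers to. The function names `plsa_em` and `log_likelihood`, the small constant `1e-12`, and the toy count matrix in the usage note are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def plsa_em(counts, n_topics, n_iter=50, seed=0):
    """Fit P(w|z) and P(z|d) to a document-term count matrix with plain EM.

    Illustrative sketch only: assumes the asymmetric parameterization
    P(w|d) = sum_z P(z|d) P(w|z); no tempering or early stopping.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape

    # Random initialisation, normalised so each row is a distribution.
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: posterior P(z|d,w), proportional to P(z|d) * P(w|z).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]  # (docs, topics, words)
        post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)

        # M-step: re-estimate both tables from expected counts n(d,w) P(z|d,w).
        expected = counts[:, None, :] * post
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12

    return p_w_z, p_z_d


def log_likelihood(counts, p_w_z, p_z_d):
    """Log-likelihood of the counts under the mixture P(w|d)."""
    p_w_d = p_z_d @ p_w_z
    return float((counts * np.log(p_w_d + 1e-12)).sum())
```

As a usage example, `plsa_em(np.array([[4., 3., 0., 0.], [0., 0., 3., 4.]]), n_topics=2)` returns a topics-by-words table and a documents-by-topics table whose rows each sum to one. The dense three-way array keeps the sketch short but only suits small corpora; a sparse formulation would be needed for realistic collections.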

Keywords

Computer science; Probabilistic latent semantic analysis; Probabilistic logic; Search engine indexing; Artificial intelligence; Generalization; Mathematics

Affiliated Institutions

International Computer Science Institute, US

Publication Info

Year: 2017
Type: article
Volume: 51
Issue: 2
Pages: 211-218
Citations: 4048
Access: Closed

Cite This

APA Style

Thomas Hofmann (2017). Probabilistic Latent Semantic Indexing. ACM SIGIR Forum, 51(2), 211-218. https://doi.org/10.1145/3130348.3130370

Identifiers

DOI: 10.1145/3130348.3130370