Abstract
Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized model is able to deal with domain-specific synonymy as well as with polysemous words. In contrast to standard Latent Semantic Indexing (LSI) by Singular Value Decomposition, the probabilistic variant has a solid statistical foundation and defines a proper generative data model. Retrieval experiments on a number of test collections indicate substantial performance gains over direct term matching methods as well as over LSI. In particular, the combination of models with different dimensionalities has proven to be advantageous.
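The abstract names the ingredients (a latent class model over count data, fitted by a generalization of EM) without writing them out. The following is a minimal sketch of how such a fit might look, assuming the commonly cited aspect-model form P(d, w) = sum_z P(z) P(d|z) P(w|z) and a tempered E-step in which the likelihood terms are raised to a power `beta`. The function name `plsa_em`, its parameters, and the toy count matrix are illustrative assumptions, not taken from the abstract.

```python
import random


def plsa_em(counts, n_topics, n_iter=50, beta=1.0, seed=0):
    """Fit an aspect model P(d, w) = sum_z P(z) P(d|z) P(w|z) by EM.

    counts[d][w] is the frequency of word w in document d.
    beta = 1.0 gives plain EM; beta < 1.0 flattens the posterior
    (an assumed form of the "generalized" EM mentioned in the abstract).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def random_dist(n):
        # Strictly positive random vector, normalised to sum to 1.
        v = [rng.random() + 1e-3 for _ in range(n)]
        s = sum(v)
        return [x / s for x in v]

    p_z = [1.0 / n_topics] * n_topics
    p_d_z = [random_dist(n_docs) for _ in range(n_topics)]
    p_w_z = [random_dist(n_words) for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z = [0.0] * n_topics
        new_d = [[0.0] * n_docs for _ in range(n_topics)]
        new_w = [[0.0] * n_words for _ in range(n_topics)]

        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: (tempered) posterior over latent classes for (d, w).
                post = [
                    p_z[z] * (p_d_z[z][d] * p_w_z[z][w]) ** beta
                    for z in range(n_topics)
                ]
                total = sum(post)
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    r = n * post[z] / total
                    new_z[z] += r
                    new_d[z][d] += r
                    new_w[z][w] += r

        # M-step: renormalise expected counts into probabilities.
        total_mass = sum(new_z)
        for z in range(n_topics):
            if new_z[z] > 0:
                p_d_z[z] = [x / new_z[z] for x in new_d[z]]
                p_w_z[z] = [x / new_z[z] for x in new_w[z]]
            p_z[z] = new_z[z] / total_mass

    return p_z, p_d_z, p_w_z


# Toy corpus (made up): two documents use words 0-1, two use words 2-3.
toy_counts = [
    [4, 3, 0, 0],
    [5, 2, 0, 0],
    [0, 0, 3, 4],
    [0, 0, 2, 5],
]
p_z, p_d_z, p_w_z = plsa_em(toy_counts, n_topics=2)
```

Each row of `p_w_z` is a distribution over the vocabulary for one latent class, which is where synonymous terms can end up sharing probability mass and a polysemous term can appear under several classes. The pure-Python loops are chosen for readability; a practical fit on a real corpus would need a vectorised or sparse implementation.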
Publication Info
- Year: 2017
- Type: article
- Volume: 51
- Issue: 2
- Pages: 211-218
- Citations: 4048
- Access: Closed
Identifiers
- DOI: 10.1145/3130348.3130370