Abstract
Currently, most approaches to retrieving textual materials from scientific databases depend on a lexical match between words in users’ requests and those in or assigned to documents in a database. Because of the tremendous diversity in the words people use to describe the same document, lexical methods are necessarily incomplete and imprecise. Using the singular value decomposition (SVD), one can take advantage of the implicit higher-order structure in the association of terms with documents by determining the SVD of large sparse term-by-document matrices. Terms and documents represented by 200–300 of the largest singular vectors are then matched against user queries. We call this retrieval method latent semantic indexing (LSI) because the subspace represents important associative relationships between terms and documents that are not evident in individual documents. LSI is a completely automatic yet intelligent indexing method, widely applicable, and a promising way to improve users’ access to many kinds of textual materials, or to documents and services for which textual descriptions are available. A survey of the computational requirements for managing LSI-encoded databases as well as current and future applications of LSI is presented.
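The pipeline described in the abstract (build a term-by-document matrix, take a truncated SVD, match queries in the reduced subspace) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the use of NumPy, the toy four-document corpus, raw term counts with no weighting, the hypothetical function names, k = 2 instead of the 200–300 dimensions mentioned above, and the query projection q^T U_k S_k^{-1} followed by cosine similarity, which is one common convention among several.

```python
import numpy as np

# Toy corpus (hypothetical): two topics with disjoint vocabularies.
docs = [
    "human computer interface",
    "computer system user interface",
    "graph minors trees",
    "graph paths trees",
]

def build_term_document_matrix(docs):
    """Return a dense term-by-document count matrix and a term -> row index map."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            A[index[w], j] += 1.0
    return A, index

def lsi_fit(A, k):
    """Keep the k largest singular triplets of A (assumed convention: A ~ U_k S_k V_k^T)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Rows of V_k are the document coordinates in the k-dimensional subspace.
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in_query(query, index, U_k, s_k):
    """Project a query into the subspace as q^T U_k S_k^{-1} (an assumed convention)."""
    q = np.zeros(U_k.shape[0])
    for w in query.split():
        if w in index:  # terms outside the vocabulary are ignored
            q[index[w]] += 1.0
    return (q @ U_k) / s_k

def rank_documents(q_hat, V_k):
    """Rank documents by cosine similarity to the projected query."""
    norms = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat)
    sims = (V_k @ q_hat) / np.where(norms == 0.0, 1.0, norms)
    return np.argsort(-sims), sims

if __name__ == "__main__":
    A, index = build_term_document_matrix(docs)
    U_k, s_k, V_k = lsi_fit(A, k=2)
    order, sims = rank_documents(fold_in_query("minors", index, U_k, s_k), V_k)
    for j in order:
        print(f"{sims[j]:+.3f}  {docs[j]}")
```

In this toy setup the query "minors" scores highly against "graph paths trees" even though that document does not contain the word, which is the kind of associative match a purely lexical method would miss. A real collection would use a sparse matrix and a sparse truncated SVD solver rather than the dense `np.linalg.svd` call used here.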
Publication Info
- Year: 1995
- Type: article
- Volume: 37
- Issue: 4
- Pages: 573-595
- Citations: 1482
- Access: Closed
Identifiers
- DOI: 10.1137/1037127