A THEORETICAL BASIS FOR THE USE OF CO‐OCCURRENCE DATA IN INFORMATION RETRIEVAL

Abstract

This paper provides a foundation for a practical way of improving the effectiveness of an automatic retrieval system. Its main concern is with the weighting of index terms as a device for increasing retrieval effectiveness. Previously index terms have been assumed to be independent for the good reason that then a very simple weighting scheme can be used. In reality index terms are most unlikely to be independent. This paper explores one way of removing the independence assumption. Instead the extent of the dependence between index terms is measured and used to construct a non‐linear weighting function. In a practical situation the values of some of the parameters of such a function must be estimated from small samples of documents. So a number of estimation rules are discussed and one in particular is recommended. Finally the feasibility of the computations required for a non‐linear weighting scheme is examined.
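As a rough illustration of what measuring the dependence between index terms from a small sample can look like, the sketch below scores each pair of binary index terms by their mutual information. The abstract names neither the dependence measure nor the recommended estimation rule, so both the mutual-information score and the add-0.5 correction are assumptions chosen for illustration. The function names `smoothed_joint` and `mutual_information` and the toy `docs` sample are hypothetical.

```python
import math
from itertools import combinations


def smoothed_joint(docs, i, j, k=0.5):
    """Estimate P(x_i = a, x_j = b) for binary terms i and j.

    Assumption: k is added to each of the four cells so that a small
    sample never yields a zero probability. This is one possible
    estimation rule, not necessarily the one the paper recommends.
    """
    counts = {(a, b): k for a in (0, 1) for b in (0, 1)}
    for d in docs:
        counts[(d[i], d[j])] += 1
    total = sum(counts.values())
    return {cell: c / total for cell, c in counts.items()}


def mutual_information(joint):
    """Mutual information (in nats) of two binary variables.

    It is zero when the two terms are independent and grows as
    they become more strongly dependent.
    """
    p_i = {a: joint[(a, 0)] + joint[(a, 1)] for a in (0, 1)}
    p_j = {b: joint[(0, b)] + joint[(1, b)] for b in (0, 1)}
    return sum(
        p * math.log(p / (p_i[a] * p_j[b]))
        for (a, b), p in joint.items()
        if p > 0
    )


# Hypothetical sample: each tuple marks presence (1) or absence (0)
# of three index terms in one document.
docs = [
    (1, 1, 0),
    (1, 1, 1),
    (0, 0, 1),
    (0, 0, 0),
    (1, 1, 0),
    (0, 0, 1),
]

# Dependence score for every pair of terms.
dependence = {
    (i, j): mutual_information(smoothed_joint(docs, i, j))
    for i, j in combinations(range(3), 2)
}

for pair, score in sorted(dependence.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 4))
```

In this toy sample terms 0 and 1 always occur together, so their pair receives the highest score. A weighting scheme that treats them as independent would in effect count the same evidence twice, which is the kind of problem that removing the independence assumption is meant to address.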

Keywords

Weighting, Computer science, Independence (probability theory), Index (typography), Data mining, Construct (python library), Function (biology), Computation, Basis (linear algebra), A-weighting, Information retrieval, Scheme (mathematics), Aggregate (composite), Algorithm, Mathematics, Statistics

Affiliated Institutions

University of Cambridge GB

Publication Info

Year: 1977
Type: article
Volume: 33
Issue: 2
Pages: 106-119
Citations: 462
Access: Closed

Cite This

APA Style

C. J. van Rijsbergen (1977). A THEORETICAL BASIS FOR THE USE OF CO‐OCCURRENCE DATA IN INFORMATION RETRIEVAL. Journal of Documentation, 33(2), 106-119. https://doi.org/10.1108/eb026637

Identifiers

DOI: 10.1108/eb026637