Abstract

A simple method for categorizing texts into predetermined genre categories using discriminant analysis, a standard statistical technique, is demonstrated with application to the Brown corpus. Discriminant analysis makes it possible to use a large number of parameters that may be specific to a certain corpus or information stream, and to combine them into a small number of functions, with the parameters weighted on the basis of how useful they are for discriminating text genres. An application to information retrieval is discussed.
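The idea of weighting simple text parameters by how well they separate genres can be illustrated with a minimal sketch. It is not the method of the paper: it is a two-class Fisher linear discriminant restricted to two hypothetical surface metrics (mean word length and first-person pronoun rate), with made-up toy texts and illustrative class names. All function names, and the small `ridge` term that keeps the scatter matrix invertible, are assumptions introduced here.

```python
import re

FIRST_PERSON = {"i", "me", "my", "we", "our"}  # hypothetical pronoun list

def text_metrics(text):
    """Return [mean word length, first-person pronoun rate] for a text."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    n = max(len(words), 1)
    mean_len = sum(len(w) for w in words) / n
    pronoun_rate = sum(w in FIRST_PERSON for w in words) / n
    return [mean_len, pronoun_rate]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean_vec(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def scatter(rows, mu):
    """2x2 scatter matrix of the rows around their class mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fit_fisher(class_a, class_b, ridge=1e-6):
    """Fit w = Sw^-1 (mu_a - mu_b) and a midpoint threshold.

    The ridge term is an assumption added to keep Sw invertible on tiny data.
    """
    mu_a, mu_b = mean_vec(class_a), mean_vec(class_b)
    sa, sb = scatter(class_a, mu_a), scatter(class_b, mu_b)
    sw = [[sa[i][j] + sb[i][j] + (ridge if i == j else 0.0)
           for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    w = [dot(inv[0], diff), dot(inv[1], diff)]
    threshold = 0.5 * (dot(w, mu_a) + dot(w, mu_b))
    return w, threshold

def is_class_a(w, threshold, x):
    """True if the discriminant score of x falls on class A's side."""
    return dot(w, x) > threshold

# Toy usage with made-up texts; the two class names are illustrative only.
informative = [
    "The committee approved additional municipal expenditures.",
    "Researchers measured atmospheric concentrations throughout the experiment.",
]
narrative = [
    "I ran to my car and we left.",
    "We saw the dog and I took it home to my mom.",
]
w, threshold = fit_fisher([text_metrics(t) for t in informative],
                          [text_metrics(t) for t in narrative])
print(is_class_a(w, threshold, text_metrics("We took our bags and I sat down.")))
```

The components of `w` play the role of the parameter weights: a metric that separates the two classes well relative to its within-class spread receives a larger weight in the discriminant function. Extending this to many parameters and more than two genres would call for a general multi-class discriminant analysis rather than this two-feature sketch.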

Keywords

Linear discriminant analysis, Computer science, Discriminant, Artificial intelligence, Simple (philosophy), Natural language processing, Basis (linear algebra), Statistical analysis, Pattern recognition (psychology), Mathematics, Statistics

Publication Info

Year
1994
Type
article
Volume
2
Pages
1071-1071
Citations
293
Access
Closed

Citation Metrics

OpenAlex
293
Influential
17
CrossRef
99

Cite This

Jussi Karlgren, Douglass R. Cutting (1994). Recognizing text genres with simple metrics using discriminant analysis. Proceedings of the 15th conference on Computational linguistics, 2, 1071-1071. https://doi.org/10.3115/991250.991324

Identifiers

DOI
10.3115/991250.991324
arXiv
cmp-lg/9410008
