Abstract

Categorization of documents is challenging, as the number of discriminating words can be very large. We present a nearest-neighbor classification scheme for text categorization in which the importance of discriminating words is learned using mutual information and weight adjustment techniques. The nearest neighbors for a particular document are then computed based on the matching words and their weights. We evaluate our scheme on both synthetic and real-world documents. Our experiments with synthetic data sets show that this scheme is robust under different emulated conditions. Empirical results on real-world documents demonstrate that this scheme outperforms state-of-the-art classification algorithms such as C4.5, RIPPER, Rainbow, and PEBLS.
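
The idea of computing neighbors from matching words and their weights can be illustrated with a minimal sketch. It is not the report's implementation: it assumes a weighted cosine similarity over term counts, takes the per-word weights as given rather than learning them through mutual information and weight adjustment, and uses hypothetical names (`weighted_cosine`, `classify`) and invented toy documents.

```python
import math
from collections import Counter


def weighted_cosine(a, b, weights):
    """Cosine similarity of two term-count dicts after scaling each term by its weight.

    Assumption: terms missing from `weights` get weight 0 and are ignored.
    """
    def scaled(doc):
        return {t: c * weights.get(t, 0.0) for t, c in doc.items()}

    sa, sb = scaled(a), scaled(b)
    dot = sum(v * sb.get(t, 0.0) for t, v in sa.items())
    na = math.sqrt(sum(v * v for v in sa.values()))
    nb = math.sqrt(sum(v * v for v in sb.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(query, training, weights, k=3):
    """Label a query by similarity-weighted vote among its k nearest training documents.

    `training` is a list of (term-count dict, label) pairs.
    """
    scored = sorted(
        ((weighted_cosine(query, doc, weights), label) for doc, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for sim, label in scored[:k]:
        votes[label] += sim
    return votes.most_common(1)[0][0] if votes else None


# Toy data (invented for illustration): "the" is common to every class,
# so it is given a small hand-picked weight.
training = [
    ({"ball": 2, "team": 1, "the": 3}, "sports"),
    ({"goal": 1, "team": 2, "the": 2}, "sports"),
    ({"vote": 2, "party": 1, "the": 3}, "politics"),
    ({"election": 1, "vote": 1, "the": 4}, "politics"),
]
weights = {"ball": 1.0, "team": 1.0, "goal": 1.0, "vote": 1.0,
           "party": 1.0, "election": 1.0, "the": 0.1}
query = {"team": 1, "the": 5}

print(classify(query, training, weights, k=3))
```

In this sketch the low weight on "the" keeps a frequent but uninformative word from dominating the similarity, so the single occurrence of "team" decides the neighbors. In the scheme described above, such weights would be learned from the training data rather than set by hand.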

Keywords

Categorization, k-nearest neighbors algorithm, Text categorization, Pattern recognition (psychology), Artificial intelligence, Computer science, Mathematics

Publication Info

Year
1999
Type
report
Citations
82
Access
Closed

Citation Metrics

82
OpenAlex

Cite This

Eui-Hong Han, George Karypis, Vipin Kumar (1999). Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification. https://doi.org/10.21236/ada439688

Identifiers

DOI
10.21236/ada439688