Abstract

This paper explores the use of hierarchical structure for classifying a large, heterogeneous collection of web content. The hierarchical structure is initially used to train different second-level classifiers. In the hierarchical case, a model is learned to distinguish a second-level category from other categories within the same top level. In the flat non-hierarchical case, a model distinguishes a second-level category from all other second-level categories. Scoring rules can further take advantage of the hierarchy by considering only second-level categories that exceed a threshold at the top level.
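
The thresholded scoring rule described above can be illustrated with a minimal sketch. It assumes each document already has a score per top-level category and per second-level category. The function names, category names, scores, and the 0.5 thresholds are hypothetical and chosen only for illustration; the abstract does not specify the underlying classifier or the threshold values.

```python
from typing import Dict, List, Tuple

# (top-level category, second-level category) -> classifier score
SubScores = Dict[Tuple[str, str], float]


def hierarchical_predict(
    top_scores: Dict[str, float],
    sub_scores: SubScores,
    top_threshold: float = 0.5,  # assumed value, not from the paper
    sub_threshold: float = 0.5,  # assumed value, not from the paper
) -> List[Tuple[str, str]]:
    """Keep a second-level category only if its parent exceeds the
    top-level threshold and its own score exceeds the second-level one."""
    passing_tops = {top for top, s in top_scores.items() if s > top_threshold}
    return sorted(
        key
        for key, s in sub_scores.items()
        if key[0] in passing_tops and s > sub_threshold
    )


def flat_predict(
    sub_scores: SubScores, sub_threshold: float = 0.5
) -> List[Tuple[str, str]]:
    """Flat baseline: every second-level category is considered,
    regardless of how its parent scored."""
    return sorted(key for key, s in sub_scores.items() if s > sub_threshold)


# Made-up scores for a single document.
top_scores = {"Computers": 0.9, "Sports": 0.2}
sub_scores = {
    ("Computers", "Hardware"): 0.7,
    ("Computers", "Software"): 0.3,
    ("Sports", "Baseball"): 0.8,
}

hier = hierarchical_predict(top_scores, sub_scores)
flat = flat_predict(sub_scores)
print(hier)
print(flat)
```

In this toy input, the "Sports" parent falls below the top-level threshold, so the hierarchical rule never considers "Baseball", while the flat rule still accepts it on its own score.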

Keywords

Hierarchy, Computer science, Hierarchical database model, Hierarchical control system, Information retrieval, Hierarchical organization, Content (measure theory), Multilevel model, Artificial intelligence, Data mining, Machine learning, Mathematics, Control (management)

Related Publications

The Generalized A* Architecture

We consider the problem of computing a lightest derivation of a global structure using a set of weighted rules. A large variety of inference problems in AI can be formulated in ...

2007 Journal of Artificial Intelligence Re... 20 citations

Publication Info

Year: 2000
Type: article
Pages: 256-263
Citations: 801
Access: Closed

Citation Metrics

801 citations (OpenAlex)

Cite This

Susan Dumais, Hao Chen (2000). Hierarchical classification of Web content, pp. 256-263. https://doi.org/10.1145/345508.345593

Identifiers

DOI: 10.1145/345508.345593