Abstract

The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer understandable world wide knowledge base whose content mirrors that of the World Wide Web. Such a knowledge base would enable much more effective retrieval of Web information, and promote new uses of the Web to support knowledge-based inference and problem solving. Our approach is to develop a trainable information extraction system that takes two inputs: an ontology defining the classes and relations of interest, and a set of training data consisting of labeled regions of hypertext representing instances of these classes and relations. Given these inputs, the system learns to extract information from other pages and hyperlinks on the Web. This paper describes our general approach, several machine learning algorithms for this task, and promising initial results with a prototype system.
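The two inputs described in the abstract can be made concrete with a small sketch. The Python below is a hypothetical illustration only: the names Ontology, LabeledRegion, and train_extractor are assumptions, and the "learner" is a stub that shows the input/output contract rather than the authors' actual machine learning algorithms.

```python
# Hypothetical sketch of the system's two inputs: an ontology of classes and
# relations, and training data given as labeled regions of hypertext.
# None of these names come from the paper; they only illustrate the interface.

from dataclasses import dataclass


@dataclass
class Ontology:
    """Classes of interest and the relations that may hold between them."""
    classes: set[str]
    relations: dict[str, tuple[str, str]]  # relation name -> (domain class, range class)


@dataclass
class LabeledRegion:
    """A region of hypertext (page or hyperlink text) labeled with an ontology class."""
    url: str
    text: str
    label: str  # one of ontology.classes


def train_extractor(ontology: Ontology, examples: list[LabeledRegion]) -> dict[str, list[str]]:
    """Toy stand-in for learning: collect the words seen in each class's regions.

    A real system would train statistical classifiers over page and hyperlink
    features; this stub only shows what goes in and what comes out.
    """
    vocab_by_class: dict[str, list[str]] = {c: [] for c in ontology.classes}
    for ex in examples:
        vocab_by_class[ex.label].extend(ex.text.lower().split())
    return vocab_by_class


if __name__ == "__main__":
    onto = Ontology(
        classes={"Person", "Department"},
        relations={"member_of": ("Person", "Department")},
    )
    training = [
        LabeledRegion("http://example.edu/~jane", "Jane Doe home page", "Person"),
        LabeledRegion("http://example.edu/cs", "Department of Computer Science", "Department"),
    ]
    model = train_extractor(onto, training)
    print(model["Person"])
```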

Keywords

Computer science, Hypertext, Hyperlink, Knowledge base, Web navigation, Web intelligence, World Wide Web, Web modeling, Hypermedia, Information retrieval, Data Web, Inference, Information extraction, Set (abstract data type), Ontology, Task (project management), Web mining, Web page, Artificial intelligence, Programming language

Publication Info

Year: 1998
Type: article
Pages: 509-516
Citations: 675
Access: Closed

Citation Metrics

675 citations (source: OpenAlex)

Cite This

Mark Craven, Dan DiPasquo, Dayne Freitag et al. (1998). Learning to extract symbolic knowledge from the World Wide Web. Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98), 509-516.