Abstract

As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, with over 3 million words of that material assigned skeletal grammatical structure. This material now includes a fully hand-parsed version of the classic Brown corpus. About one half of the papers at the ACL Workshop on Using Large Text Corpora this past summer were based on the materials generated by this grant.

Keywords

Treebank, Computer science, Natural language processing, Corpus linguistics, Artificial intelligence, Linguistics, Parsing, Philosophy

Publication Info

Year
1993
Type
report
Citations
7487
Access
Closed

Citation Metrics

7487
OpenAlex

Cite This

Mitchell P. Marcus, Mary Ann Marcinkiewicz, Beatrice Santorini (1993). Building a Large Annotated Corpus of English: The Penn Treebank. https://doi.org/10.21236/ada273556

Identifiers

DOI
10.21236/ada273556