Abstract
Given the well-known limitations of the Turing test, there is a need for objective tests to both focus attention on, and measure progress toward, the goals of AI. In this paper we argue that machine performance on standardized tests should be a key component of any new measure of AI, because attaining a high level of performance requires solving significant AI problems involving language understanding and world modeling, critical skills for any machine that lays claim to intelligence. In addition, standardized tests have all the basic requirements of a practical test: they are accessible, easily comprehensible, clearly measurable, and offer a graduated progression from simple tasks to those requiring deep understanding of the world. Here we propose this task as a challenge problem for the community, summarize our state-of-the-art results on math and science tests, and provide supporting data sets (www.allenai.org).
Related Publications
Machines and Thought
Abstract This is the first of two volumes of essays in commemoration of Alan Turing, whose pioneering work in the theory of artificial intelligence and computer science continue...
GPT-3: Its Nature, Scope, Limits, and Consequences
Abstract In this commentary, we discuss the nature of reversible and irreversible questions, that is, questions that may enable one to identify the nature of the source of their...
Computing Machinery and Intelligence (1950)
Abstract Together with ‘On Computable Numbers’, ‘Computing Machinery and Intelligence’ forms Turing’s best-known work. This elegant and sometimes amusing essay was originally pu...
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Human ability to understand language is general, flexible, and robust. In contrast, most NLU models above the word level are designed for a specific task and struggle with out-o...
Publication Info
- Year: 2016
- Type: article
- Volume: 37
- Issue: 1
- Pages: 5-12
- Citations: 132
- Access: Closed
Identifiers
- DOI: 10.1609/aimag.v37i1.2636