Abstract

Given the well-known limitations of the Turing test, there is a need for objective tests to both focus attention on, and measure progress toward, the goals of AI. In this paper we argue that machine performance on standardized tests should be a key component of any new measure of AI, because attaining a high level of performance requires solving significant AI problems involving language understanding and world modeling, both critical skills for any machine that lays claim to intelligence. In addition, standardized tests meet all the basic requirements of a practical test: they are accessible, easily comprehensible, clearly measurable, and offer a graduated progression from simple tasks to those requiring deep understanding of the world. Here we propose this task as a challenge problem for the community, summarize our state-of-the-art results on math and science tests, and provide supporting data sets (www.allenai.org).

Keywords

Measure (data warehouse); Computer science; Task (project management); Turing test; Standardized test; Artificial intelligence; Test (biology); Key (lock); Focus (optics); Simple (philosophy); Turing; Data science; Machine learning; Mathematics education; Programming language; Data mining; Psychology; Engineering; Systems engineering; Computer security

Related Publications

Machines and Thought

Abstract This is the first of two volumes of essays in commemoration of Alan Turing, whose pioneering work in the theory of artificial intelligence and computer science continue...

1996 59 citations

Publication Info

Year: 2016
Type: article
Volume: 37
Issue: 1
Pages: 5-12
Citations: 132
Access: Closed

Citation Metrics

OpenAlex: 132
Influential: 4
CrossRef: 27

Cite This

Peter E. Clark, Oren Etzioni (2016). My Computer Is an Honor Student — But How Intelligent Is It? Standardized Tests as a Measure of AI. AI Magazine, 37(1), 5-12. https://doi.org/10.1609/aimag.v37i1.2636

Identifiers

DOI: 10.1609/aimag.v37i1.2636
