Abstract

We evaluated the performance of a large language model called ChatGPT on the United States Medical Licensing Exam (USMLE), which consists of three exams: Step 1, Step 2CK, and Step 3. ChatGPT performed at or near the passing threshold for all three exams without any specialized training or reinforcement. Additionally, ChatGPT demonstrated a high level of concordance and insight in its explanations. These results suggest that large language models may have the potential to assist with medical education, and potentially, clinical decision-making.

Keywords

Concordance; United States Medical Licensing Examination; Medical education; Computer science; Licensure; Medical school; Artificial intelligence; Medicine; Internal medicine

Publication Info

Year: 2023
Type: Article
Volume: 2
Issue: 2
Pages: e0000198
Citations: 3039
Access: Closed


Cite This

Tiffany H. Kung, Morgan Cheatham, Arielle Medenilla et al. (2023). Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digital Health, 2(2), e0000198. https://doi.org/10.1371/journal.pdig.0000198

Identifiers

DOI
10.1371/journal.pdig.0000198