Abstract
Advancements in natural language processing (NLP), particularly large language models (LLMs), have greatly improved how we access knowledge. In critical domains such as biomedicine, however, hallucinations (where language models generate information not grounded in data) can lead to dangerous misinformation. This paper presents a hybrid approach that combines LLMs with Knowledge Graphs (KGs) to improve the accuracy and reliability of question-answering systems in the biomedical field. Our method, implemented with the LangChain framework, includes a query-checking algorithm that validates and, where possible, corrects LLM-generated Cypher queries before they are executed on the knowledge graph, grounding answers in the KG and reducing hallucinations in the evaluated cases. We evaluated several LLMs, including multiple GPT models and Llama 3.3:70b, on a custom benchmark of 50 biomedical questions. GPT-4 Turbo achieved 90% query accuracy, outperforming most other models. We also evaluated prompt engineering but found little statistically significant improvement over the standard prompt, except for Llama 3:70b, which improved with few-shot prompting. To enhance usability, we developed a web-based interface that lets users enter natural-language questions, view the generated and corrected Cypher queries, and inspect the results for accuracy. By accepting natural-language questions and returning verifiable answers drawn directly from the knowledge graph, the framework improves reliability, accessibility, and reproducibility. The source code for generating the results of this paper and for the user interface is available in our Git repository: https://git.zib.de/lpusch/cyphergenkg-gui, accessed on 1 November 2025.
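For readers unfamiliar with the pipeline the abstract describes, the sketch below shows the generic generate-validate-execute pattern using LangChain's off-the-shelf `GraphCypherQAChain`. It is not the authors' implementation: the connection details and the example question are placeholders, and LangChain's built-in `validate_cypher` option (which only corrects relationship directions) stands in for the paper's more elaborate query-checking and correction algorithm.

```python
# Minimal sketch of the generate -> validate -> execute pattern, assuming a
# running Neo4j instance and an OpenAI API key. Not the authors' code.
from langchain_openai import ChatOpenAI
from langchain_community.graphs import Neo4jGraph
from langchain.chains import GraphCypherQAChain

# Placeholder connection details; adjust for your own knowledge graph.
graph = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password="secret")

llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)

# The chain prompts the LLM to translate the question into Cypher, checks the
# generated query against the graph schema (validate_cypher only repairs
# relationship directions; the paper's checker is more thorough), executes it
# on the graph, and phrases the returned rows as a natural-language answer.
chain = GraphCypherQAChain.from_llm(
    llm=llm,
    graph=graph,
    validate_cypher=True,
    verbose=True,  # print the intermediate Cypher so it can be inspected
    allow_dangerous_requests=True,  # opt-in required by recent LangChain versions
)

# Illustrative biomedical question, not taken from the paper's benchmark.
result = chain.invoke({"query": "Which genes are associated with Alzheimer's disease?"})
print(result["result"])
```

Because every answer is produced by an executed Cypher query rather than free-form generation, the query and the subgraph it returns can be inspected directly, which is what makes the responses verifiable in the sense the abstract describes.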
Publication Info
- Year: 2025
- Type: article
- Volume: 5
- Issue: 4
- Pages: 70-70
- Citations: 0
- Access: Closed
Identifiers
- DOI: 10.3390/biomedinformatics5040070