Tehran University of Medical Sciences

Science Communicator Platform

GDPKG-LLM: Integrating Gene, Disease, and Pharmacogenomics Knowledge Graphs for Cognitive Neuroscience Using Large Language Models



Authors: Sarabadani A; Fard KR; Dalvand H

Source: Computer Science; Published: 2026


Abstract

Using large language models (LLMs) to build knowledge graphs that capture the relationships between entities in the cognitive and biological sciences has become an active research topic. Because the underlying knowledge is vast and its entities are deeply interconnected, traditional machine learning and deep learning approaches are not well suited to this task. The main goal of this study is to create a comprehensive, integrated knowledge graph (KG) by combining three knowledge sources: Gene Ontology (GO), Disease Ontology (DO), and PharmKG. LLMs are used to construct this knowledge base, whose main purpose is to clarify the relationships between genes, diseases, and drugs. The proposed approach, GDPKG-LLM, consists of several key steps, including entity matching, similarity analysis, graph alignment, and the use of GPT-4. GDPKG-LLM extracted more than 16,800 nodes and 838,000 edges from the three knowledge bases, yielding a rich KG. The graph encodes meaningful relationships, making it a valuable resource for future research in personalized medicine and neuroscience. The evaluation criteria examined indicate that GDPKG-LLM performs better than the alternatives, which supports the validity of the approach. © 2025 Author(s).
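
The entity matching, similarity analysis, and graph alignment steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe how each step is carried out, so the sketch assumes plain string similarity from Python's standard library in place of any LLM-based matching, and the function names, the 0.9 threshold, and the toy triples are all hypothetical and are not drawn from GO, DO, or PharmKG.

```python
from difflib import SequenceMatcher


def normalize(label):
    """Lowercase a label, treat hyphens as spaces, and collapse whitespace."""
    return " ".join(label.lower().replace("-", " ").split())


def similarity(a, b):
    """String similarity in [0, 1]; a stand-in for the paper's similarity analysis."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def merge_graphs(sources, threshold=0.9):
    """Align entities across sources and return (nodes, edges).

    sources maps a source name to a list of (head, relation, tail) triples.
    Labels whose similarity to an existing node reaches the (assumed)
    threshold are mapped onto that node instead of creating a new one.
    """
    nodes = []     # canonical node labels, in first-seen order
    mapping = {}   # (source, original label) -> canonical label
    edges = set()  # aligned (head, relation, tail) triples
    for source, triples in sources.items():
        for head, relation, tail in triples:
            resolved = []
            for label in (head, tail):
                key = (source, label)
                if key not in mapping:
                    match = next(
                        (n for n in nodes if similarity(n, label) >= threshold),
                        None,
                    )
                    if match is None:
                        match = label
                        nodes.append(label)
                    mapping[key] = match
                resolved.append(mapping[key])
            edges.add((resolved[0], relation, resolved[1]))
    return nodes, edges


# Hypothetical toy triples, for illustration only.
toy_sources = {
    "GO": [("APOE", "involved_in", "lipid transport")],
    "DO": [("Alzheimer's disease", "associated_gene", "APOE")],
    "PharmKG": [
        ("donepezil", "treats", "alzheimers disease"),
        ("donepezil", "interacts_with", "apoe"),
    ],
}

nodes, edges = merge_graphs(toy_sources)
print(len(nodes), "nodes,", len(edges), "edges")
```

In this toy run, the differently spelled labels for the same gene and disease collapse onto single nodes, so edges from all three sources connect through shared entities. A linear scan over existing nodes is adequate only for small examples; a graph of the size reported in the abstract would need indexing or blocking before pairwise comparison.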