
UTSW researchers find large language models could augment cumbersome human workflows for patient registries, streamline data collection
Using only text from doctors’ notes and radiology reports, an artificial intelligence (AI) program known as GPT-4o reliably identified patients’ types of strokes, UT Southwestern Medical Center researchers found. Their study, published in Stroke, could eventually lead to new ways to help guide doctors’ medical decisions in real time and reduce the heavy workload necessary to report data to patient registries.
“Large language models (LLMs) that can decipher unstructured text are an emerging AI technology with immense potential in medical research. Our study provides proof that these LLMs can abstract medical diagnoses from medical notes as well as human chart abstractors,” said Ann Marie Navar, M.D., Ph.D., Associate Professor of Internal Medicine and in the Peter O’Donnell Jr. School of Public Health at UT Southwestern.
Dr. Navar co-led the study with Eric Peterson, M.D., M.P.H., Professor of Internal Medicine and in the O’Donnell School of Public Health, Vice Provost, and Senior Associate Dean for Clinical Research; and Dylan Owens, Ph.D., M.S., Postdoctoral Researcher.
Like most large academic medical centers, UTSW participates in several patient registries – systematic collections of data on specific conditions that researchers use for studies. One prominent example is the American Heart Association’s Get With The Guidelines-Stroke (GWTG-Stroke), a quality improvement initiative involving over 2,600 hospitals across the country. When patients are treated at one of these hospitals for stroke, trained nurses collect a wealth of information from their electronic health records, inputting the data into lengthy forms. This process requires an enormous amount of human labor.
To reduce this burden, Drs. Owens, Navar, and Peterson wondered whether LLMs – a form of AI designed to understand and generate human language – could take on some of this data collection. They started with a simple question: Could an LLM accurately determine stroke type based only on “unstructured” data found in electronic health records, such as notes and reports?
The researchers tested this idea with GPT-4o, a large language model released by OpenAI in 2024. Using electronic health records for 4,123 patients hospitalized for stroke at UT Southwestern and Parkland Health between January 2019 and August 2023, the team evaluated three types of prompts asking the LLM to classify each patient’s stroke type. Zero-shot chain-of-thought prompts, which supplied no worked examples, encouraged the model to break complex queries into smaller, logical steps with minimal human input; expert-guided prompts incorporated tips from neurologists and cardiologists; and instruction-based prompts directed the model to evaluate patients’ records using GWTG-Stroke registry guidelines.
The researchers compared GPT-4o’s classifications with the stroke types recorded for these patients in the GWTG-Stroke registry. All three prompt styles accurately distinguished between the two major types of stroke – hemorrhagic and ischemic – and among hemorrhagic subtypes. Accuracy was lower, however, for some ischemic subtypes, such as cryptogenic strokes. Dr. Owens explained that this reflects the real-world difficulty of classifying these subtypes, which tend to be diagnoses of exclusion, assigned only after other causes have been ruled out.
Together, he said, the results suggest that LLMs could be a useful tool for accurately abstracting some information from electronic health records to populate time-intensive registry forms, and for flagging other data that need a closer look from human abstractors. Future research will focus on using LLMs to fill in other parts of registry forms and on the feasibility of using them for clinical decision support – programs that aim to improve patient outcomes by delivering timely information to providers at the point of care.
Dr. Owens noted that UTSW researchers also have achieved success working with LLMs for other tasks such as matching patients with clinical trials, performing quality assessments while investigating opportunities for population health improvement, and automating extraction of clinical data for research.
Additional UTSW researchers who contributed to this study are Justin Rousseau, M.D., M.M.Sc., Associate Professor of Neurology and in the Peter O’Donnell Jr. Brain Institute and Deputy Chief Medical Informatics Officer for Neurosciences; Michael Dohopolski, M.D., Assistant Professor of Radiation Oncology and a member of the Harold C. Simmons Comprehensive Cancer Center; and Danh Q. Nguyen, M.D., Clinical Fellow.
Dr. Peterson holds the Adelyn and Edmund M. Hoffman Distinguished Chair in Medical Science.
This study was funded by UT Southwestern Medical Center and grants from the National Institutes of Health (5T32HL12524710 and UL1TR003163).