Development and Validation of Natural Language Processing Algorithms in the ENACT National Electronic Health Record Research Network.

Publication Type: Journal Article
Year of Publication: 2025
Authors: Wang Y, Hilsman J, Li C, Morris M, Heider PM, Fu S, Kwak MJi, Wen A, Applegate JR, Wang L, Bernstam E, Liu H, Chang J, Harris DR, Corbeau A, Henderson D, Osborne JD, Kennedy RE, Garduno-Rapp N-E, Rousseau JF, Yan C, Chen Y, Patel MB, Murphy TJ, Malin BA, Park CMi, Fan JW, Sohn S, Pagali S, Peng Y, Pathak A, Wu Y, Xia Z, Loguercio S, Reis SE, Visweswaran S
Journal: medRxiv
Date Published: 2025 Jan 27
Abstract

Electronic health record (EHR) data are a rich and invaluable source of real-world clinical information, enabling detailed insights into patient populations, treatment outcomes, and healthcare practices. The availability of large volumes of EHR data is critical for advancing translational research and developing innovative technologies such as artificial intelligence. The Evolve to Next-Gen Accrual to Clinical Trials (ENACT) network, established in 2015 with funding from the National Center for Advancing Translational Sciences (NCATS), aims to accelerate translational research by democratizing access to EHR data for all Clinical and Translational Science Awards (CTSA) hub investigators. The current ENACT network provides access to structured EHR data, enabling cohort discovery and translational research across the network. However, much critical information is contained in clinical narratives, and natural language processing (NLP) is required to extract this information to support research. To address this need, the ENACT NLP Working Group was formed to make NLP-derived clinical information accessible and queryable across the network. This article describes the implementation and deployment of NLP infrastructure across ENACT. First, we describe the formation and goals of the Working Group, the practices and logistics involved in implementation and deployment, and the specific NLP tools and technologies used. Then, we describe how we extended the ENACT ontology to standardize and query NLP-derived data, and how we conducted multisite evaluations of the NLP algorithms. Finally, we reflect on the experience and lessons learned, which may be useful for other national data networks that are deploying NLP to unlock the potential of clinical text for research.

DOI: 10.1101/2025.01.24.25321096
Alternate Journal: medRxiv
PubMed ID: 39974073
PubMed Central ID: PMC11839006
Grant List:
UL1 TR001857 / TR / NCATS NIH HHS / United States
UM1 TR004407 / TR / NCATS NIH HHS / United States
R01 AG060993 / AG / NIA NIH HHS / United States
UL1 TR002001 / TR / NCATS NIH HHS / United States
R21 AG084218 / AG / NIA NIH HHS / United States
U01 TR002062 / TR / NCATS NIH HHS / United States
U01 TR002628 / TR / NCATS NIH HHS / United States
R01 GM141476 / GM / NIGMS NIH HHS / United States
U24 TR004111 / TR / NCATS NIH HHS / United States
R01 AG077017 / AG / NIA NIH HHS / United States
R01 HG012748 / HG / NHGRI NIH HHS / United States
UL1 TR001998 / TR / NCATS NIH HHS / United States
R01 AG068007 / AG / NIA NIH HHS / United States
UL1 TR002377 / TR / NCATS NIH HHS / United States
R01 LM014306 / LM / NLM NIH HHS / United States
UM1 TR004906 / TR / NCATS NIH HHS / United States
L30 TR002103 / TR / NCATS NIH HHS / United States
RF1 AG072799 / AG / NIA NIH HHS / United States
UL1 TR003163 / TR / NCATS NIH HHS / United States