Using Large Language Models to Annotate Complex Cases of Social Determinants of Health in Longitudinal Clinical Records
Social Determinants of Health (SDoH) such as housing insecurity are known to be intricately linked to patients’ health status. Large language models (LLMs) have shown potential for performing complex annotation tasks on unstructured clinical notes. Our work assessed the performance of LLMs (GPT-4 and GPT-3.5) in identifying temporal aspects of housing insecurity, and compared performance between original and de-identified notes.
Our 2024 medRxiv preprint can be found here.
- Compared with GPT-3.5 and a named entity recognition (NER) model, GPT-4 performed best, and it achieved much higher recall than human annotators in identifying patients experiencing current or past housing instability.
- In most cases, the evidence output by GPT-4 was similar or identical to that of human annotators, with no evidence of hallucination in any GPT-4 output.
- GPT-4 precision improved slightly on de-identified versions of the same notes, while recall dropped.
Recall and precision metrics for GPT-4 and GPT-3.5 for each housing label.
Comparison of GPT-4 performance in identifying general housing status or current and past housing instability from three different versions of the same set of 539 notes. Notes were either complete (original) patient notes, completely de-identified notes, or de-identified notes with no date shift.
We compared the ability of GPT-3.5 and GPT-4 to identify instances of both current and past housing instability, as well as general housing status, in 25,217 notes from 795 pregnant women. Results were compared with manual annotation, a named entity recognition (NER) model, and regular expressions (RegEx). The detailed annotation guidelines are here.
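For orientation, a minimal sketch of a RegEx baseline is shown below. The patterns and the `find_housing_mentions` helper are hypothetical illustrations, not the expressions used in the study.

```python
import re

# Hypothetical example patterns; the study's actual regular expressions are not shown here.
HOUSING_PATTERNS = [
    r"\bhomeless(?:ness)?\b",
    r"\bhousing\s+(?:instability|insecurity)\b",
    r"\bliv(?:es|ing)\s+in\s+(?:a\s+)?(?:shelter|car|motel)\b",
    r"\bunstabl[ey]\s+hous(?:ed|ing)\b",
]

def find_housing_mentions(note_text: str) -> list[str]:
    """Return any housing-related phrases matched in a clinical note."""
    matches = []
    for pattern in HOUSING_PATTERNS:
        matches.extend(m.group(0) for m in re.finditer(pattern, note_text, flags=re.IGNORECASE))
    return matches

print(find_housing_mentions("Pt reports housing instability; currently living in a shelter."))
```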
GPT-4 version 0613 had a 32K token context window, while GPT-3.5 Turbo version 0613 had a 16K token context window. Both GPT models were run using the LangChain and OpenAI libraries. The script for running the final prompt is in GPT_prompt.py.
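As a rough sketch of how a GPT-4 0613 call can be made through LangChain's OpenAI chat wrapper, see below. The system prompt, label names, and temperature setting here are placeholders rather than the study's configuration; the actual prompt is in GPT_prompt.py.

```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage

# Pin a GPT-4 0613 model with a 32K context window for long longitudinal notes.
llm = ChatOpenAI(model="gpt-4-32k-0613", temperature=0)

# Placeholder instructions; the full annotation prompt lives in GPT_prompt.py.
system_prompt = (
    "You are annotating clinical notes for housing status. "
    "Label the note as CURRENT, PAST, or GENERAL housing instability (or NONE), "
    "and quote the supporting evidence verbatim."
)

note_text = "..."  # one patient's note text

response = llm.invoke([
    SystemMessage(content=system_prompt),
    HumanMessage(content=note_text),
])
print(response.content)
```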
The John Snow Labs (JSL) NER model is designed to detect and label SDoH entities within text data. Its housing-specific label covers entities related to the conditions of the patient's living spaces, for example: homeless, housing, and small apartment.
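A minimal sketch of running a pretrained SDoH NER pipeline with Spark NLP for Healthcare is shown below. The model name `ner_sdoh`, the pipeline stages, and the license placeholder are assumptions for illustration and are not taken from this repository; a licensed JSL installation is required.

```python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer, WordEmbeddingsModel
from sparknlp_jsl.annotator import MedicalNerModel, NerConverterInternal
import sparknlp_jsl

# Requires a licensed Spark NLP for Healthcare installation; the secret is a placeholder.
spark = sparknlp_jsl.start("<license_secret>")

document = DocumentAssembler().setInputCol("text").setOutputCol("document")
sentences = SentenceDetector().setInputCols(["document"]).setOutputCol("sentence")
tokens = Tokenizer().setInputCols(["sentence"]).setOutputCol("token")
embeddings = (WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
              .setInputCols(["sentence", "token"]).setOutputCol("embeddings"))
# Model name is illustrative; substitute the SDoH NER model used in the study.
ner = (MedicalNerModel.pretrained("ner_sdoh", "en", "clinical/models")
       .setInputCols(["sentence", "token", "embeddings"]).setOutputCol("ner"))
ner_chunks = (NerConverterInternal()
              .setInputCols(["sentence", "token", "ner"]).setOutputCol("ner_chunk"))

pipeline = Pipeline(stages=[document, sentences, tokens, embeddings, ner, ner_chunks])
df = spark.createDataFrame([["Patient is homeless and currently staying in a shelter."]]).toDF("text")
results = pipeline.fit(df).transform(df)
results.selectExpr("explode(ner_chunk) as chunk").show(truncate=False)
```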
The data used in this study contains identifiable protected health information and therefore cannot be shared publicly. Investigators from Providence Health and Services and affiliates (PHSA) with an appropriate IRB approval can contact the authors directly regarding data access.
For further information or queries, please contact:
- Alexandra Ralevski (alexandraDOTralevskiATgmailDOTcom)
- Michael Nossal (mikenossalATgmailDOTcom)
- Nadaa Taiyab (nadaaDOTtaiyabATgmailDOTcom)