We’re excited to share that our latest research, “Bridging the Gap: Can Large Language Models Match Human Expertise in Writing Neurosurgical Operative Notes?” has just been published in the respected journal World Neurosurgery. This study represents a significant step toward integrating artificial intelligence into clinical documentation practices within neurosurgery.
Operative notes are an essential component of patient care, serving as vital medical records for postoperative management, medicolegal protection, and clinical audits. Despite their importance, writing these detailed notes can be particularly time-consuming for neurosurgeons, who already face high workloads and intense schedules. Our team saw an opportunity in the growing capabilities of large language models (LLMs) like OpenAI’s ChatGPT 4.0 to potentially streamline this process, standardize note-taking, and reduce the workload on clinical staff.
To assess ChatGPT’s potential thoroughly, we took a rigorous scientific approach. First, we selected a diverse collection of operative notes from cranial trauma and spinal surgeries performed by two experienced attending neurosurgeons. Using individualized templates based on each surgeon’s documentation style, we prompted ChatGPT to generate operative notes from key clinical data, including age, gender, surgical procedure details, diagnosis, complications, and estimated blood loss. These AI-generated notes were then compared against the original surgeon-authored documentation.
A team of three blinded external neurosurgeons independently assessed these notes, evaluating their accuracy, comprehensiveness of content, and overall organization. In addition, we examined readability metrics such as the Flesch-Kincaid Grade Level and the Flesch Reading Ease scores to measure the clarity and accessibility of each note.
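For readers unfamiliar with these metrics, the standard Flesch formulas are simple functions of average sentence length and average syllables per word. The sketch below is our own minimal illustration of how such scores are computed from raw counts, not the tooling used in the study:

```python
def flesch_scores(words: int, sentences: int, syllables: int) -> tuple[float, float]:
    """Return (Flesch-Kincaid Grade Level, Flesch Reading Ease)
    from raw counts of words, sentences, and syllables."""
    wps = words / sentences   # average words per sentence
    spw = syllables / words   # average syllables per word
    grade = 0.39 * wps + 11.8 * spw - 15.59
    ease = 206.835 - 1.015 * wps - 84.6 * spw
    return grade, ease

# Hypothetical counts for a short operative note:
# 100 words across 8 sentences, 150 syllables in total.
grade, ease = flesch_scores(words=100, sentences=8, syllables=150)
```

A higher grade level and a lower reading-ease score both indicate denser, more complex prose; in practice, libraries that implement these formulas also handle the syllable counting, which is the error-prone part.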
Our findings were encouraging. ChatGPT-generated operative notes matched surgeon-written notes in accuracy and organizational quality, suggesting that artificial intelligence can represent surgical events faithfully and in a logical, clearly organized manner. In terms of speed, the AI significantly outperformed human documentation, completing each note in roughly 50 seconds, far quicker than manual note-taking, which typically consumes several minutes per patient.
However, there were areas where the AI still lagged behind. AI-generated notes fell short of human-authored notes in content depth and comprehensiveness, lacking the nuances and detailed insights surgeons naturally embed in their documentation. Moreover, AI-generated notes consistently used more complex and advanced language than necessary, potentially affecting their readability and utility in clinical practice.
Despite these limitations, the study clearly highlights AI’s potential to transform neurosurgical documentation, reduce clinicians’ administrative burdens, and promote standardized, consistent record-keeping practices. With continuous advancements and refinements in artificial intelligence models, we foresee significant improvements in AI-generated operative notes, potentially achieving parity with, or even surpassing, human expertise.
We look forward to further exploration and collaboration to optimize AI for neurosurgical practices, benefiting both medical professionals and their patients. Our team extends gratitude to all collaborators and reviewers who made this research possible. To explore our full findings, we invite you to read our paper in the December 2024 issue of World Neurosurgery.