Feb. 22, 2024 — According to research from the New York Eye and Ear Infirmary of Mount Sinai (NYEE), a large language model (LLM) artificial intelligence (AI) system can match and, in some cases, outperform human ophthalmologists in the diagnosis and treatment of patients with glaucoma and retina disease.
Published in JAMA Ophthalmology, the recent study suggests that advanced AI tools could provide decision-making support to ophthalmologists in the diagnosis and management of cases involving glaucoma and retina disorders.
The researchers compared the knowledge of ophthalmic specialists with the capabilities of OpenAI's latest-generation AI system, Generative Pre-trained Transformer 4 (GPT-4), which was engineered to approach human-level performance and trained on vast amounts of text and image data.
According to Mount Sinai, the accuracy and comprehensiveness of LLM-generated responses give AI the potential to transform diagnostic and treatment tools. This is especially true in ophthalmology, where AI could help clinicians manage a high volume of often complex patients.
“The performance of GPT-4 in our study was quite eye-opening,” says Andy Huang, M.D., ophthalmology resident at NYEE and lead author of the study. “We recognized the enormous potential of this AI system from the moment we started testing it and were fascinated to observe that GPT-4 could not only assist but, in some cases, match or exceed the expertise of seasoned ophthalmic specialists.”
The study included 12 attending specialists and three senior trainees from the department of ophthalmology at the Icahn School of Medicine at Mount Sinai. Participants were given 20 questions (10 each for glaucoma and retina) drawn from the American Academy of Ophthalmology's list of questions commonly asked by patients, along with 20 deidentified patient cases from Mount Sinai-affiliated eye clinics. Their responses and those of GPT-4 were rated for accuracy and completeness on a Likert scale, and the ratings were compared statistically.
The findings show that AI matched or outperformed the human specialists in both the accuracy and the completeness of its medical advice and assessments, particularly on glaucoma questions and case-management advice. On retina-related questions, AI matched the humans in accuracy and exceeded them in completeness.
“Just as the AI application Grammarly can teach us how to be better writers, GPT-4 can give us valuable guidance on how to be better clinicians, especially in terms of how we document findings of patient exams,” says Louis R. Pasquale, M.D., FARVO, deputy chair for ophthalmology research for the department of ophthalmology and senior author of the study.
While emphasizing the need for further testing, Dr. Huang says, "It could serve as a reliable assistant to eye specialists by providing diagnostic support and potentially easing their workload. For patients, the integration of AI into mainstream ophthalmic practice could result in quicker access to expert advice, coupled with more informed decision-making to guide their treatment."