Evaluating the readability and quality of AI-generated scoliosis education materials: a comparative analysis of five language models
Mengchu Zhao, Mi Zhou, Yexi Han, Xiaomei Song, Youbin Zhou, Haoning He
Sci Rep. 2025 Oct 10;15(1):35454. doi: 10.1038/s41598-025-19370-3.
ABSTRACT
The complexity of scoliosis-related terminology and treatment options often hinders patients and caregivers from understanding their choices, making it difficult to make informed decisions. As a result, many patients seek guidance from artificial intelligence (AI) tools. However, AI-generated health content may suffer from low readability, inconsistency, and questionable quality, posing risks of misinformation.
This study evaluates the readability and informational quality of scoliosis-related content produced by AI. We evaluated five AI models (ChatGPT-4o, ChatGPT-o1, ChatGPT-o3 mini-high, DeepSeek-V3, and DeepSeek-R1) by querying each on three types of scoliosis: congenital, adolescent idiopathic, and neuromuscular.
Readability was assessed using the Flesch-Kincaid Grade Level (FKGL) and Flesch-Kincaid Reading Ease (FKRE), while content quality was evaluated using the DISCERN score. Statistical analyses were performed in RStudio. Inter-rater reliability was calculated using the Intraclass Correlation Coefficient (ICC). DeepSeek-R1 achieved the lowest FKGL (6.2) and the highest FKRE (64.5), indicating superior readability.
In contrast, ChatGPT-o1 and ChatGPT-o3 mini-high scored above FKGL 12.0, requiring college-level reading skills. Despite readability differences, DISCERN scores remained stable across models (approximately 50.5/80) with high inter-rater agreement (ICC = 0.85-0.87), suggesting a fair level of quality. However, all responses lacked citations, limiting reliability.
AI-generated scoliosis education materials vary significantly in readability, with DeepSeek-R1 being the most accessible. Future AI models should enhance readability without compromising information accuracy and integrate real-time citation mechanisms for improved trustworthiness.
Keywords: AI-generated health information; DISCERN score; Patient health literacy; Readability assessment; Scoliosis.
© 2025. The Author(s).
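
For readers unfamiliar with the two readability metrics named in the abstract, the following is a minimal sketch of the standard published formulas for the Flesch-Kincaid Grade Level and the Flesch Reading Ease score (abbreviated FKRE in the abstract). It is not the authors' pipeline: the abstract does not say which tool computed the scores, and the function names and the vowel-group syllable counter here are illustrative assumptions. The syllable heuristic is crude, so scores will differ somewhat from those of dedicated readability tools.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not a linguistic standard):
    count runs of vowels and discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> tuple[int, int, int]:
    """Return (sentences, words, syllables) using simple regex splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def fkgl(sentences: int, words: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: higher means harder to read."""
    return 0.39 * words / sentences + 11.8 * syllables / words - 15.59


def fkre(sentences: int, words: int, syllables: int) -> float:
    """Flesch Reading Ease: higher means easier to read."""
    return 206.835 - 1.015 * words / sentences - 84.6 * syllables / words


if __name__ == "__main__":
    sample = "Scoliosis is a sideways curve of the spine. A brace can help."
    stats = text_stats(sample)
    print(f"FKGL = {fkgl(*stats):.1f}, FKRE = {fkre(*stats):.1f}")
```

Both formulas depend only on average sentence length and average syllables per word, which is why a model that writes shorter sentences with simpler vocabulary scores a lower grade level and a higher reading ease at the same time.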


