Performance of ChatGPT-3.5 and ChatGPT-4 in the field of specialist medical knowledge on National Specialization Exam in neurosurgery

Maciej Laskowski; Marcin Ciekalski; Marcin Laskowski; Bartłomiej Błaszczyk; Marcin Setlak; Piotr Paździora; Adam Rudnik

doi:10.18794/aams/186827

2024 vol. 78

Figure from article: Performance of ChatGPT-3.5...

Performance of ChatGPT-3.5 and ChatGPT-4 in the field of specialist medical knowledge on National Specialization Exam in neurosurgery

Maciej Laskowski ¹

Marcin Ciekalski ¹

Marcin Laskowski ²

Bartłomiej Błaszczyk ³

Marcin Setlak ³

Piotr Paździora ³

Adam Rudnik ³

More details

Hide details

Students’ Scientific Club, Department of Neurosurgery, Faculty of Medical Sciences in Katowice, Medical University of Silesia, Katowice, Poland

Unhyped, AI Growth Partner, Kraków, Poland

Department of Neurosurgery, Faculty of Medical Sciences in Katowice, Medical University of Silesia, Katowice, Poland

Corresponding author

Maciej Laskowski

Studenckie Koło Naukowe, Klinika Neurochirurgii, Wydział Nauk Medycznych w Katowicach, Śląski Uniwersytet Medyczny w Katowicach, ul. Medyków 14, 40-752 Katowice

Ann. Acad. Med. Siles. 2024;78:253-258

DOI: https://doi.org/10.18794/aams/186827

Article (PDF, 391.45 kB)

References (10) Citations (2)

KEYWORDS

ChatGPT

neurosurgery

artificial intelligence (AI)

TOPICS

ABSTRACT

Introduction:
In recent times, there has been an increased number of published materials related to artificial intelligence (AI) in both the medical field, and specifically, in the domain of neurosurgery. Studies integrating AI into neurosurgical practice suggest an ongoing shift towards a greater dependence on AI-assisted tools for diagnostics, image analysis, and decision-making.

Material and methods:
The study evaluated the performance of ChatGPT-3.5 and ChatGPT-4 on a neurosurgery exam from Autumn 2017, which was the latest exam with officially provided answers on the Medical Examinations Center in Łódź, Poland (Centrum Egzaminów Medycznych – CEM) website. The passing score for the National Specialization Exam (Państwowy Egzamin Specjalizacyjny – PES) in Poland, as administered by CEM, is 56% of the valid questions. This exam, chosen from CEM, comprised 116 single-choice questions after eliminating four outdated questions. These questions were categorized into ten thematic groups based on the subjects they address. For data collection, both ChatGPT versions were briefed on the exam rules and asked to rate their confidence in each answer on a scale from 1 (definitely not sure) to 5 (definitely sure). All the interactions were conducted in Polish and were recorded.

Results:
ChatGPT-4 significantly outperformed ChatGPT-3.5, showing a notable improvement with a 29.4% margin (p < 0.001). Unlike ChatGPT-3.5, ChatGPT-4 successfully reached the passing threshold for the PES. ChatGPT-3.5 and ChatGPT-4 had the same answers in 61 questions (52.58%), both were correct in 28 questions (24.14%), and were incorrect in 33 questions (28.45%).

Conclusions:
ChatGPT-4 shows improved accuracy over ChatGPT-3.5, likely due to advanced algorithms and a broader training dataset, highlighting its better grasp of complex neurosurgical concepts.

REFERENCES (10)

The Age of Artificial Intelligence: A brief history... Deloitte Malta, 01 Nov 2022 [online] https://www2.deloitte.com/mt/e... [accessed on 21 October 2023].

Google Scholar

Brockman G., Sutskever I., OpenAI. Introducing OpenAI. OpenAI, December 11, 2015 [online] https://openai.com/blog/introd... [accessed on 21 October 2023].

Google Scholar

Brown T., Mann B., Ryder N., Subbiah M., Kaplan J., Dhariwal P. et al. Language models are few-shot learners. OpenAI, May 28, 2020 [online] https://openai.com/research/la... [accessed on 21 October 2023].

Google Scholar

Bhasker S., Bruce D., Lamb J., Stein G. Tackling healthcare’s biggest burdens with generative AI. McKinsey & Company, July 10, 2023 [online] https://www.mckinsey.com/indus... [accessed on 21 October 2023].

Google Scholar

KMS Staff. Harnessing The Benefits of OpenAI in Healthcare. KMS Healthcare, June 29, 2023 [online] https://kms-healthcare.com/ben... [accessed on 21 October 2023].

Google Scholar

El-Hajj V.G., Gharios M., Edström E., Elmi-Terander A. Artificial intelligence in neurosurgery: A bibliometric analysis. World Neurosurg. 2023; 171: 152–158.e4, doi: 10.1016/j.wneu.2022.12.087.

CrossRef

Google Scholar

Danilov G.V., Shifrin M.A., Kotik K.V., Ishankulov T.A., Orlov Y.N., Kulikov A.S. et al. Artificial intelligence in neurosurgery: A systematic review using topic modeling. Part I: Major research areas. Sovrem. Tekhnologii Med. 2021; 12(5): 106–112, doi: 10.17691/stm2020.12.5.12.

CrossRef

Google Scholar

Ali R., Tang O.Y., Connolly I.D., Zadnik Sullivan P.L., Shin J.H., Fridley J.S. et al. Performance of ChatGPT and GPT-4 on neurosurgery written board examinations. Neurosurgery 2023; 93(6): 1353–1365, doi: 10.1227/neu.0000000000002632.

CrossRef

Google Scholar

Hopkins B.S., Nguyen V.N., Dallas J., Texakalidis P., Yang M., Renn A. et al. ChatGPT versus the neurosurgical written boards: a comparative analysis of artificial intelligence/machine learning performance on neurosurgical board-style questions. J. Neurosurg. 2023; 139(3): 904–911, doi: 10.3171/2023.2.JNS23419.

CrossRef

Google Scholar

10.

Seghier M.L. ChatGPT: not all languages are equal. Nature 2023; 615(7951): 216, doi: 10.1038/d41586-023-00680-3.

CrossRef

Google Scholar

CITATIONS (2):

Assessment of the Ability of the ChatGPT-5 Model to Pass the Endocrinology Specialization Exam
Ada Latkowska, Piotr Sawina, Tomasz Dolata, Dawid Boczkowski, Aleksandra Wielochowska, Anna Kowalczyk, Michalina Loson-Kawalec, Dominika Radej, Wojciech Jaworski, Weronika Majchrowicz, Melania Olender, Julia Adamiak, Julia Sroczynska, Rania Suleiman, Julia Glinska, Patrycja Szczerbanowicz, Patrycja Dadynska
Cureus

CrossRef

Artificial intelligence in anatomy education: a systematic review of ChatGPT’s effectiveness as a learning tool
Esin Erbek, Seval Çalışkan Pala, Güneş Bolatlı, Fatih Çavuş
Baylor University Medical Center Proceedings

CrossRef

Submit your paper

For Authors