Exploring ChatGPT Vision in Radiology

Congratulations to UCD School of Medicine alumni, Dr Brendan S Kelly and Dr Pearse Keane, and all those involved in their recently published research, ‘Can ChatGPT4-vision identify radiologic progression of multiple sclerosis on brain MRI?’

The research assesses how ChatGPT-4 with vision (GPT-4V) stacks up against state-of-the-art AI models such as U-Net and vision transformers (ViTs) for detecting progression on brain MRI in multiple sclerosis (MS).

MS progression is often subtle. Detecting changes over time is a complex task, crucial for personalised care. But how well can general-purpose AI, like GPT-4V, handle this challenge compared to dedicated models?

The team tested GPT-4V (in a zero-shot setting) against U-Net and ViT, trained specifically for change detection on paired MRIs. The goal was to assess whether GPT-4V’s accessibility could make AI research more democratised without compromising performance.

Key Results
GPT-4V Accuracy: 85% (95% CI: 77–91%)
U-Net & ViT Accuracy: 94% (95% CI: 89–98%)
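The confidence intervals above can be reproduced from raw counts with a standard Wilson score interval. The sketch below is illustrative only: the counts (85 correct out of 100 cases) are hypothetical placeholders, not figures taken from the paper, and the function name `wilson_ci` is an assumption.

```python
import math

def wilson_ci(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a proportion (z=1.96 gives ~95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2))
    return centre - half, centre + half

# Hypothetical counts for illustration: 85 correct out of 100 paired scans
lo, hi = wilson_ci(85, 100)
print(f"accuracy 85%, 95% CI: {lo:.0%}-{hi:.0%}")  # → accuracy 85%, 95% CI: 77%-91%
```

The Wilson interval is preferred over the simple normal approximation for proportions near 0 or 1, which matters when accuracies are high, as here.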

While GPT-4V fell short in absolute accuracy, it demonstrated impressive zero-shot learning capabilities, performing better than expected for such a complex task.

GPT-4V also showed a cautious streak, providing non-answers in some cases. While this reduces outright errors, it highlights GPT-4V's limitations in decision-making. Precision and recall metrics showed GPT-4V to be more closely aligned with ViT than with U-Net, hinting at its underlying architecture.

GPT-4V doesn’t require coding, tuning, or technical expertise, making AI research more accessible. Caution is still required, though: misclassifications and non-answers highlight its limitations in clinical settings. It is promising, but not ready for medical use.

While GPT-4V shows potential, it lags behind dedicated models like U-Net and ViT in this specific task. This study is a starting point, showcasing the potential of general-purpose AI in imaging while underlining the importance of expert oversight.

Key takeaways:

  • Accessibility: GPT-4V doesn’t require coding or fine-tuning, democratising AI radiology research.
  • Clinical Readiness: GPT-4V isn’t ready for clinical use due to occasional misclassifications and cautious responses.
  • Comparative Value: It’s a solid research tool but currently outperformed by specialised models like U-Net and ViT.

Read the full paper in European Radiology Experimental