Comparing Vision Transformers and CNNs for Accurate Retinal Disease Classification

Paudyal, Binod

Files

11-25 Binod_Paudyal_Thesisv1.pdf (1.87 MB)

Date

2025

Authors

Paudyal, Binod

Abstract

Retinal diseases such as Age-Related Macular Degeneration (AMD) and Diabetic Macular Edema (DME) contribute significantly to vision impairment worldwide. Early diagnosis and timely treatment can save many people from blindness. This research focuses on classifying retinal diseases from Optical Coherence Tomography (OCT) images using advanced deep learning models. Specifically, we explore the capabilities of Vision Transformers (ViTs), Convolutional Neural Networks (CNNs), and a proposed hybrid CNN-Transformer model (HybridCNNViT). The HybridCNNViT model combines the local feature extraction strengths of CNNs with the global context modeling capabilities of Transformers. Comparative evaluations of accuracy, precision, and computational efficiency showed that HybridCNNViT outperforms standalone ViTs and CNNs for retinal disease classification. Because it offers a promising approach to improving healthcare outcomes in ophthalmology, the model can be further improved and applied to automated retinal disease detection and clinical diagnostics.