Introduction
Age-related macular degeneration (AMD) is a leading cause of visual impairment in both developed and developing nations, particularly among those aged 65 years and older. Projections indicate that by 2040, approximately 288 million people worldwide will be affected by AMD.1 Early AMD is characterised by drusen and abnormalities of the retinal pigment epithelium. As the disease progresses, advanced AMD manifests as neovascular AMD (nAMD) or central geographical atrophy (also called dry or non-exudative AMD). Advanced AMD accounts for an estimated 90% of cases of severe vision loss.2 Neovascular AMD can be treated effectively with anti-vascular endothelial growth factor agents, allowing patients to lead productive lives if diagnosed and treated early.3 It is therefore crucial to establish a robust and efficient screening system, especially in areas with low specialist-to-patient ratios.
Fundus photography is the gold standard tool used in primary care settings and efficiently detects AMD.4–6 However, interpreting the photographs requires trained personnel, and the shortage of experts leads to delays in underserved areas, including developing countries. Manual screening is also challenging in the developed world because of the large target population. Automated tools to identify referable AMD are therefore useful in both developed and developing countries, and various groups have made considerable efforts in this direction.7–11
The AMD detection algorithm was developed using convolutional neural networks (CNNs), a category of deep learning (DL) algorithms. In brief, a computer learns to solve a vision task by iterating over a large data set of input images and their desired outputs. A large collection of mathematical operations arranged in layers (the model) is applied to each input, and the coefficients used in those operations (the weights) are automatically adjusted to produce the desired output for a given input image. When adequately trained, the resulting model can generalise beyond the input data set and generate accurate outputs for any image from the same problem space.
Affordable, smartphone-based fundus cameras equipped with artificial intelligence (AI) capabilities have shown promise and can significantly expand access to care for a broader population. Our group has previously developed and validated a DL-based AI algorithm for screening people with diabetic retinopathy using a smartphone-based fundus imaging system.12–14 The current study describes the results of our AI algorithm for screening referable AMD.
Methodology
Ethical approval
The study was conducted in accordance with the Declaration of Helsinki. Approval was obtained from the NIH to use the AREDS database (#89924–2 and #89923–2).
The test data set contains de-identified retrospective images from the server of the target device (Remidio FOP NM10, Remidio Innovative Solutions, Bangalore, India), for which providers obtained patient consent to use de-identified data for research and development.
Overview
CNNs excel when trained with vast data sets. Ideally, to achieve top accuracy, data sets should encompass tens of thousands of images. Importantly, these images should mirror the real-world conditions where the CNN will operate. For instance, a CNN trained using images from a specific camera or of a particular ethnicity may not perform as well outside those conditions. This represents a challenge when the goal is to train a CNN for a portable non-mydriatic camera for global use. Generating a training set on the order of tens of thousands of images with a novel device is resource-intensive and time-consuming, especially when focusing on populations with a low prevalence of the target disease.
Transfer learning is a common DL technique that can offer a solution. Instead of starting the training process with random weights, weights trained on a related task are used as a starting point, leveraging the model's pre-existing knowledge. This generally leads to quicker and more accurate training outcomes, even with smaller data sets.
Our approach used transfer learning to harness another extensive data set: the Age-Related Eye Disease Study (AREDS). This data set was captured with a tabletop camera in the USA. We retrieved 128,918 gradable macula-centred images from the study. We used a two-step process: we initially trained a first model detecting the presence of AMD using solely the AREDS data set. We then fine-tuned this model on a small data set of 1853 images captured with the target device in a population of a different ethnicity. The data set collected on the target device is thus not used to train an AMD detection neural network from scratch; it is used to adapt an existing AMD detection model to the specificities of the target camera and population. We hypothesised that this strategy would produce an efficient CNN without the need for an extensive data set from the target device and population.
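The training code itself is not part of the paper; as a rough illustration, the two-phase strategy could look like the following Keras sketch, assuming EfficientNetV2-B0 as a stand-in for the exact variant used and hypothetical tf.data pipelines (areds_train_ds, areds_val_ds, device_train_ds, device_val_ds):

```python
# Minimal sketch of the two-phase transfer-learning strategy (not the authors' code).
import tensorflow as tf
from tensorflow import keras

# Start from ImageNet-pretrained weights rather than random initialisation.
base = keras.applications.EfficientNetV2B0(
    include_top=False, weights="imagenet", input_shape=(300, 300, 3), pooling="avg"
)
model = keras.Sequential([
    base,
    keras.layers.Dense(1, activation="sigmoid"),  # binary: referable vs non-referable AMD
])

# Phase 1: train on the large AREDS data set.
model.compile(optimizer=keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy", metrics=[keras.metrics.AUC()])
model.fit(areds_train_ds, validation_data=areds_val_ds, epochs=20)

# Phase 2: fine-tune on the small target-device data set at a lower learning rate,
# adapting the existing AMD detector to the new camera and population.
model.compile(optimizer=keras.optimizers.Adam(1e-5),
              loss="binary_crossentropy", metrics=[keras.metrics.AUC()])
model.fit(device_train_ds, validation_data=device_val_ds, epochs=10)
```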
The model was trained as a binary classifier to separate non-referable AMD images from referable AMD images. AMD grading followed the AREDS system: stage 1 (no AMD), stage 2 (early AMD), stage 3 (intermediate AMD) and stage 4 (advanced AMD with foveal geographical atrophy (GA) or choroidal neovascularisation (nAMD)). We defined non-referable AMD as no AMD or early AMD, and referable AMD as intermediate or advanced AMD (GA or nAMD). The operating point was chosen to achieve the highest sensitivity at 85% specificity. Extensive experiments were conducted to select an architecture that achieves this. We chose the EfficientNet V.2 architecture, designed by Google, for its suitability for portable, on-the-edge deployments.
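As an illustration, the operating point can be read off the validation-set receiver operating characteristic curve; a minimal scikit-learn sketch, where y_true and y_score are hypothetical arrays of ground-truth referability labels and predicted probabilities:

```python
# Sketch: pick the threshold giving the highest sensitivity at >= 85% specificity.
import numpy as np
from sklearn.metrics import roc_curve

fpr, tpr, thresholds = roc_curve(y_true, y_score)
specificity = 1 - fpr
meets_target = specificity >= 0.85            # thresholds satisfying the specificity goal
best = np.argmax(tpr[meets_target])           # highest sensitivity among them
operating_threshold = thresholds[meets_target][best]
```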
Preprocessing
As a first step, the images input into the network were cropped to eliminate the black borders introduced by the fundus camera. They were then downsampled to a standardised size of 300×300 pixels. In the first training stage, random horizontal flipping was applied. During the second stage, we introduced further augmentations: rotations; contrast, brightness, hue and saturation changes; and grid dropout.
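The paper does not name its augmentation tooling; the listed transformations can be reproduced with, for example, the albumentations library, as in this sketch with illustrative parameter values (the device-specific border cropping is omitted):

```python
# Sketch of the two augmentation regimes described above (parameters are illustrative).
import albumentations as A

stage1_augment = A.Compose([
    A.Resize(300, 300),          # standardised input size after border cropping
    A.HorizontalFlip(p=0.5),     # first training stage: random horizontal flips only
])

stage2_augment = A.Compose([
    A.Resize(300, 300),
    A.HorizontalFlip(p=0.5),
    A.Rotate(limit=15, p=0.5),             # rotations
    A.RandomBrightnessContrast(p=0.5),     # contrast and brightness changes
    A.HueSaturationValue(p=0.5),           # hue and saturation changes
    A.GridDropout(ratio=0.2, p=0.3),       # grid dropout
])
```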
First training phase
The first stage of the training used the AREDS data set. This step aimed to train a robust model independent of the target device. The initial model was trained by fine-tuning a model pretrained on ImageNet, a generic image classification task. During this phase, 128,918 macula-centred fundus images belonging to 4028 subjects, together with their diagnoses, were used. The data set was split into train (126,823 images), validation (1,186 images) and test (909 images) sets. Borderline images, defined as early AMD with medium drusen (63–124 microns in diameter), were removed from the training and validation sets but not from the test set; empirical experimentation showed that this strategy improved neural network convergence. The final train and validation sets comprised 108,251 images (55% referable AMD) and 990 images (45% referable AMD), respectively. The test set of 909 images contained 49% referable AMD, covering all stages of the disease (table 1). In both steps, care was taken to avoid using multiple images of the same patient in different data sets. Image allocation in all sets respected the distribution of AMD stages and other phenotypical data.
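Patient-level separation of this kind can be enforced with a group-aware splitter; a minimal sketch using scikit-learn's GroupShuffleSplit, with hypothetical image_paths, labels and subject_ids arrays and an illustrative split fraction:

```python
# Sketch: split by subject so no patient's images appear in more than one set.
from sklearn.model_selection import GroupShuffleSplit

splitter = GroupShuffleSplit(n_splits=1, test_size=0.01, random_state=42)
train_idx, test_idx = next(splitter.split(image_paths, labels, groups=subject_ids))
# A second pass over train_idx would carve out the validation set; the resulting
# distribution of AMD stages and phenotypes would then be checked, as described above.
```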
Second training phase
The second stage of training consisted of fine-tuning the model with a much smaller data set captured with the target deployment device. The objective of this step was to adapt the model trained during the first stage to the characteristics (such as image resolution, tint, field of view and pigmentation) of images captured with the target device. The data set consists of 2012 macula-centred fundus images (South Asian ethnicity) gathered in screening and clinical settings using the target device between March 2013 and October 2020 and between January and March 2022. Participants had a mean age of 51.9 years, with a comparable distribution of men (55.7%) and women (44.3%). Adults with any stage of AMD, or deemed to have a normal retina, who participated in outreach screening camps or in-clinic examinations were included. Images deemed ungradable or inconclusive on referability by the experts, as well as images showing other retinal conditions, were excluded. The data set was labelled by three retina specialists for image quality (online supplemental file 1) and stage of disease based on the AREDS four-category classification. A total of 101 images were removed because the graders disagreed on referability, and 327 because they were deemed ungradable. The kappa agreements between each specialist’s grades and the consensus for referable AMD were 0.715, 0.824 and 0.722. The remaining 1584 images were split into train (1108 images, 33% referable AMD), validation (238 images, 33% referable AMD) and test (238 images, 34% referable AMD) sets (table 1).
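Agreement statistics of this kind can be computed with scikit-learn; a minimal sketch, assuming graders maps each specialist to their binary referability labels and consensus holds the consensus labels (hypothetical variables):

```python
# Sketch: Cohen's kappa between each grader and the consensus for referable AMD.
from sklearn.metrics import cohen_kappa_score

for name, grades in graders.items():
    kappa = cohen_kappa_score(grades, consensus)
    print(f"{name}: kappa vs consensus = {kappa:.3f}")  # e.g. 0.715, 0.824, 0.722
```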
Statistical analysis
We analysed accuracy, sensitivity, specificity and positive and negative predictive values. We also used receiver operating characteristic curves to assess the algorithm's detection performance, as every classifier trades off sensitivity against specificity. The 95% CIs were also calculated. The NumPy and scikit-learn Python libraries were used for statistical analysis.
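For concreteness, the reported metrics can be derived from the confusion matrix using the named libraries. The CI method is not stated in the paper; Wilson score intervals are assumed in this sketch (y_true, y_pred and y_score are hypothetical arrays of labels, thresholded predictions and probabilities):

```python
# Sketch of the reported metrics with NumPy and scikit-learn.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for a proportion (CI method assumed, not stated)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("sensitivity", tp / (tp + fn), wilson_ci(tp, tp + fn))
print("specificity", tn / (tn + fp), wilson_ci(tn, tn + fp))
print("PPV", tp / (tp + fp), "NPV", tn / (tn + fn))
print("accuracy", (tp + tn) / (tp + tn + fp + fn))
print("AUC", roc_auc_score(y_true, y_score))
```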
Patient and public involvement
None.
Discussion
In this study, the DL algorithm demonstrated promising sensitivity and specificity in identifying referable AMD. The performance of the DL model was comparable to the reference standard of grading by ophthalmologists. Furthermore, the DL model performed equally well on the target device and AREDS test sets.
Previous studies have explored automated software for referable AMD detection. A meta-analysis by Dong et al included 13 AI-based studies.15 They found an overall sensitivity of 88% and a specificity of 90%, with an AUC of 0.983. The majority of the studies in this meta-analysis also used the AREDS database. For studies applying convolutional neural networks to the AREDS database, the pooled AUC, sensitivity and specificity were 0.983 (95% CI: 0.978 to 0.988), 0.88 (95% CI: 0.88 to 0.88) and 0.91 (95% CI: 0.91 to 0.91), respectively. Our specificity after the first training stage on AREDS is lower (82.3%) but acceptable. This can be attributed to our choice of neural network architecture. Various CNN architectures are available, such as CifarNet, AlexNet and Inception V.1 (GoogleNet), all of which achieve high accuracy but at the expense of heavier architectures. We used the EfficientNet V.2 (Google) architecture, a lightweight design intended for efficient deployment on a smartphone-based camera, making screening more accessible and effective. This AI algorithm can be deployed as an offline application integrated into the smartphone-based, non-mydriatic retinal imaging system. It can be deployed as a component of the camera control application and thus seamlessly integrated into the image acquisition workflow. The AMD assessment algorithm generates a diagnosis by detecting drusen and/or other characteristics of intermediate and advanced AMD involving the macula.
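The deployment toolchain is not described in the paper; one common route for packaging a lightweight Keras model for offline smartphone inference is TensorFlow Lite, sketched below under that assumption:

```python
# Sketch: export a trained Keras model for on-device (offline) inference via TFLite.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training quantisation
tflite_model = converter.convert()
with open("amd_screening.tflite", "wb") as f:
    f.write(tflite_model)  # bundle with the camera control application
```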
In this meta-analysis, eight studies reported outcomes for referable AMD, with an AUC above 0.90 in all but one study by Phan et al, which had an AUC of 0.87, possibly due to a smaller private database containing 279 images.15 16 None of the studies employed smartphone-based fundus cameras as target devices. Our study contributes to the existing knowledge of DL-based AMD detection. The accuracy of AI-based algorithms varies due to differences in architecture, data set size, image quality and validation methodology. Recent works on simultaneous automated cloud-based screening of AMD and diabetic retinopathy have demonstrated promising performance.17–19 Additionally, fundus-image-based algorithms detecting multiple retinal conditions, including AMD, have shown effective results.20 21 Most recently, several algorithms have been developed using different approaches to differentiate the severity stages of AMD, which is crucial for early detection, precise diagnosis and clinical treatment strategies.22–25 Mathieu et al trained a model (DeepAlienorNet) to detect the presence of seven clinical signs, such as drusen types, hypopigmentation or hyperpigmentation, or advanced AMD, with a sensitivity and specificity of 0.77 (0.72–0.82) and 0.83 (0.81–0.85), respectively.22 Sarao et al designed an explainable DL model for the detection of GA, achieving 100% sensitivity.23 Similarly, Abd El-Khalek et al trained a model to classify retinal images into normal, GA, intermediate AMD and wet AMD, developing a comprehensive computer-aided diagnosis framework for categorisation.24 Morano et al designed a model with a custom architecture that both predicts the presence of AMD and identifies the lesions.25 While such models are essential for clinical decision-making and support by retina specialists, fundus image-based AMD AI screening solutions, such as the one described here, are critical for population-level screening.
A key strength of this study is that it does not rely exclusively on the AREDS database but also incorporates a database of real-world digitised images from the target device. This approach supports practical deployment of the algorithm in the field. Undoubtedly, AREDS is the largest publicly available database, with more than 130,000 fundus images, but it is essential to recognise that certain nuances of hard drusen and age-related changes used in the clinical classification of AMD did not yet exist when AREDS was conducted. This factor might render the AREDS database alone insufficient for developing a robust AI. Additionally, the AREDS images were originally film-based and later digitised, which could potentially impact the performance of the DL algorithm. When the target device test data set was run on the model trained on AREDS only, sensitivity and specificity were 88.75% (95% CI: 79.72% to 94.72%) and 60.76% (95% CI: 52.69% to 68.42%), respectively. When the neural network was trained directly on the target device data set without prior training on AREDS, sensitivity and specificity were 65% (95% CI: 53.52% to 75.33%) and 68.35% (95% CI: 60.49% to 75.51%), respectively. Training on the AREDS data set and then fine-tuning on the target device data set yielded better performance when compared against the reference standard (table 2). The algorithm's performance requires further evaluation in real-world studies.
However, this study has a limitation: it only assesses the diagnostic performance of AI for referable AMD, so there is a risk of missing other retinal pathologies if this model is used alone. Another limitation is that participant metadata was available for at most 75% of the target device images, which were collected in remote screening contexts; it was unknown, undisclosed or unavailable for the remainder.
Our work carries significant clinical and public health-related benefits. It reduces the burden on specialists, enables remote care in hard-to-reach areas and facilitates early intervention that may result in long-term vision preservation. Despite the promise of AI-based predictive models, they are not without challenges. The most important challenge is their dependence on image quality; consequently, a system for rejecting ungradable, poor-quality images (currently under development) must be in place. Additionally, AI is often considered a ‘black box’, potentially raising concerns about clinicians’ trust in the system. We have attempted to address the model’s interpretability to some extent through class-activation maps. False negatives remain an issue, particularly for AMD, which requires early treatment to prevent scarring, unlike diabetic retinopathy, which usually progresses slowly. However, this same factor underscores the need for early AMD screening using robust tools such as AI.
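The paper does not detail its class-activation-map implementation; the following is a minimal Grad-CAM-style sketch for a Keras model, where conv_layer_name is a hypothetical name for the last convolutional layer:

```python
# Sketch: Grad-CAM-style class-activation map for the referable-AMD score.
import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer_name):
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])
        score = preds[:, 0]                        # referable-AMD probability
    grads = tape.gradient(score, conv_out)
    weights = tf.reduce_mean(grads, axis=(1, 2))   # pool gradients per channel
    cam = tf.nn.relu(tf.reduce_sum(conv_out * weights[:, None, None, :], axis=-1))[0]
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()  # normalised heat map
```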
This work presents a promising approach to using DL for automated AMD analysis in a smartphone-based imaging system for the first time. In our continuing research, we plan to perform a prospective evaluation of AI performance across various AMD severity stages in real-world settings. Additionally, we plan to evaluate the detection of combined pathologies, which may offer additional advantages and improved clinical applications.
Conclusion
Our study indicates that the DL-based AI algorithm shows promise in detecting referable AMD with high sensitivity and specificity, even when only a small data set from the target device and population is available. This is achieved by fine-tuning a model previously trained on a larger data set for the same task. The approach proved effective despite the AREDS images having been captured with traditional desktop cameras in a different population. Integrating such an algorithm into an application associated with a smartphone-based fundus imaging system could significantly aid screening of populations in underserved areas at a lower cost.