Deep Learning for Image Quality Evaluation of Optical Coherence Tomography Angiography

Optical coherence tomography angiography (OCTA) is a new method for non-invasive visualization of retinal vessels. Although OCTA has many promising clinical applications, determining image quality remains a challenge. We developed a deep learning based system using a ResNet152 neural network classifier, pretrained on ImageNet, to classify superficial capillary plexus images from 347 scans of 134 patients. The images were also manually graded as ground truth by two independent raters for the supervised learning model. Because image quality requirements may vary between clinical and research settings, two models were trained: one for high quality image recognition and one for low quality image recognition. Our neural network models showed excellent area under the curve (AUC) performance (AUC 0.99, \(\kappa\) = 0.90 for the low quality model and AUC 0.97, 95% CI 0.96–0.99, \(\kappa\) = 0.81 for the high quality model), significantly better than the machine-reported signal strength (AUC = 0.82, 95% CI 0.77–0.86, \(\kappa\) = 0.52 and AUC = 0.78, 95% CI 0.73–0.83, \(\kappa\) = 0.27, respectively). Our study demonstrates that machine learning methods can be used to develop flexible and robust quality control methods for OCTA images.
Optical coherence tomography angiography (OCTA) is a relatively new technique based on optical coherence tomography (OCT) that can be used for non-invasive visualization of the retinal microvasculature. OCTA measures the difference in reflectance patterns from repeated light scans of the same area of the retina, and reconstructions can then be computed to reveal blood vessels without the invasive use of dyes or other contrast agents. OCTA also enables depth-resolved vascular imaging, allowing clinicians to examine the superficial and deep vessel layers separately, which helps to differentiate between chorioretinal diseases.
While this technique is promising, variation in image quality remains a major challenge for reliable image analysis, making image interpretation difficult and preventing widespread clinical adoption. Because OCTA uses multiple consecutive OCT scans, it is more sensitive to image artifacts than standard OCT. Most commercial OCTA platforms provide their own image quality metric, called Signal Strength (SS) or sometimes Signal Strength Index (SSI). However, a high SS or SSI value does not guarantee the absence of image artifacts, which can affect any subsequent image analysis and lead to incorrect clinical decisions. Common artifacts in OCTA imaging include motion artifacts, segmentation artifacts, media opacity artifacts, and projection artifacts1,2,3.
As OCTA-derived measures such as vascular density are increasingly used in translational research, clinical trials, and clinical practice, there is an urgent need to develop robust and reliable image quality control processes to exclude images with artifacts4. Skip connections, also known as residual connections, are projections in a neural network architecture that allow information to bypass convolutional layers while preserving information at different scales or resolutions5. Because image artifacts can affect both fine detail and large-scale image features, skip-connection neural networks are well suited to automate this quality control task5. Recently published work has shown some promise for deep convolutional neural networks trained on image quality grades from human raters6.
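To make the role of skip connections concrete, the sketch below shows a minimal residual block in PyTorch. It is an illustrative simplification rather than the exact ResNet152 building block used in this work: the input bypasses the convolutional layers and is added back to their output.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = ReLU(conv(x) + x)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                          # information bypasses the conv layers
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)      # skip connection: add the input back

block = ResidualBlock(channels=64)
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```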
In this study, we train a skip-connection convolutional neural network to automatically determine the quality of OCTA images. We build on previous work by developing separate models for identifying high quality images and low quality images, as image quality requirements may differ between specific clinical or research scenarios. We compare the results of these networks with a convolutional neural network without skip connections to evaluate the value of including features at multiple levels of granularity within deep learning. Finally, we compare our results with signal strength, a commonly accepted measure of image quality provided by manufacturers.
        Our study included patients with diabetes who attended the Yale Eye Center between August 11, 2017 and April 11, 2019. Patients with any non-diabetic chorioretinal disease were excluded. There were no inclusion or exclusion criteria based on age, gender, race, image quality, or any other factor.
OCTA images were acquired using the AngioPlex platform on a Cirrus HD-OCT 5000 (Carl Zeiss Meditec Inc, Dublin, CA) under 8 \(\times\) 8 mm and 6 \(\times\) 6 mm imaging protocols. Informed consent was obtained from each study participant, the study adhered to the tenets of the Declaration of Helsinki, and the study protocol was approved by the Yale University Institutional Review Board (IRB).
En face images of the superficial capillary plexus were evaluated based on the previously described Motion Artifact Score (MAS), the previously described Segmentation Artifact Score (SAS), foveal centration, the presence of media opacity, and good visualization of small capillaries, as determined by the image graders. The images were graded by two independent raters (RD and JW). An image received a score of 2 (gradable) if all of the following criteria were met: the image is centered on the fovea (less than 100 pixels from the center of the image), MAS is 1 or 2, SAS is 1, media opacity is present on less than 1/16 of the image, and small capillaries are visible in more than 15/16 of the image. An image received a score of 0 (ungradable) if any of the following criteria were met: the image is off-center, MAS is 4, SAS is 2, media opacity is present on more than 1/4 of the image, or small capillaries cannot be distinguished in more than 1/4 of the image. All other images that met neither the criteria for a score of 0 nor those for a score of 2 were scored 1 (borderline).
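As a hedged illustration only (the actual grading was performed manually by the raters; the field names below and the off-center cutoff are assumptions introduced for this sketch), the grading rule can be expressed as:

```python
from dataclasses import dataclass

@dataclass
class ImageAssessment:
    fovea_offset_px: float     # distance of the fovea from the image center, pixels
    mas: int                   # Motion Artifact Score
    sas: int                   # Segmentation Artifact Score
    opacity_fraction: float    # fraction of the image covered by media opacity
    capillary_fraction: float  # fraction of the image with visible small capillaries

def grade(img: ImageAssessment) -> int:
    """Return 2 (gradable), 0 (ungradable), or 1 (borderline)."""
    if (img.fovea_offset_px < 100 and img.mas in (1, 2) and img.sas == 1
            and img.opacity_fraction < 1 / 16 and img.capillary_fraction > 15 / 16):
        return 2
    # Off-center is modeled here as >= 100 px (assumption); capillaries
    # indistinguishable in more than 1/4 of the image means visible in < 3/4.
    if (img.fovea_offset_px >= 100 or img.mas == 4 or img.sas == 2
            or img.opacity_fraction > 1 / 4 or img.capillary_fraction < 3 / 4):
        return 0
    return 1

print(grade(ImageAssessment(50, 2, 1, 0.02, 0.97)))  # 2
```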
Figure 1 shows sample images for each of the graded scores and the image artifacts. Inter-rater reliability of the individual scores was assessed using weighted Cohen's kappa8. The individual scores from the two raters were summed to obtain an overall score for each image, ranging from 0 to 4. Images with a total score of 4 were considered high quality. Images with a total score of 0 or 1 were considered low quality.
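A minimal sketch of this agreement and aggregation step is given below, using placeholder arrays of per-image scores; the choice of linear weighting for the kappa is an assumption (scikit-learn also supports quadratic weights).

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

rater_a = np.array([2, 2, 1, 0, 2, 1])   # placeholder scores from rater 1
rater_b = np.array([2, 1, 1, 0, 2, 2])   # placeholder scores from rater 2

# Weighted Cohen's kappa for ordinal scores (linear weighting assumed here)
kappa = cohen_kappa_score(rater_a, rater_b, weights="linear")

# Total score per image (0-4); 4 = high quality, 0 or 1 = low quality
total = rater_a + rater_b
high_quality = total == 4
low_quality = total <= 1
print(kappa, high_quality, low_quality)
```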
A convolutional neural network with the ResNet152 architecture (Fig. 3A.i), pre-trained on images from the ImageNet database, was built using fast.ai and the PyTorch framework5,9,10,11. A convolutional neural network uses learned filters scanned across image patches to learn spatial and local features. Our trained ResNet is a 152-layer neural network characterized by skip or "residual" connections that simultaneously transmit information at multiple resolutions. By projecting information at different resolutions through the network, the model can learn the features of low quality images at multiple levels of detail. In addition to our ResNet model, we also trained AlexNet, a well-studied neural network architecture without skip connections, for comparison (Fig. 3A.ii)12. Without skip connections, this network cannot capture features at multiple levels of granularity.
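A hedged sketch of how the two ImageNet-pretrained architectures can be instantiated with torchvision is shown below; the study used fast.ai on top of PyTorch, and the two-class classifier head shown here is an assumption.

```python
import torch.nn as nn
from torchvision import models

def build_classifier(arch: str = "resnet152", n_classes: int = 2) -> nn.Module:
    """Instantiate an ImageNet-pretrained backbone with a new classifier head."""
    if arch == "resnet152":
        model = models.resnet152(pretrained=True)   # 152 layers, skip connections
        model.fc = nn.Linear(model.fc.in_features, n_classes)
    elif arch == "alexnet":
        model = models.alexnet(pretrained=True)     # no skip connections
        model.classifier[6] = nn.Linear(model.classifier[6].in_features, n_classes)
    else:
        raise ValueError(f"unknown architecture: {arch}")
    return model

resnet = build_classifier("resnet152")
alexnet = build_classifier("alexnet")
```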
The original 8 \(\times\) 8 mm OCTA image set13 was augmented using horizontal and vertical reflections. The full dataset was then randomly split at the image level into training (51.2%), testing (12.8%), hyperparameter tuning (16%), and validation (20%) datasets using the scikit-learn toolbox in Python14. Two use cases were considered, one based on detecting only the highest quality images (overall score 4) and the other based on detecting only the lowest quality images (overall score 0 or 1). For each of the high quality and low quality use cases, the neural network was retrained from the pretrained weights on our image data. In each use case, the neural network was first trained for 10 epochs with all but the final layer weights frozen, and then the weights of all internal parameters were learned for 40 epochs using a discriminative learning rate method with a cross-entropy loss function15,16. The cross-entropy loss function is a logarithmic-scale measure of the discrepancy between the predicted network labels and the ground truth. During training, gradient descent is performed on the internal parameters of the neural network to minimize this loss. The learning rate, dropout, and weight decay hyperparameters were tuned using Bayesian optimization with 2 random starting points and 10 iterations, with the AUC on the hyperparameter tuning dataset as the optimization objective17.
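A hedged sketch of the split and fine-tuning procedure using scikit-learn and fast.ai follows; the DataFrame `df`, its column names, the image size, and the learning rates are placeholders, and the Bayesian hyperparameter search is omitted.

```python
from fastai.vision.all import ImageDataLoaders, Resize, aug_transforms, cnn_learner, accuracy
from torchvision.models import resnet152
from sklearn.model_selection import train_test_split

# Image-level random split: 20% validation, 16% tuning, 12.8% test, 51.2% train
rest, valid = train_test_split(df, test_size=0.20, random_state=0)   # 80% / 20%
rest, tune = train_test_split(rest, test_size=0.20, random_state=0)  # 64% / 16%
train, test = train_test_split(rest, test_size=0.20, random_state=0) # 51.2% / 12.8%

# Horizontal and vertical reflection augmentation; "fname"/"label" columns assumed
dls = ImageDataLoaders.from_df(
    train, path=".", fn_col="fname", label_col="label",
    valid_pct=0.2,                                  # fastai's internal split (sketch only)
    item_tfms=Resize(224),
    batch_tfms=aug_transforms(do_flip=True, flip_vert=True))

# cnn_learner defaults to a cross-entropy loss for classification and freezes the body
learn = cnn_learner(dls, resnet152, metrics=accuracy)
learn.fit_one_cycle(10)                             # head only, body frozen
learn.unfreeze()
learn.fit_one_cycle(40, lr_max=slice(1e-5, 1e-3))   # discriminative learning rates
```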
Figure 1. Representative examples of 8 \(\times\) 8 mm OCTA images of the superficial capillary plexus scored 2 (A, B), 1 (C, D), and 0 (E, F). Image artifacts shown include motion artifact lines (arrows), segmentation artifacts (asterisks), and media opacity (arrows). Image (E) is also off-center.
Receiver operating characteristic (ROC) curves were then generated for all neural network models and for the machine-reported signal strength, for each of the low quality and high quality use cases. Area under the curve (AUC) was calculated using the pROC R package, and 95% confidence intervals and p-values were calculated using the DeLong method18,19. The cumulative scores of the human raters were used as the ground truth for all ROC calculations. For the machine-reported signal strength, the AUC was calculated twice: once against the high quality overall score cutoff and once against the low quality overall score cutoff. Each neural network was compared with the signal strength AUC computed under the same training and evaluation conditions.
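The study computed ROC curves, AUCs, and DeLong confidence intervals with the pROC package in R; purely as an illustration, the analogous AUC computation in Python with scikit-learn (using placeholder arrays) looks like this:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Placeholder arrays; in the study these come from rater-derived labels,
# model probabilities, and machine-reported signal strength (0-10).
y_true_high = np.array([1, 0, 1, 0, 0, 1])             # 1 if total rater score == 4
p_model_high = np.array([0.9, 0.2, 0.8, 0.4, 0.1, 0.7])
signal_strength = np.array([8, 7, 9, 6, 3, 10])

auc_model = roc_auc_score(y_true_high, p_model_high)
auc_ss = roc_auc_score(y_true_high, signal_strength)   # SS against the high quality cutoff
fpr, tpr, thr = roc_curve(y_true_high, p_model_high)   # points for the ROC plot
print(auc_model, auc_ss)
```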
To further test the trained deep learning models on a separate dataset, the high quality and low quality models were applied directly to 32 en face 6 \(\times\) 6 mm superficial slab images collected at the Yale Eye Center over the same period as the 8 \(\times\) 8 mm images. The 6 \(\times\) 6 mm images were manually graded by the same raters (RD and JW) in the same manner as the 8 \(\times\) 8 mm images, and AUC, accuracy, and Cohen's kappa were calculated in the same way.
The class imbalance ratio was 158:189 (\(\rho = 1.19\)) for the low quality model and 80:267 (\(\rho = 3.3\)) for the high quality model. Because the class imbalance was milder than 1:4, no specific architectural changes were made to correct for class imbalance20,21.
To better visualize what the networks learned, class activation maps were generated for all four trained deep learning models: the high quality ResNet152 model, the low quality ResNet152 model, the high quality AlexNet model, and the low quality AlexNet model. Class activation maps were generated from the input convolutional layers of these four models, and heat maps were produced by overlaying the activation maps on source images from the 8 × 8 mm and 6 × 6 mm validation sets22,23.
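A hedged sketch of one way to produce such a heat map is shown below: a forward hook on the first convolutional layer of a placeholder (untrained) ResNet152 captures its activations, which are averaged over channels and upsampled to the input resolution. The exact layer choice and overlay styling used in the study are assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet152(pretrained=True).eval()   # stand-in for a trained model
img_tensor = torch.randn(3, 224, 224)              # placeholder normalized image

activations = {}
def save_activation(module, inp, out):
    activations["feat"] = out.detach()

handle = model.conv1.register_forward_hook(save_activation)  # input conv layer
with torch.no_grad():
    model(img_tensor.unsqueeze(0))
handle.remove()

amap = activations["feat"].mean(dim=1, keepdim=True)          # average over channels
amap = F.interpolate(amap, size=img_tensor.shape[-2:],
                     mode="bilinear", align_corners=False)[0, 0]
amap = (amap - amap.min()) / (amap.max() - amap.min() + 1e-8)  # scale to [0, 1]
# `amap` can now be overlaid on the source image as a heat map (e.g. with matplotlib).
```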
R version 4.0.3 was used for all statistical calculations, and visualizations were created using the ggplot2 graphics library.
We collected 347 en face images of the superficial capillary plexus measuring 8 \(\times\) 8 mm from 134 patients. The machine reported signal strength on a scale of 0 to 10 for all images (mean = 6.99 ± 2.29). For the 347 images acquired, the mean age at examination was 58.7 ± 14.6 years, and 39.2% of images were from male patients. Of all images, 30.8% were from Caucasian, 32.6% from Black, 30.8% from Hispanic, 4% from Asian, and 1.7% from patients of other races (Table 1). The age distribution of patients undergoing OCTA differed significantly with image quality (p < 0.001). The percentage of images from younger patients aged 18–45 years was 33.8% among high quality images compared with 12.2% among low quality images (Table 1). The distribution of diabetic retinopathy status also varied significantly with image quality (p = 0.017). Among all high quality images, the percentage from patients with PDR was 18.8%, compared with 38.8% of all low quality images (Table 1).
Individual ratings of all images showed moderate to strong inter-rater reliability between the human graders (Cohen's weighted kappa = 0.79, 95% CI 0.76–0.82), and there were no images for which the raters' scores differed by more than 1 (Fig. 2A). Signal strength correlated significantly with the manual scores (Pearson product-moment correlation = 0.58, 95% CI 0.51–0.65, p < 0.001), but many images were identified that had high signal strength yet a low manual score (Fig. 2B).
During training of the ResNet152 and AlexNet architectures, the training and validation cross-entropy losses decreased over the 50 epochs (Fig. 3B,C). Validation accuracy in the final training epoch was above 90% for both the high quality and low quality use cases.
The receiver operating characteristic curves show that the ResNet152 model significantly outperforms the machine-reported signal strength in both the low quality and high quality use cases (p < 0.001). The ResNet152 model also significantly outperforms the AlexNet architecture (p = 0.005 and p = 0.014 for the low quality and high quality cases, respectively). The resulting models for these two tasks achieved AUC values of 0.99 and 0.97, respectively, significantly better than the corresponding AUC values of 0.82 and 0.78 for the machine-reported signal strength, or 0.97 and 0.94 for AlexNet (Fig. 3). The difference in AUC between ResNet and signal strength was larger for high quality image recognition, indicating an additional benefit of using ResNet for this task.
Figure 2. Manual grading scores from each independent rater are combined and compared with the machine-reported signal strength. (A) The individual raters' scores are summed to produce an overall score. Images with an overall score of 4 are labeled high quality, while images with an overall score of 1 or less are labeled low quality. (B) Signal strength correlates with the manual scores, but images with high signal strength may still be of low quality. The red dotted line indicates the manufacturer's recommended quality threshold based on signal strength (signal strength \(\ge\) 6).
Figure 3. ResNet transfer learning provides a significant improvement in image quality identification for both the low quality and high quality use cases compared with the machine-reported signal strength. (A) Simplified architecture diagrams of the pre-trained (i) ResNet152 and (ii) AlexNet architectures. (B) Training history and receiver operating characteristic curves for ResNet152 compared with machine-reported signal strength and AlexNet for the low quality criterion. (C) Training history and receiver operating characteristic curves for ResNet152 compared with machine-reported signal strength and AlexNet for the high quality criterion.
After adjusting the decision boundary threshold, the maximum prediction accuracy of the ResNet152 model was 95.3% for the low quality case and 93.5% for the high quality case (Table 2). The maximum prediction accuracy of the AlexNet model was 91.0% for the low quality case and 90.1% for the high quality case (Table 2). The maximum prediction accuracy of signal strength was 76.1% for the low quality use case and 77.8% for the high quality use case. By Cohen's kappa (\(\kappa\)), the agreement between the ResNet152 model and the raters was 0.90 for the low quality case and 0.81 for the high quality case. Cohen's kappa for AlexNet was 0.82 and 0.71 for the low quality and high quality use cases, respectively. Cohen's kappa for signal strength was 0.52 and 0.27 for the low quality and high quality use cases, respectively.
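For clarity, threshold adjustment of this kind can be sketched as a simple sweep over candidate probability cutoffs, keeping the one with the highest accuracy; the arrays below are placeholders, and the study's actual procedure may differ in detail.

```python
import numpy as np
from sklearn.metrics import accuracy_score

def max_accuracy(y_true, p_pred):
    """Sweep candidate thresholds and return (best accuracy, best threshold)."""
    thresholds = np.unique(p_pred)
    accs = [accuracy_score(y_true, (p_pred >= t).astype(int)) for t in thresholds]
    best = int(np.argmax(accs))
    return accs[best], thresholds[best]

# Placeholder labels and predicted probabilities for illustration
y_true = np.array([1, 0, 1, 0, 1, 0])
p_pred = np.array([0.8, 0.3, 0.6, 0.45, 0.9, 0.2])
acc, thr = max_accuracy(y_true, p_pred)
print(acc, thr)
```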
Validation of the high quality and low quality recognition models on 6 \(\times\) 6 mm superficial slab images demonstrates the ability of the trained models to determine image quality across different imaging parameters. When applied to the 6 \(\times\) 6 mm superficial slabs, the low quality model had an AUC of 0.83 (95% CI 0.69–0.98) and the high quality model had an AUC of 0.85 (95% CI 0.55–1.00) (Table 2).
Visual inspection of the input layer class activation maps showed that all trained neural networks made use of image features during classification (Fig. 4A,B). For both 8 \(\times\) 8 mm and 6 \(\times\) 6 mm images, the ResNet activation maps closely follow the retinal vasculature. The AlexNet activation maps also follow the retinal vessels, but at a coarser resolution.
Figure 4. Class activation maps for the ResNet152 and AlexNet models highlight features related to image quality. (A) Class activation maps showing coherent activation over the superficial retinal vasculature in 8 \(\times\) 8 mm validation images and (B) in the smaller 6 \(\times\) 6 mm validation images. LQ: model trained on the low quality criterion; HQ: model trained on the high quality criterion.
It has previously been shown that image quality can greatly affect any quantification of OCTA images. In addition, the presence of retinopathy increases the incidence of image artifacts7,26. Indeed, in our data, consistent with previous studies, we found a significant association between increasing age and severity of retinal disease and deterioration in image quality (p < 0.001 and p = 0.017 for age and DR status, respectively; Table 1)27. Therefore, it is critical to assess image quality before performing any quantitative analysis of OCTA images. Most studies analyzing OCTA images use machine-reported signal strength thresholds to exclude low quality images. Although signal strength has been shown to affect the quantification of OCTA parameters, high signal strength alone may not be sufficient to exclude images with artifacts2,3,28,29. It is therefore necessary to develop a more reliable method of image quality control. To this end, we evaluated the performance of supervised deep learning methods against the machine-reported signal strength.
We developed several models for evaluating image quality because different OCTA use cases may have different image quality requirements; for some applications, higher quality images are needed than for others. In addition, the specific quantitative parameter of interest matters. For example, the area of the foveal avascular zone is not affected by non-central media opacity, whereas vessel density is. While our present work focuses on a general approach to image quality, not tied to the requirements of any particular measurement and intended as a direct replacement for the machine-reported signal strength, we hope to give users a greater degree of control, so that for a specific metric of interest they can choose the model corresponding to the maximum degree of image artifact considered acceptable.
For both the low quality and high quality use cases, we show excellent performance of skip-connection deep convolutional neural networks, with AUCs of 0.99 and 0.97 for the low quality and high quality models, respectively. We also demonstrate the superior performance of our deep learning approach compared with the machine-reported signal strength alone. Skip connections allow neural networks to learn features at multiple levels of detail, capturing finer aspects of images (e.g., contrast) as well as general features (e.g., image centering)30,31. Since the image artifacts that affect image quality are probably best identified across a wide range of scales, neural network architectures with skip connections may perform better than those without on the image quality determination task.
When testing our models on 6 \(\times\) 6 mm OCTA images, we noticed a decrease in classification performance for both the high quality and low quality models relative to the image size on which the models were trained (Table 2). The drop was larger for the AlexNet model than for the ResNet model. The relatively better performance of ResNet may be due to the ability of the residual connections to transmit information at multiple scales, making the model more robust for classifying images captured at different scales and/or magnifications.
Several differences between the 8 \(\times\) 8 mm and 6 \(\times\) 6 mm images may contribute to the poorer classification, including the relatively larger proportion of the image occupied by the foveal avascular zone, changes in the visibility of the vascular arcades, and the absence of the optic nerve head in the 6 \(\times\) 6 mm images. Despite this, our high quality ResNet model was able to achieve an AUC of 0.85 for 6 \(\times\) 6 mm images, a configuration on which the model was not trained, suggesting that the image quality information encoded in the neural network generalizes beyond the single image size and machine configuration on which it was trained (Table 2). Reassuringly, the ResNet and AlexNet activation maps of the 8 \(\times\) 8 mm and 6 \(\times\) 6 mm images highlighted the retinal vessels in both cases, suggesting that the models capture information applicable to classifying both types of OCTA images (Fig. 4).
Lauermann et al. similarly performed deep learning based image quality assessment of OCTA images using the Inception architecture, another skip-connection convolutional neural network6,32. They also limited their study to images of the superficial capillary plexus, but used only the smaller 3 \(\times\) 3 mm images from the Optovue AngioVue, although patients with various chorioretinal diseases were included. Our work builds on this foundation by including multiple models to address different image quality thresholds and by validating the results on images of a different size. We also report the AUC metric for our machine learning models and improve on their already impressive accuracy (90%)6 with both our low quality (96%) and high quality (95.7%) models.
This study has several limitations. First, the images were acquired with only one OCTA machine, and included only images of the superficial capillary plexus at 8 \(\times\) 8 mm and 6 \(\times\) 6 mm. The reason for excluding images from deeper layers is that projection artifacts can make manual grading of those images more difficult and possibly less consistent. Furthermore, images were only acquired from diabetic patients, for whom OCTA is emerging as an important diagnostic and prognostic tool33,34. Although we were able to test our model on images of a different size to ensure the results were robust, we were unable to identify suitable datasets from other centers, which limits our assessment of the generalizability of the model. Although the images were obtained from only one center, they were obtained from patients of diverse ethnic and racial backgrounds, which is a unique strength of our study. By including this diversity in our training process, we hope that our results will generalize more broadly and that we avoid encoding racial bias in the models we train.
Our study shows that skip-connection neural networks can be trained to achieve high performance in determining OCTA image quality. We provide these models as tools for further research. Because different metrics may have different image quality requirements, an individual quality control model can be developed for each metric using the framework established here.
Future research should include images of different sizes, from different depths, and from different OCTA machines to obtain a deep learning image quality evaluation process that generalizes across OCTA platforms and imaging protocols. The current work is also based on supervised deep learning approaches that require human grading of images, which can be labor intensive and time consuming for large datasets. It remains to be seen whether unsupervised deep learning methods can adequately distinguish low quality images from high quality images.
As OCTA technology continues to evolve and scanning speeds increase, the incidence of image artifacts and poor quality images may decrease. Improvements in software, such as the recently introduced projection artifact removal feature, can also alleviate these limitations. However, many challenges remain, as imaging patients with poor fixation or significant media opacity invariably produces image artifacts. As OCTA becomes more widely used in clinical trials, careful consideration is needed to establish clear guidelines for acceptable levels of image artifact for image analysis. The application of deep learning methods to OCTA images holds great promise, and further research is needed in this area to develop a robust approach to image quality control.
        The code used in the current research is available in the octa-qc repository, https://github.com/rahuldhodapkar/octa-qc. Datasets generated and/or analyzed during the current study are available from the respective authors upon reasonable request.
        Spaide, R. F., Fujimoto, J. G. & Waheed, N. K. Image artifacts in optical coherence tomography angiography. Retina 35, 2163–2180 (2015).
        Fenner, B. J. et al. Identification of imaging features that determine quality and reproducibility of retinal capillary plexus density measurements in OCT angiography. Br. J. Ophthalmol. 102, 509–514 (2018).
        Lauermann, J. L. et al. Impact of eye-tracking technology on OCT angiography image quality in age-related macular degeneration. Graefes Arch. Clin. Exp. Ophthalmol. 255, 1535–1542 (2017).
        Babiuch, A. S. et al. OCTA capillary perfusion density measurements used to detect and evaluate macular ischemia. Ophthalmic Surg. Lasers Imaging Retina 51, S30–S36 (2020).
        He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (2016).
        Lauermann, J. L. et al. Automated OCT angiography image quality assessment using a deep learning algorithm. Graefes Arch. Clin. Exp. Ophthalmol. 257, 1641–1648 (2019).
        Lauermann, J. et al. Prevalences of segmentation errors and motion artifacts in OCT angiography depend on the retinal disease. Graefes Arch. Clin. Exp. Ophthalmol. 256, 1807–1816 (2018).
        Paszke, A. et al. PyTorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32, 8026–8037 (2019).
        Deng, J. et al. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition 248–255 (2009).
        Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25 (2012).

