Evaluation of a novel deep learning-based classifier for perifissural nodules.
Han D., Heuvelmans M., Rook M., Dorrius M., van Houten L., Price NW., Pickup LC., Novotny P., Oudkerk M., Declerck J., Gleeson F., van Ooijen P., Vliegenthart R.
OBJECTIVES: To evaluate the performance of a novel convolutional neural network (CNN) for the classification of typical perifissural nodules (PFN).
METHODS: Chest CT data from two centers in the UK and The Netherlands (1668 unique nodules, 1260 individuals) were collected. Pulmonary nodules were classified on-site into subtypes, including "typical PFNs", and were reviewed by a central clinician. The dataset was divided into a training/cross-validation set of 1557 nodules (1103 individuals) and a test set of 196 nodules (158 individuals). For the test set, three radiologically trained readers classified the nodules into three categories: typical PFN, atypical PFN, and non-PFN. The consensus of the three readers was used as the reference to evaluate the performance of the PFN-CNN. Typical PFNs were considered positive results, and atypical PFNs and non-PFNs were grouped as negative results. PFN-CNN performance was evaluated using the ROC curve, confusion matrix, and Cohen's kappa.
RESULTS: Internal validation yielded a mean AUC of 91.9% (95% CI 90.6-92.9) with 78.7% sensitivity and 90.4% specificity. For the test set, the reader consensus rated 45/196 (23%) of nodules as typical PFN. The classifier-reader agreement (κ = 0.62-0.75) was similar to the inter-reader agreement (κ = 0.64-0.79). The area under the ROC curve was 95.8% (95% CI 93.3-98.4), with a sensitivity of 95.6% (95% CI 84.9-99.5) and a specificity of 88.1% (95% CI 81.8-92.8).
CONCLUSION: The PFN-CNN showed excellent performance in classifying typical PFNs. Its agreement with radiologically trained readers is within the range of inter-reader agreement. Thus, the CNN-based system has potential in clinical and screening settings to rule out perifissural nodules and increase reader efficiency.
KEY POINTS:
• Agreement between the PFN-CNN and radiologically trained readers is within the range of inter-reader agreement.
• The CNN model for the classification of typical PFNs achieved an AUC of 95.8% (95% CI 93.3-98.4) with 95.6% (95% CI 84.9-99.5) sensitivity and 88.1% (95% CI 81.8-92.8) specificity compared to the consensus of three readers.
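
To illustrate how the test-set metrics named in the abstract relate to a confusion matrix, the following minimal sketch computes sensitivity, specificity, and Cohen's kappa from a 2x2 table. The cell counts are an assumption: they are back-calculated from the reported 45/196 typical PFNs, 95.6% sensitivity, and 88.1% specificity, and are not stated in the abstract. The class and function names are hypothetical and do not come from the study's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """2x2 table with typical PFN as the positive class."""
    tp: int  # classifier positive, reference positive
    fn: int  # classifier negative, reference positive
    fp: int  # classifier positive, reference negative
    tn: int  # classifier negative, reference negative

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn


def sensitivity(cm: ConfusionMatrix) -> float:
    """Fraction of reference-positive nodules the classifier called positive."""
    return cm.tp / (cm.tp + cm.fn)


def specificity(cm: ConfusionMatrix) -> float:
    """Fraction of reference-negative nodules the classifier called negative."""
    return cm.tn / (cm.tn + cm.fp)


def cohens_kappa(cm: ConfusionMatrix) -> float:
    """Agreement between classifier and reference, corrected for chance."""
    n = cm.total
    observed = (cm.tp + cm.tn) / n
    # Chance agreement is derived from the marginal totals of both raters.
    both_positive = (cm.tp + cm.fp) * (cm.tp + cm.fn)
    both_negative = (cm.fn + cm.tn) * (cm.fp + cm.tn)
    expected = (both_positive + both_negative) / n ** 2
    return (observed - expected) / (1 - expected)


# ASSUMED counts, inferred from the reported percentages (not from the paper):
# 45 reference-positive and 151 reference-negative nodules out of 196.
cm = ConfusionMatrix(tp=43, fn=2, fp=18, tn=133)

print(f"sensitivity: {sensitivity(cm):.1%}")
print(f"specificity: {specificity(cm):.1%}")
print(f"kappa vs. reference: {cohens_kappa(cm):.2f}")
```

The kappa computed here is binary agreement against the reader consensus, so it is not directly comparable to the classifier-reader range reported in the abstract, which refers to agreement with the individual readers.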