Multi-observer concordance and accuracy of the British Thoracic Society scale and other visual assessment qualitative criteria for solid pulmonary nodule assessment using FDG PET-CT.
Fatania K., Brown PJ., Xie C., McDermott G., Callister MEJ., Graham R., Subesinghe M., Gleeson FV., Scarsbrook AF.
AIM: To compare the interobserver reliability and diagnostic accuracy of the British Thoracic Society (BTS) scale and other visual assessment criteria in the context of 2-[18F]-fluoro-2-deoxy-d-glucose (FDG) positron-emission tomography (PET)-computed tomography (CT) evaluation of solid pulmonary nodules (SPNs). MATERIALS AND METHODS: Fifty patients who underwent FDG PET-CT for assessment of an SPN were identified. Seven reporters with varied experience at four centres graded FDG uptake visually using the BTS four-point scale. Five reporters also scored SPNs according to three- and five-point visual assessment scales and using semi-quantitative assessment (maximum standardised uptake value [SUVmax]). Interobserver reliability was assessed with the intra-class correlation coefficient (ICC) and weighted Cohen's kappa (κ). Diagnostic performance was evaluated by receiver operating characteristic (ROC) analysis. RESULTS: Good interobserver reliability was demonstrated with the BTS scale (ICC=0.78, 95% confidence interval [CI]: 0.69-0.85) and five-point scale (ICC=0.78, 95% CI: 0.68-0.86), whilst the three-point scale demonstrated moderate reliability (ICC=0.70, 95% CI: 0.59-0.80). Almost perfect agreement was achieved between two consultants (κ=0.85), and substantial agreement between two other consultants (κ=0.78) using the BTS scale. ROC curves for the BTS and five-point scales demonstrated equivalent accuracy (BTS area under the ROC curve [AUC]=0.768; five-point AUC=0.768). SUVmax was not significantly more accurate than the BTS scale (SUVmax AUC=0.794; BTS AUC=0.768, p=0.43). CONCLUSIONS: The BTS scale can be applied reliably by reporters with varied levels of PET-CT reporting experience across different centres, and has a diagnostic performance that is not surpassed by alternative scales.
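ILLUSTRATIVE SKETCH: The two pairwise statistics named in the methods, weighted Cohen's kappa and the area under the ROC curve, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the abstract does not say whether linear or quadratic weights were applied, so the `power` parameter is an assumption, and the function names and the toy ratings are hypothetical rather than study data.

```python
from itertools import product


def weighted_kappa(rater_a, rater_b, categories, power=1):
    """Weighted Cohen's kappa for two raters on an ordinal scale.

    power=1 gives linear disagreement weights, power=2 quadratic.
    Which weighting the study used is not stated; this is an assumption.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions of (rater_a grade, rater_b grade).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Weighted observed and chance-expected disagreement.
    disagree_obs = 0.0
    disagree_exp = 0.0
    for i, j in product(range(k), repeat=2):
        weight = (abs(i - j) / (k - 1)) ** power
        disagree_obs += weight * observed[i][j]
        disagree_exp += weight * row[i] * col[j]
    return 1.0 - disagree_obs / disagree_exp


def roc_auc(scores, labels):
    """AUC as the probability that a malignant case (label 1) scores
    higher than a benign case (label 0), counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical four-point grades from two reporters (not study data).
    reporter_1 = [1, 2, 2, 3, 4, 4, 1, 3]
    reporter_2 = [1, 2, 3, 3, 4, 3, 1, 2]
    malignant = [0, 0, 1, 1, 1, 1, 0, 0]
    print("weighted kappa:", weighted_kappa(reporter_1, reporter_2, [1, 2, 3, 4]))
    print("AUC, reporter 1:", roc_auc(reporter_1, malignant))
```

Treating an ordinal visual grade as the score in `roc_auc` mirrors how a four- or five-point scale can be compared with a continuous measure such as SUVmax on the same ROC footing; the ICC and the significance test for the AUC difference are omitted here for brevity.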