Artificial Intelligence with Statistical Confidence Scores for Detection of Acute or Subacute Hemorrhage on Noncontrast CT Head Scans
Re, Thomas J.
Bodanapally, Uttam K.
Sanelli, Pina C.
Schroeppel, Thomas J.
Lui, Yvonne W.
Journal: Radiology. Artificial intelligence.
Publisher: Radiological Society of North America (RSNA)
Abstract
Purpose: To present a method that automatically detects, subtypes, and locates acute or subacute intracranial hemorrhage (ICH) on noncontrast CT (NCCT) head scans; generates detection confidence scores to identify high-confidence data subsets with higher accuracy; and improves radiology worklist prioritization. Such scores may enable clinicians to better use artificial intelligence (AI) tools.
Materials and Methods: This retrospective study included 46 057 studies from seven “internal” centers for development (training, architecture selection, hyperparameter tuning, and operating-point calibration; n = 25 946) and evaluation (n = 2947) and three “external” centers for calibration (n = 400) and evaluation (n = 16 764). Internal centers contributed developmental data, whereas external centers did not. Deep neural networks predicted the presence of ICH and subtypes (intraparenchymal, intraventricular, subarachnoid, subdural, and/or epidural hemorrhage) and segmentations per case. Two ICH confidence scores are discussed: a calibrated classifier entropy score and a Dempster-Shafer score. Evaluation was completed by using receiver operating characteristic curve analysis and report turnaround time (RTAT) modeling on the evaluation set and on confidence score–defined subsets using bootstrapping.
Results: The areas under the receiver operating characteristic curve for ICH were 0.97 (0.97, 0.98) and 0.95 (0.94, 0.95) on internal and external center data, respectively. On 80% of the data stratified by calibrated classifier and Dempster-Shafer scores, the system improved the Youden indexes, increasing them from 0.84 to 0.93 (calibrated classifier) and from 0.84 to 0.92 (Dempster-Shafer) for internal centers and increasing them from 0.78 to 0.88 (calibrated classifier) and from 0.78 to 0.89 (Dempster-Shafer) for external centers (P < .001). Models estimated shorter RTAT for AI-prioritized worklists with confidence measures than for AI-prioritized worklists without confidence measures, shortening RTAT by 27% (calibrated classifier) and 27% (Dempster-Shafer) for internal centers and shortening RTAT by 25% (calibrated classifier) and 27% (Dempster-Shafer) for external centers (P < .001).
Conclusion: AI that provided statistical confidence measures for ICH detection on NCCT scans reliably detected and subtyped hemorrhages, identified high-confidence predictions, and improved worklist prioritization in simulation.
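Illustrative sketch (not the authors' implementation): the abstract does not give the exact form of the calibrated classifier entropy score, so the snippet below assumes a simple stand-in, one minus the binary Shannon entropy of a calibrated ICH probability. It then keeps the 80% most confident cases and compares the Youden index (sensitivity + specificity - 1) on all cases versus that subset. The function names, the 0.5 operating point, and the toy data are all hypothetical; the Dempster-Shafer score is not modeled here.

```python
import math


def binary_entropy(p):
    """Shannon entropy in bits of a Bernoulli(p) prediction (0 = certain, 1 = coin flip)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def confidence(p):
    """Assumed confidence score: 1 minus the entropy of the calibrated probability."""
    return 1.0 - binary_entropy(p)


def youden_index(labels, probs, threshold=0.5):
    """Youden index J = sensitivity + specificity - 1 at a fixed threshold."""
    tp = fn = tn = fp = 0
    for y, p in zip(labels, probs):
        predicted_positive = p >= threshold
        if y == 1:
            tp += predicted_positive
            fn += not predicted_positive
        else:
            fp += predicted_positive
            tn += not predicted_positive
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def most_confident_subset(labels, probs, fraction=0.8):
    """Keep the given fraction of cases with the highest confidence score."""
    order = sorted(range(len(probs)), key=lambda i: confidence(probs[i]), reverse=True)
    keep = order[: max(1, int(round(fraction * len(probs))))]
    return [labels[i] for i in keep], [probs[i] for i in keep]


# Toy data (invented for illustration only): 1 = ICH present, 0 = absent.
toy_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
toy_probs = [0.98, 0.95, 0.90, 0.85, 0.45, 0.02, 0.05, 0.10, 0.15, 0.55]

j_all = youden_index(toy_labels, toy_probs)
sub_labels, sub_probs = most_confident_subset(toy_labels, toy_probs, fraction=0.8)
j_conf = youden_index(sub_labels, sub_probs)
print(f"Youden index, all cases: {j_all:.2f}")
print(f"Youden index, 80% most confident: {j_conf:.2f}")
```

On this toy data the two misclassified cases are also the two least confident ones, so dropping them raises the Youden index from about 0.6 to 1.0. This mirrors the direction of the reported effect only; the magnitudes carry no meaning.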
Radiology, Nuclear Medicine and imaging
Convolutional Neural Network (CNN)
Identifier to cite or link to this item: http://hdl.handle.net/10713/19073