Executive Summary : | Although low-dose CT (LDCT) chest scans are now widely used because they outperform conventional X-ray scans, radiologists still need more accurate models that detect lung cancer from these scans at an early stage, rather than only once malignant nodules have reached an advanced stage. Public chest LDCT scans from lung cancer screening that show malignant nodules at early stages are currently scarce, both internationally and nationally; if such scans were available, earlier detection could potentially stop the cancer from spreading to other parts of the body and thereby increase the treatment success rate. Different Convolutional Neural Network (CNN) architectures can be developed and validated on these LDCT scan datasets to determine which model performs best in terms of prediction accuracy and AUC. Existing models address this problem, but they do not focus on reducing the False Positive Rate (FPR) and thereby improving the AUC. This project is proposed with three high-level objectives. Firstly, a curated public dataset of LDCT chest scan images will be created from different data sources, namely collaborating Indian hospitals, covering smokers of different age groups and localities of the country. It will comprise a diverse set of scans from individuals who smoke varying numbers of cigarette/beedi packs per year. For example, the Mudhra cohort, which contains chest CT scans of individuals aged over 65 years in Mysore district, Karnataka, is one of the data sources available for curation and image preprocessing. Secondly, existing CNN-based models show good accuracy in predicting lung cancer from LDCT scan images but suffer from increased FPRs. Different CNN architectures will be developed and evaluated on the datasets curated in the first phase, varying the order and number of sequential convolution layers, max-pooling layers, dropout layers, batch normalization layers and fully connected layers. 
Different optimizers will be tried to identify the model or ensemble of models with the best training metrics, and the best model will be tested on out-of-sample datasets for consistency in accuracy and AUC scores. Thirdly, an end-to-end CAD software package will be developed as a ready-to-use solution for radiologists and other medical personnel in diagnostic radiology. It will accept a chest CT scan as input, feed it into the best-performing CNN model validated and chosen in the second phase, and output the malignant nodule prediction along with a clear indication of the cancerous nodules in the scan image. This will enable radiologists to cross-check their early detection of lung cancer in sample CT scans against the prediction results from the integrated base CNN model or an ensemble of CNN models. |
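
The summary compares models by FPR and AUC. As a minimal illustration of how these two metrics can be computed from a model's per-scan malignancy scores, the sketch below uses plain Python. The function names, the labels and the scores are all hypothetical and are not taken from the project; in practice an established library routine would likely be used instead.

```python
def false_positive_rate(labels, scores, threshold):
    """FPR = FP / (FP + TN): share of benign scans flagged as malignant."""
    benign_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not benign_scores:
        raise ValueError("need at least one benign (label 0) case")
    false_positives = sum(1 for s in benign_scores if s >= threshold)
    return false_positives / len(benign_scores)


def roc_auc(labels, scores):
    """AUC as the probability that a malignant scan outscores a benign one.

    Ties between a malignant and a benign score count as half a win.
    """
    malignant = [s for y, s in zip(labels, scores) if y == 1]
    benign = [s for y, s in zip(labels, scores) if y == 0]
    if not malignant or not benign:
        raise ValueError("need at least one case of each class")
    wins = 0.0
    for m in malignant:
        for b in benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant) * len(benign))


# Hypothetical data: 1 = malignant nodule present, 0 = benign.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]

print("FPR at threshold 0.5:", false_positive_rate(labels, scores, 0.5))
print("AUC:", roc_auc(labels, scores))
```

The pairwise AUC loop is quadratic in the number of scans, which is acceptable for illustration but would normally be replaced by a rank-based computation on a full screening dataset.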