Abstract
A computer-aided diagnosis (CAD) system uses algorithms to analyze a medical image and identify abnormalities in it. CAD assists radiologists and physicians in assessing and categorizing the pathology shown in images. A broad range of algorithms has been employed in this field, from classical image processing to Machine Learning (ML) and Deep Learning (DL) methods based on Artificial Neural Networks (ANNs). Systems based on ML and DL require large amounts of data to train their models, so researchers collect and develop medical image databases for this purpose. These datasets are released publicly to foster research and collaboration in the field of medical image processing. This paper discusses various chest radiographic imaging datasets and the challenges involved in reading chest radiographs. It provides a detailed analysis of publicly available chest radiographic datasets for thoracic pathologies, and it also discusses the pitfalls and challenges associated with these datasets. It is hoped that training deep learning algorithms on these datasets will advance CAD-based diagnosis of thoracic diseases and thereby help improve health care.