Abstract
Head detection-based crowd counting is of great importance and serves as a preprocessing step in many visual applications, for example, counting, tracking, and crowd dynamics understanding. Despite this importance, only a limited amount of work has been reported in the literature on detecting human heads in high-density crowds. Detecting heads in crowded scenes is challenging because of the significant scale variations within a scene. In this paper, we tackle this problem by exploiting the contextual constraints offered by crowded scenes. For this purpose, we propose two networks, i.e., a sparse-scale convolutional neural network (SS-CNN) and a dense-scale convolutional neural network (DS-CNN). SS-CNN detects human heads using coarse information about the scales in the image. DS-CNN takes the detections obtained from SS-CNN and generates a dense scalemap by globally reasoning over the coarse scales of these detections via a Markov Random Field (MRF). The dense scalemap has the unique property that it captures all scale variations in the image and aids in generating scale-aware proposals. We evaluated our framework on three challenging benchmark datasets, i.e., UCF-QNRF, WorldExpo'10, and UCF_CC_50. Experimental results show that the proposed framework outperforms existing state-of-the-art methods.